This repository contains Jupyter notebooks and datasets for predicting village infrastructure using machine learning models.
The goal of this project is to analyze village infrastructure data and build predictive models using logistic regression and random forest classifiers.
- dataset_creation.ipynb - Processes raw data and performs feature engineering.
- lr_trained_model.ipynb - Implements and evaluates a Logistic Regression model.
- rf_trained_model.ipynb - Implements and evaluates a Random Forest model.
- main.ipynb - Integrates dataset creation and model execution.
- Village_infras.csv - Contains village infrastructure data with 13 columns and 20,000 rows.
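
As a quick sanity check on the dataset described above, the following minimal sketch loads the CSV with pandas and fails early if its shape differs from the documented 13 columns and 20,000 rows. The function name `load_village_data` is hypothetical and is not taken from the notebooks; the column names are not assumed at all.

```python
import pandas as pd

# Shape stated in this README: 20,000 rows and 13 columns.
EXPECTED_SHAPE = (20000, 13)


def load_village_data(path="Village_infras.csv", expected_shape=EXPECTED_SHAPE):
    """Hypothetical helper: read the CSV and verify its shape before use."""
    df = pd.read_csv(path)
    if df.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {df.shape}")
    return df
```

Calling `load_village_data()` from the repository root should return the full DataFrame, or raise a `ValueError` if the file has been truncated or altered.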
To run the notebooks, install the following dependencies:
pip install pandas scikit-learn numpy matplotlib seaborn

- Run dataset_creation.ipynb to preprocess the data.
- Execute lr_trained_model.ipynb or rf_trained_model.ipynb to train and evaluate models.
- Use main.ipynb to integrate the entire workflow.
- The logistic regression model serves as the performance baseline.
- The random forest classifier improves predictive accuracy over that baseline.
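
The comparison between the two models can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the notebooks: the function name `compare_models`, the 80/20 split, the hyperparameters, and the use of accuracy as the metric are all assumptions, and synthetic data from `make_classification` stands in for Village_infras.csv because its column names are not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=42):
    """Hypothetical helper: fit both classifiers on one split, return test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # Scaling helps logistic regression converge; trees do not need it.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in (assumption): 12 features plus one target column.
X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
scores = compare_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Evaluating both models on the same held-out split keeps the comparison fair; with the real data, `X` and `y` would come from the preprocessed output of dataset_creation.ipynb.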