This project predicts individual insurance charges using a Random Forest Regressor model. The model takes features like age, BMI, number of children, smoking status, sex, and region to estimate medical insurance costs.
- Numerical: age, bmi, children
- Categorical: sex, smoker, region
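
The feature split above can be sketched as a scikit-learn preprocessing step feeding a Random Forest. This is a minimal illustration, not the project's actual code: the choice of `StandardScaler` and `OneHotEncoder`, the `build_preprocessor` helper, the target column name `charges`, and the toy rows (made-up values, not taken from the dataset) are all assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERICAL = ["age", "bmi", "children"]
CATEGORICAL = ["sex", "smoker", "region"]


def build_preprocessor():
    """Hypothetical helper: scale numeric columns, one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERICAL),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


# Toy rows with made-up values, only to show the expected column layout.
toy = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "bmi": [27.9, 22.7, 30.1, 35.4],
    "children": [0, 1, 2, 3],
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
    "charges": [16000.0, 4500.0, 8200.0, 41000.0],
})

preprocessor = build_preprocessor()
X = preprocessor.fit_transform(toy[NUMERICAL + CATEGORICAL])

# Small forest with a fixed seed so the sketch is reproducible.
model = RandomForestRegressor(n_estimators=10, random_state=42)
model.fit(X, toy["charges"])
```

With these toy rows the encoded matrix has 11 columns: 3 numeric plus 2, 2, and 4 one-hot columns for sex, smoker, and region.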
The dataset used is the popular Insurance Dataset from Kaggle.
- NumPy
- pandas
- scikit-learn
- joblib
- Ensure insurance.csv is in the project folder.
- Run the main script:
- If the model does not exist, the script trains it and saves the model as model.pkl and the preprocessing pipeline as pipeline.pkl.
- If the model already exists, it will load the model and run inference on test.csv.
- Output predictions will be saved to output.csv with columns:
  - predicted_charges → model predictions
  - actual_charges → actual charges (from test set)
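
The train-or-predict behaviour described above could look roughly like the sketch below. The file names come from this README; everything else is an assumption for illustration: the `run` function, the preprocessing choices, the `charges` column in test.csv, and the toy data written to a temporary folder so the example runs on its own.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERICAL = ["age", "bmi", "children"]
CATEGORICAL = ["sex", "smoker", "region"]
FEATURES = NUMERICAL + CATEGORICAL


def run(workdir):
    """Hypothetical entry point: train on the first call, predict on later calls."""
    model_path = os.path.join(workdir, "model.pkl")
    pipeline_path = os.path.join(workdir, "pipeline.pkl")

    if not os.path.exists(model_path):
        # No saved model yet: fit the preprocessing pipeline and the forest.
        data = pd.read_csv(os.path.join(workdir, "insurance.csv"))
        pipeline = ColumnTransformer([
            ("num", StandardScaler(), NUMERICAL),
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ])
        X = pipeline.fit_transform(data[FEATURES])
        model = RandomForestRegressor(n_estimators=10, random_state=42)
        model.fit(X, data["charges"])
        joblib.dump(model, model_path)
        joblib.dump(pipeline, pipeline_path)
        return "trained"

    # Saved artifacts exist: load them and run inference on test.csv.
    model = joblib.load(model_path)
    pipeline = joblib.load(pipeline_path)
    test = pd.read_csv(os.path.join(workdir, "test.csv"))
    predictions = model.predict(pipeline.transform(test[FEATURES]))
    output = pd.DataFrame({
        "predicted_charges": predictions,
        "actual_charges": test["charges"],  # assumes test.csv keeps the target
    })
    output.to_csv(os.path.join(workdir, "output.csv"), index=False)
    return "predicted"


# Toy data with made-up values, standing in for the real CSV files.
toy = pd.DataFrame({
    "age": [19, 33, 45, 52, 28, 61],
    "bmi": [27.9, 22.7, 30.1, 35.4, 24.3, 29.0],
    "children": [0, 1, 2, 3, 0, 1],
    "sex": ["female", "male", "male", "female", "male", "female"],
    "smoker": ["yes", "no", "no", "yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest",
               "northeast", "southeast", "northwest"],
    "charges": [16000.0, 4500.0, 8200.0, 41000.0, 3900.0, 13500.0],
})

with tempfile.TemporaryDirectory() as workdir:
    toy.to_csv(os.path.join(workdir, "insurance.csv"), index=False)
    toy.head(2).to_csv(os.path.join(workdir, "test.csv"), index=False)
    first = run(workdir)   # no model.pkl yet, so this call trains
    second = run(workdir)  # model.pkl exists, so this call predicts
    result = pd.read_csv(os.path.join(workdir, "output.csv"))
```

Saving the fitted preprocessing pipeline alongside the model means inference applies exactly the same scaling and encoding that were learned during training.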