This project focuses on building, analyzing, and evaluating a Machine Learning model to predict student exam performance based on academic, behavioral, and socio-economic factors.
Unlike UI-centric applications, this project emphasizes the complete ML lifecycle, including data preprocessing, feature engineering, model comparison, diagnostics, and ethical considerations.
The goal is to develop a predictive model that estimates a student's final exam score from structured educational and behavioral data.
The objective is not only to generate predictions but also to understand:
- Which features influence performance
- How model assumptions affect results
- The limitations and ethical implications of predictive modeling in education
- Checked for missing values
- Verified data types
- Removed inconsistent or duplicate records
- Ensured unit consistency (e.g., study hours per week)
- Validated numerical ranges (attendance %, scores, etc.)
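The cleaning checks above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the tiny synthetic frame, and the choice to drop (rather than impute) incomplete rows are all assumptions.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame standing in for the real dataset (column names are assumed).
raw = pd.DataFrame({
    "Hours_Studied": [10, 12, 12, 8, 15],
    "Attendance": [85, 92, 92, 130, 78],        # 130 is outside the valid 0-100 range
    "Previous_Scores": [70, 88, 88, 65, np.nan],  # one missing value
    "Exam_Score": [68, 80, 80, 60, 72],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, rows with missing values, and out-of-range percentages."""
    df = df.drop_duplicates()
    df = df.dropna()  # assumption: incomplete rows are dropped, not imputed
    in_range = df["Attendance"].between(0, 100) & df["Exam_Score"].between(0, 100)
    return df[in_range].reset_index(drop=True)


# Inspect missing values and data types before cleaning.
print(raw.isna().sum())
print(raw.dtypes)

cleaned = clean(raw)
print(cleaned)
```

Whether a suspicious row should be dropped or corrected depends on the dataset, so the range limits here are only placeholders.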
Since the dataset contains categorical variables:
- Ordinal Encoding used for:
  - Parental Involvement
  - Motivation Level
  - Access to Resources
  - Family Income
  - Teacher Quality
  - Parental Education Level
- Binary Encoding used for:
  - Internet Access
  - Extracurricular Activities
  - Learning Disabilities
  - Gender
- One-Hot Encoding used for:
  - School Type
  - Peer Influence
Encoding was carefully designed to avoid introducing artificial order in nominal variables.
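One way to express the three encoding strategies is a scikit-learn `ColumnTransformer`. The sketch below covers one column per strategy; the column names and category levels (`Low`/`Medium`/`High`, `Yes`/`No`, `Public`/`Private`) are assumptions about the dataset, not values confirmed by this README.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Illustrative sample; column names and category labels are assumed.
sample = pd.DataFrame({
    "Motivation_Level": ["Low", "High", "Medium"],
    "Internet_Access": ["Yes", "No", "Yes"],
    "School_Type": ["Public", "Private", "Public"],
})

encoder = ColumnTransformer(
    transformers=[
        # Ordinal: the explicit category order preserves Low < Medium < High.
        ("ordinal", OrdinalEncoder(categories=[["Low", "Medium", "High"]]), ["Motivation_Level"]),
        # Binary: a two-level ordinal encoding maps No -> 0, Yes -> 1.
        ("binary", OrdinalEncoder(categories=[["No", "Yes"]]), ["Internet_Access"]),
        # Nominal: one-hot columns avoid implying any order between school types.
        ("nominal", OneHotEncoder(handle_unknown="ignore"), ["School_Type"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

encoded = encoder.fit_transform(sample)
print(encoded)
```

Passing the category order explicitly matters for the ordinal columns: without it, the encoder sorts labels alphabetically, which would place `High` before `Low`.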
- Examined numerical features using statistical summaries
- Identified extreme values in:
  - Study Hours
  - Previous Scores
  - Attendance
- Verified whether outliers were:
  - Data entry errors (removed)
  - Genuine rare cases (retained)
Outlier handling ensured model stability without distorting real-world variance.
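Extreme values can be flagged for manual review with a simple interquartile-range rule. The README does not say which rule was used, so the 1.5 × IQR threshold, the helper name `iqr_outlier_mask`, and the sample values below are assumptions for illustration only.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values lying more than k * IQR outside the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Hypothetical weekly study hours; 95 is an implausible entry.
hours = pd.Series([18, 20, 22, 19, 21, 23, 20, 95], name="Hours_Studied")

flagged = hours[iqr_outlier_mask(hours)]
print(flagged)
```

Flagging is only the first step: each flagged value still has to be judged as a data entry error (remove) or a genuine rare case (retain).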
Multiple models were evaluated:
- Linear Regression
- Random Forest Regressor
- Gradient Boosting Regressor
Evaluation metrics used:
- R² Score
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
Comparing the models on these metrics made it possible to select the most interpretable and stable one for deployment.
Final selected model: Linear Regression (for interpretability and consistency).
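A comparison of the three models on the three metrics could look like the sketch below. It runs on synthetic data with a linear signal, so the numbers it prints say nothing about the real dataset; the feature count, coefficients, and default hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded feature matrix and exam scores.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 65 + X @ np.array([4.0, 3.0, 2.0, 0.5]) + rng.normal(scale=1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: R2={scores['r2']:.3f}  MAE={scores['mae']:.3f}  MSE={scores['mse']:.3f}")
```

Evaluating all models on the same held-out split keeps the metrics comparable across them.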
To ensure reliability:
- Checked residual distribution
- Analyzed prediction spread across low and high score ranges
- Verified that predictions were not excessively biased toward the mean
- Evaluated generalization performance on test data
Observed behavior:
- Slight regression toward the mean (expected in linear models)
- Stable prediction performance across moderate score ranges
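The residual checks and the regression-toward-the-mean effect can be reproduced on synthetic data, as in the sketch below. The single feature, noise level, and three-band split are assumptions chosen to make the effect visible; they do not describe the project's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic scores: one informative feature plus substantial noise.
rng = np.random.default_rng(7)
x = rng.normal(size=(500, 1))
y = 65 + 5.0 * x[:, 0] + rng.normal(scale=4.0, size=500)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=7
)

model = LinearRegression().fit(x_train, y_train)
pred = model.predict(x_test)
residuals = y_test - pred  # positive means the model under-predicted

# Split the test set into low / mid / high bands by actual score.
low_cut, high_cut = np.quantile(y_test, [1 / 3, 2 / 3])
bands = {
    "low": y_test <= low_cut,
    "mid": (y_test > low_cut) & (y_test <= high_cut),
    "high": y_test > high_cut,
}
band_means = {name: float(residuals[mask].mean()) for name, mask in bands.items()}

print("std of actual scores:   ", round(float(y_test.std()), 2))
print("std of predicted scores:", round(float(pred.std()), 2))
print("mean residual per band: ", band_means)
```

Predictions that are spread more narrowly than the actual scores, with low scorers over-predicted and high scorers under-predicted on average, are the signature of regression toward the mean.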
Educational prediction systems must be handled responsibly.
This project acknowledges:
- Predictions are probabilistic, not deterministic
- Socio-economic factors must not reinforce bias
- Model outputs should not label or restrict student potential
- The system is for analytical and educational research purposes only
Ethical deployment requires:
- Transparency
- Bias awareness
- Responsible interpretation
- Hours Studied
- Attendance
- Previous Scores
- Sleep Hours
- Tutoring Sessions
- Physical Activity
- Parental Involvement
- Access to Resources
- Family Income
- Teacher Quality
- Peer Influence
- Internet Access
- Extracurricular Activities
- Learning Disabilities
- Gender
- Distance from Home
Language: Python
Libraries:
- scikit-learn
- pandas
- numpy
- streamlit
- matplotlib
Model: Linear Regression
Deployment Platform: Streamlit Community Cloud
LEVEL1_StudentPerformancePredictor/
│
├── app.py                     # Streamlit UI for checking predictions
│
├── student_score_model.pkl    # Linear Regression model for predicting the final score
│
├── requirements.txt
└── README.md
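The `.pkl` extension suggests the trained model is serialized with Python's `pickle` module, though the README does not confirm this (joblib is another common choice). Under that assumption, saving and reloading a fitted model looks roughly like the sketch below, which trains on synthetic data and writes to a temporary directory rather than the project folder.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a small model on synthetic data (stand-in for the real training step).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 60 + X @ np.array([5.0, 2.0, 1.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "student_score_model.pkl"

    # Serialize the fitted model to disk.
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)

    # Reload it, as an app would do at startup.
    with model_path.open("rb") as fh:
        restored = pickle.load(fh)

# The restored model should reproduce the original predictions.
same_predictions = bool(np.allclose(model.predict(X), restored.predict(X)))
print(same_predictions)
```

Pickled models should be loaded with the same scikit-learn version that produced them, which is one reason to pin versions in `requirements.txt`.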
The application is deployed using Streamlit Community Cloud, allowing seamless hosting of the Streamlit application directly from GitHub.
Deployment Process:
- Push the complete project to a public GitHub repository.
- Add `requirements.txt` for dependency management.
- Connect the repository to Streamlit Community Cloud.
- Deploy `app.py` as the entry point.
Live Application:
🔗 App URL: https://level1studentperformancepredictor-m2wfrbrej9jhxmuutaufe3.streamlit.app/
- Inclusion of Social Media Usage as a behavioral factor
- Hyperparameter tuning for ensemble models
- Feature importance visualization
- Bias detection analysis
- Cross-validation-based stability testing
Apeksha, Machine Learning and Python enthusiast
This project is intended for educational and academic purposes.