🎓 Student Performance Prediction using Machine Learning

This project focuses on building, analyzing, and evaluating a Machine Learning model to predict student exam performance based on academic, behavioral, and socio-economic factors.

Unlike UI-centric applications, this project emphasizes the complete ML lifecycle, including data preprocessing, feature engineering, model comparison, diagnostics, and ethical considerations.

📌 Problem Statement

The goal is to develop a predictive model that estimates a student's final exam score from structured educational and behavioral data.

The objective is not only to generate predictions but also to understand:

Which features influence performance
How model assumptions affect results
The limitations and ethical implications of predictive modeling in education

🔁 Machine Learning Lifecycle

1️⃣ Data Cleaning

Checked for missing values
Verified data types
Removed inconsistent or duplicate records
Ensured unit consistency (e.g., study hours per week)
Validated numerical ranges (attendance %, scores, etc.)
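
The cleaning steps above can be sketched with pandas. This is a minimal illustration rather than the project's actual code: the column names (`Hours_Studied`, `Attendance`, `Exam_Score`), the synthetic rows, and the choice to drop (rather than impute) rows with missing values are all assumptions.

```python
import numpy as np
import pandas as pd

# Tiny synthetic frame; column names are hypothetical stand-ins for the real dataset.
df = pd.DataFrame({
    "Hours_Studied": [20, 15, 15, 30, np.nan],
    "Attendance": [85, 92, 92, 140, 70],   # 140 is an impossible percentage
    "Exam_Score": [67, 72, 72, 80, 61],
})

def clean(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates and missing rows, then keep only rows with valid ranges."""
    out = frame.drop_duplicates()   # remove duplicate records
    out = out.dropna()              # assumption: rows with missing values are dropped
    in_range = out["Attendance"].between(0, 100) & out["Exam_Score"].between(0, 100)
    return out[in_range].reset_index(drop=True)

print(df.isna().sum())   # missing values per column
print(df.dtypes)         # data types per column
cleaned = clean(df)
print(cleaned)
```

Whether missing values should be dropped or imputed depends on how many there are; the sketch only shows where that decision sits in the pipeline.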

2️⃣ Encoding

Since the dataset contains categorical variables:

Ordinal Encoding used for:
- Parental Involvement
- Motivation Level
- Access to Resources
- Family Income
- Teacher Quality
- Parental Education Level
Binary Encoding used for:
- Internet Access
- Extracurricular Activities
- Learning Disabilities
- Gender
One-Hot Encoding used for:
- School Type
- Peer Influence

One-hot encoding was reserved for nominal variables so that the model does not read an artificial order into their categories.
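
The three encoding strategies could look roughly like this in pandas. The category labels (`Low`/`Medium`/`High`, `Yes`/`No`, `Public`/`Private`) and the underscore-style column names are assumptions for illustration; the real dataset may use different values.

```python
import pandas as pd

# Hypothetical sample rows covering one column per encoding strategy.
df = pd.DataFrame({
    "Parental_Involvement": ["Low", "High", "Medium"],
    "Internet_Access": ["Yes", "No", "Yes"],
    "School_Type": ["Public", "Private", "Public"],
})

# Ordinal: an explicit mapping preserves the natural Low < Medium < High order.
level_order = {"Low": 0, "Medium": 1, "High": 2}
df["Parental_Involvement"] = df["Parental_Involvement"].map(level_order)

# Binary: two-valued columns become 0/1.
df["Internet_Access"] = df["Internet_Access"].map({"No": 0, "Yes": 1})

# One-hot: a nominal column gets one indicator column per category,
# so no ordering between categories is implied.
encoded = pd.get_dummies(df, columns=["School_Type"], dtype=int)
print(encoded)
```

An explicit mapping is used for the ordinal column because an automatic encoder would otherwise order the labels alphabetically (`High` < `Low` < `Medium`), which is not the intended ranking.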

3️⃣ Outlier Analysis

Examined numerical features using statistical summaries
Identified extreme values in:
- Study Hours
- Previous Scores
- Attendance
Verified whether outliers were:
- Data entry errors (removed)
- Genuine rare cases (retained)

Removing only confirmed entry errors kept the model stable without discarding genuine variation in the data.
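
One common way to flag extreme values is the interquartile range (IQR) rule. The source only mentions statistical summaries, so the IQR rule, the `iqr_outliers` helper, and the sample values below are assumptions used for illustration.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Hypothetical weekly study hours.
hours = pd.Series([18, 20, 22, 19, 21, 20, 44, 200], name="Hours_Studied")
print(hours.describe())              # statistical summary
flagged = hours[iqr_outliers(hours)]
print(flagged)

# Flagged values still need a manual decision: 200 h/week exceeds the 168 hours
# in a week, so it must be an entry error, while 44 h/week is rare but possible.
errors = flagged[flagged > 168]
hours_clean = hours.drop(errors.index)
```

The rule only nominates candidates; deciding between "entry error" and "genuine rare case" still relies on domain limits such as the number of hours in a week.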

4️⃣ Model Training & Comparison

Multiple models were evaluated:

Linear Regression
Random Forest Regressor
Gradient Boosting Regressor

Evaluation metrics used:

R² Score
Mean Absolute Error (MAE)
Mean Squared Error (MSE)

Comparing the models made it possible to select the most interpretable and stable one for deployment.

Final selected model: Linear Regression (for interpretability and consistency).
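
A comparison of the three models on the listed metrics might be set up as follows. The data here is synthetic and the split ratio, random seeds, and default hyperparameters are assumptions, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded feature matrix and exam scores.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = 50 + 20 * X[:, 0] + 10 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "R2": r2_score(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
    }
    print(f"{name}: R2={results[name]['R2']:.3f}  "
          f"MAE={results[name]['MAE']:.3f}  MSE={results[name]['MSE']:.3f}")
```

Evaluating every model on the same held-out split keeps the metrics directly comparable.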

5️⃣ Model Diagnostics

To ensure reliability:

Checked residual distribution
Analyzed prediction spread across low and high score ranges
Verified that predictions were not strongly biased toward the mean
Evaluated generalization performance on test data

Observed behavior:

Slight regression toward the mean (expected in linear models)
Stable prediction performance across moderate score ranges
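
One way to examine residuals across score ranges is to bin the test set by actual score and summarize the residuals per bin. This is a sketch on synthetic data; the three-band split and the variable names are assumptions, not the project's diagnostic code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a linear signal plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(600, 2))
y = 55 + 25 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 3, size=600)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Residual = actual - predicted, grouped into low / mid / high actual-score bands.
diag = pd.DataFrame({"actual": y_test, "residual": y_test - pred})
diag["band"] = pd.qcut(diag["actual"], q=3, labels=["low", "mid", "high"])
summary = diag.groupby("band", observed=True)["residual"].agg(["mean", "std", "count"])
print(summary)
```

A negative mean residual in the low band together with a positive one in the high band would be the signature of regression toward the mean: low scores are over-predicted and high scores under-predicted.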

6️⃣ Ethical Reasoning

Educational prediction systems must be handled responsibly.

This project acknowledges:

Predictions are probabilistic, not deterministic
Socio-economic factors must not reinforce bias
Model outputs should not label or restrict student potential
The system is for analytical and educational research purposes only

Ethical deployment requires:

Transparency
Bias awareness
Responsible interpretation

📊 Features Used

Academic Factors

Hours Studied
Attendance
Previous Scores
Sleep Hours
Tutoring Sessions
Physical Activity

Socio-economic & Behavioral Factors

Parental Involvement
Access to Resources
Family Income
Teacher Quality
Peer Influence
Internet Access
Extracurricular Activities
Learning Disabilities
Gender
Distance from Home

🛠️ Tech Stack

Language: Python

Libraries:

scikit-learn
pandas
numpy
streamlit
matplotlib

Model: Linear Regression
Deployment Platform: Streamlit Community Cloud

📂 Project Structure

LEVEL1_StudentPerformancePredictor/
│
├── app.py  # Streamlit UI for checking predictions
│
├── student_score_model.pkl  # Linear Regression model for predicting the final score
│
├── requirements.txt
└── README.md
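
The `.pkl` file suggests the trained model is serialized and then loaded by `app.py`. A minimal sketch of that round trip follows, assuming the standard-library `pickle` module was used (the project may use `joblib` instead). The training data is a toy example and the file is written to a temporary directory rather than the repository.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data following score = 50 + 2 * hours (hypothetical relationship).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([52.0, 54.0, 56.0, 58.0])
model = LinearRegression().fit(X, y)

# Save the fitted model, as a training script might do.
model_path = Path(tempfile.mkdtemp()) / "student_score_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it again, as app.py might do before serving predictions.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

prediction = loaded.predict(np.array([[5.0]]))[0]
print(round(prediction, 2))
```

Pickled scikit-learn models are generally only safe to load with the same library version that saved them, which is one reason to pin versions in `requirements.txt`.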

🚀 Deployment

The application is deployed on Streamlit Community Cloud, which hosts the Streamlit app directly from the GitHub repository.

Deployment Process:

Push the complete project to a public GitHub repository.
Add requirements.txt for dependency management.
Connect the repository to Streamlit Community Cloud.
Deploy app.py as the entry point.

Live Application:
🔗 App URL: https://level1studentperformancepredictor-m2wfrbrej9jhxmuutaufe3.streamlit.app/

🚀 Future Enhancements

Inclusion of Social Media Usage as a behavioral factor
Hyperparameter tuning for ensemble models
Feature importance visualization
Bias detection analysis
Cross-validation-based stability testing
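
As an illustration of the cross-validation item, stability could be gauged from the spread of per-fold scores. This is a sketch on synthetic data; the fold count, the seed, and the use of R² as the scoring metric are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data.
rng = np.random.default_rng(7)
X = rng.uniform(0, 1, size=(300, 3))
y = 50 + 20 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 2, size=300)

# Shuffled 5-fold CV; a small standard deviation across folds suggests
# that performance does not depend heavily on which rows land in the test fold.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
print(f"R2 per fold: {np.round(scores, 3)}")
print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
```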

👩‍💻 Author

Apeksha, Machine Learning and Python enthusiast

📜 License

This project is intended for educational and academic purposes.
