ApekshaMundey/Level1_StudentPerformancePredictor

🎓 Student Performance Prediction using Machine Learning

This project focuses on building, analyzing, and evaluating a Machine Learning model to predict student exam performance based on academic, behavioral, and socio-economic factors.

Unlike UI-centric applications, this project emphasizes the complete ML lifecycle, including data preprocessing, feature engineering, model comparison, diagnostics, and ethical considerations.


📌 Problem Statement

The goal is to develop a predictive model that estimates a student's final exam score from structured educational and behavioral data.

The objective is not only to generate predictions but also to understand:

  • Which features influence performance
  • How model assumptions affect results
  • The limitations and ethical implications of predictive modeling in education

🔁 Machine Learning Lifecycle

1️⃣ Data Cleaning

  • Checked for missing values
  • Verified data types
  • Removed inconsistent or duplicate records
  • Ensured unit consistency (e.g., study hours per week)
  • Validated numerical ranges (attendance %, scores, etc.)
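
As a rough illustration of the cleaning steps above, the following sketch uses pandas on a tiny made-up table. The column names (`Hours_Studied`, `Attendance`, `Exam_Score`) and the 0 to 100 range check are assumptions for the example, not the project's actual code.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions modeled on the feature list.
raw = pd.DataFrame({
    "Hours_Studied": [20, 15, 15, 30, None],
    "Attendance": [85, 92, 92, 140, 70],   # 140 is outside the valid 0-100 range
    "Exam_Score": [67, 72, 72, 80, 61],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, rows with missing values, and out-of-range percentages."""
    out = df.drop_duplicates()
    out = out.dropna()
    in_range = out["Attendance"].between(0, 100) & out["Exam_Score"].between(0, 100)
    return out[in_range].reset_index(drop=True)

print(raw.isna().sum())   # missing values per column
print(raw.dtypes)         # data types per column

cleaned = clean(raw)
print(cleaned)
```

Dropping rows is only one option; imputing missing values may be preferable when data is scarce.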

2️⃣ Encoding

Since the dataset contains categorical variables:

  • Ordinal Encoding used for:

    • Parental Involvement
    • Motivation Level
    • Access to Resources
    • Family Income
    • Teacher Quality
    • Parental Education Level
  • Binary Encoding used for:

    • Internet Access
    • Extracurricular Activities
    • Learning Disabilities
    • Gender
  • One-Hot Encoding used for:

    • School Type
    • Peer Influence

Encoding was carefully designed to avoid introducing artificial order in nominal variables.
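
A minimal sketch of the three encoding styles, using pandas only. The category labels (`Low`/`Medium`/`High`, `Yes`/`No`, `Public`/`Private`) and column names are assumptions for illustration; the real dataset may use different values.

```python
import pandas as pd

# Hypothetical sample rows; labels and column names are assumed.
df = pd.DataFrame({
    "Motivation_Level": ["Low", "High", "Medium"],
    "Internet_Access": ["Yes", "No", "Yes"],
    "School_Type": ["Public", "Private", "Public"],
})

# Ordinal encoding: categories have a natural order, so map them to ranked integers.
ordinal_map = {"Low": 0, "Medium": 1, "High": 2}
# Binary encoding: two-valued categories become 0/1.
binary_map = {"No": 0, "Yes": 1}

encoded = df.copy()
encoded["Motivation_Level"] = encoded["Motivation_Level"].map(ordinal_map)
encoded["Internet_Access"] = encoded["Internet_Access"].map(binary_map)

# One-hot encoding: nominal categories get one indicator column each,
# so no artificial ordering is implied.
encoded = pd.get_dummies(encoded, columns=["School_Type"], dtype=int)
print(encoded)
```

For a linear model, one of the one-hot columns is often dropped (`drop_first=True`) to avoid perfectly collinear indicators; whether the project did this is not stated.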


3️⃣ Outlier Analysis

  • Examined numerical features using statistical summaries
  • Identified extreme values in:
    • Study Hours
    • Previous Scores
    • Attendance
  • Verified whether outliers were:
    • Data entry errors (removed)
    • Genuine rare cases (retained)

Outlier handling ensured model stability without distorting real-world variance.
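
One common way to flag extreme values is the 1.5 × IQR rule. The sketch below assumes that rule and a weekly study-hours column; the project only states that statistical summaries were used, so treat the threshold and the data as illustrative.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical weekly study hours.
hours = pd.Series([18, 20, 22, 19, 21, 23, 20, 44, 200], name="Hours_Studied")
flags = iqr_outliers(hours)
print(hours[flags])

# A week has only 168 hours, so anything above that must be a data entry error.
MAX_WEEKLY_HOURS = 24 * 7
impossible = hours > MAX_WEEKLY_HOURS

# Remove impossible values; keep flagged but plausible ones as genuine rare cases.
kept = hours[~impossible]
print(kept.tolist())
```

Here both 44 and 200 are flagged, but only 200 is removed, since 44 hours per week is unusual yet possible.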


4️⃣ Model Training & Comparison

Multiple models were evaluated:

  • Linear Regression
  • Random Forest Regressor
  • Gradient Boosting Regressor

Evaluation metrics used:

  • R² Score
  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)

Comparing the models made it possible to choose the one that best balanced interpretability and stability for deployment.

Final selected model: Linear Regression (for interpretability and consistency).
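
The comparison could look roughly like the sketch below, which fits the three model types with scikit-learn and reports R², MAE, and MSE on a held-out split. The data is synthetic and the hyperparameters are defaults chosen for the example, so the numbers say nothing about the project's actual results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the student data (assumption): a mostly linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 50 + 20 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 2, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    print(f"{name}: R2={results[name]['r2']:.3f} "
          f"MAE={results[name]['mae']:.3f} MSE={results[name]['mse']:.3f}")
```

Evaluating every model on the same test split keeps the metrics directly comparable.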


5️⃣ Model Diagnostics

To ensure reliability:

  • Checked residual distribution
  • Analyzed prediction spread across low and high score ranges
  • Verified the absence of extreme bias toward the mean
  • Evaluated generalization performance on test data

Observed behavior:

  • Slight regression toward the mean: very low and very high scores are predicted closer to the average (expected in linear models)
  • Stable prediction performance across moderate score ranges
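
The diagnostics above can be sketched with NumPy alone. This example fits a one-feature least-squares line to synthetic data (an assumption, not the project's data) and checks the residual mean, the spread of predictions versus actual scores, and the average residual in the lowest and highest thirds of actual scores.

```python
import numpy as np

# Synthetic data (assumption): score depends linearly on study hours plus noise.
rng = np.random.default_rng(42)
hours = rng.uniform(5, 40, size=200)
score = 55 + 0.5 * hours + rng.normal(0, 4, size=200)

# Ordinary least-squares fit of a straight line; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(hours, score, deg=1)
pred = intercept + slope * hours
residuals = score - pred

print(f"mean residual:           {residuals.mean():.4f}")
print(f"std of actual scores:    {score.std():.2f}")
print(f"std of predicted scores: {pred.std():.2f}")

# Regression toward the mean: low actual scores tend to be over-predicted
# (negative residuals) and high actual scores under-predicted (positive residuals).
low = score <= np.quantile(score, 1 / 3)
high = score >= np.quantile(score, 2 / 3)
print(f"mean residual, low third:  {residuals[low].mean():.2f}")
print(f"mean residual, high third: {residuals[high].mean():.2f}")
```

A histogram of `residuals` (for example with matplotlib) would complement these summaries when checking the residual distribution.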

6️⃣ Ethical Reasoning

Educational prediction systems must be handled responsibly.

This project acknowledges:

  • Predictions are estimates that carry uncertainty, not deterministic outcomes
  • Socio-economic factors must not reinforce bias
  • Model outputs should not label or restrict student potential
  • The system is for analytical and educational research purposes only

Ethical deployment requires:

  • Transparency
  • Bias awareness
  • Responsible interpretation

📊 Features Used

Academic Factors

  • Hours Studied
  • Attendance
  • Previous Scores
  • Sleep Hours
  • Tutoring Sessions
  • Physical Activity

Socio-economic & Behavioral Factors

  • Parental Involvement
  • Access to Resources
  • Family Income
  • Teacher Quality
  • Peer Influence
  • Internet Access
  • Extracurricular Activities
  • Learning Disabilities
  • Gender
  • Distance from Home

🛠️ Tech Stack

Language: Python

Libraries:

  • scikit-learn
  • pandas
  • numpy
  • streamlit
  • matplotlib

Model: Linear Regression
Deployment Platform: Streamlit Community Cloud


📂 Project Structure

LEVEL1_StudentPerformancePredictor/
│
├── app.py                    # Streamlit UI for checking predictions
│
├── student_score_model.pkl   # Linear Regression model for predicting the final score
│
├── requirements.txt
├── README.md
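
For context, a `.pkl` model file like the one above is typically produced and consumed along these lines. This is a sketch that assumes Python's standard `pickle` module and a two-feature toy model; the actual serialization method (for example `joblib`), the feature order, and the contents of `app.py` are not shown in this README.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data (assumption): columns are hours studied and attendance %.
X = np.array([[10, 80], [20, 90], [30, 95], [15, 70]], dtype=float)
y = np.array([60, 70, 78, 62], dtype=float)
model = LinearRegression().fit(X, y)

# Save the fitted model; a temporary directory stands in for the project folder.
model_path = Path(tempfile.mkdtemp()) / "student_score_model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Later (for example inside the app), load the model and predict for one student.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

sample = np.array([[25, 85]], dtype=float)
prediction = float(loaded.predict(sample)[0])
print(f"Predicted score: {prediction:.1f}")
```

Pickle files should only be loaded from trusted sources, and the scikit-learn version used for loading should match the one used for saving.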

🚀 Deployment

The application is deployed on Streamlit Community Cloud, which hosts the app directly from the GitHub repository.

Deployment Process:

  1. Push the complete project to a public GitHub repository.
  2. Add requirements.txt for dependency management.
  3. Connect the repository to Streamlit Community Cloud.
  4. Deploy app.py as the entry point.

Live Application:
🔗 App URL: https://level1studentperformancepredictor-m2wfrbrej9jhxmuutaufe3.streamlit.app/


🚀 Future Enhancements

  • Inclusion of Social Media Usage as a behavioral factor
  • Hyperparameter tuning for ensemble models
  • Feature importance visualization
  • Bias detection analysis
  • Cross-validation-based stability testing
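
As one possible starting point for the cross-validation item above, the sketch below runs 5-fold cross-validation with scikit-learn on synthetic data. The fold count, scoring metric, and data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = 60 + 15 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 1.5, size=200)

# Shuffled 5-fold split; each fold serves once as the validation set.
cv = KFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print("R2 per fold:", np.round(scores, 3))
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

A small standard deviation across folds suggests the model's performance does not depend heavily on which rows land in the test set.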

👩‍💻 Author

Apeksha
Machine Learning and Python enthusiast


📜 License

This project is intended for educational and academic purposes.

About

This is the last project in Level 1 of the roadmap, and this time the task was regression (Linear Regression) rather than classification. The project helped me understand residuals and check for outliers graphically. I also learned about different types of encoding, such as one-hot and ordinal encoding, and compared Linear Regression with other models such as Random Forest.
