27Naman2004/Student_DropOut_Prediction_Sys

🎓 Student Dropout Prediction System

The Student Dropout Prediction System is a machine learning–based predictive analytics project developed to identify students who are at risk of dropping out of higher education institutions. The project applies supervised learning classification techniques on academic and demographic data and presents the results through an interactive Streamlit web interface.


📌 Introduction

Student dropout is a critical issue faced by educational institutions worldwide. Identifying at-risk students early enables universities and colleges to provide timely academic and administrative support. This project uses historical student data and machine learning algorithms to predict which students are likely to drop out.


🎯 Project Objectives

The main objectives of this project are to understand patterns in student data, clean and preprocess the dataset, apply multiple machine learning classification algorithms, evaluate their performance using standard metrics, and make the models available through an easy-to-use web interface.


🧠 Machine Learning Methodology

This project follows a supervised learning approach where labeled data is used to train classification models. The prediction task is binary classification, where students are categorized as Dropout or Non-Dropout.

The following classification algorithms are implemented and compared:

  • Gaussian Naive Bayes
  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine (SVM)
  • Perceptron
  • K-Nearest Neighbors (KNN)
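
A minimal sketch of how the six algorithms listed above could be trained and compared with scikit-learn. The synthetic data from `make_classification`, the hyperparameter values, and the `models` and `scores` names are illustrative assumptions, not the project's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the student dataset (assumption: not the real data).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# One estimator per algorithm named in the list; hyperparameters are placeholders.
models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(),
    "Perceptron": Perceptron(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} accuracy = {acc:.3f}")
```

Keeping the estimators in a dictionary lets the same loop train and score every model, so adding or removing an algorithm is a one-line change.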

🗂 Dataset Overview

The dataset contains academic performance, financial status, and demographic information of students. The original target variable has three categories: Graduate, Dropout, and Enrolled. Because the final outcome of Enrolled students is not yet known, their records are removed and only Graduate and Dropout records are kept. The target variable is then converted to a binary label.
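
The filtering and binarization step could look roughly like the following pandas sketch. The column names `age` and `Target`, the toy rows, and the choice of 1 for Dropout are assumptions made for illustration; the real dataset's column names and encoding may differ.

```python
import pandas as pd

# Toy rows standing in for the real dataset (hypothetical columns and values).
df = pd.DataFrame({
    "age": [19, 22, 20, 35, 18],
    "Target": ["Graduate", "Dropout", "Enrolled", "Dropout", "Graduate"],
})

# Keep only students with a final outcome; Enrolled rows are dropped.
binary_df = df[df["Target"].isin(["Graduate", "Dropout"])].copy()

# Convert the remaining labels to a binary target (assumed: Dropout = 1).
binary_df["Target"] = binary_df["Target"].map({"Graduate": 0, "Dropout": 1})

print(binary_df["Target"].value_counts())
```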


🧹 Data Preprocessing

Several preprocessing steps are applied before training the models. These include checking for missing values, label encoding categorical variables, feature scaling using StandardScaler, and splitting the dataset into training and testing sets using an 80:20 ratio.
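
As a rough illustration, the steps described above can be chained as follows. The toy DataFrame, its column names (`gender`, `admission_grade`, `target`), and the `random_state` value are hypothetical. Fitting the scaler on the training split only is a common practice assumed here, not something the project description states.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy data with one categorical and one numeric feature (hypothetical columns).
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M", "F", "M", "M", "F"],
    "admission_grade": [120.5, 135.0, 110.2, 142.3, 128.8,
                        99.5, 150.1, 131.4, 117.9, 125.0],
    "target": [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
})

# 1. Check for missing values in every column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# 2. Label-encode the categorical columns (listed explicitly here).
categorical_cols = ["gender"]
for col in categorical_cols:
    df[col] = LabelEncoder().fit_transform(df[col])

# 3. Split features and target into training and testing sets (80:20).
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Standardize features; the scaler is fitted on the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting `StandardScaler` after the split keeps test-set statistics out of the training pipeline, which avoids an optimistic bias in the reported metrics.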


📊 Exploratory Data Analysis and Visualization

Data visualization is used extensively to understand data distribution and patterns. The project includes visualizations such as target variable distribution, gender distribution, feature-wise distribution plots, Pearson correlation heatmap, confusion matrix visualization, and KNN accuracy analysis based on different values of K.
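
The numbers behind two of these plots, the Pearson correlation heatmap and the KNN accuracy curve, could be computed as in the sketch below. The synthetic data, the `feature_i` column names, and the range of K from 1 to 20 are assumptions. Plotting is left out so the example depends only on pandas and scikit-learn.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student features (assumption: not the real data).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
features = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

# Pearson correlation matrix: the values a correlation heatmap would display.
corr = features.corr(method="pearson")
print(corr.round(2))

# 80:20 split and scaling, since KNN is sensitive to feature scale.
X_train, X_test, y_train, y_test = train_test_split(
    features, y, test_size=0.2, random_state=0, stratify=y
)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Test accuracy for each candidate K: the points of the KNN accuracy plot.
knn_accuracy = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    knn_accuracy[k] = knn.score(X_test_s, y_test)

best_k = max(knn_accuracy, key=knn_accuracy.get)
print(f"Best K on this split: {best_k} (accuracy {knn_accuracy[best_k]:.3f})")
```

The `corr` DataFrame can be handed to a heatmap function such as Seaborn's `heatmap`, and the keys and values of `knn_accuracy` can be drawn as a line plot with Matplotlib.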


✅ Model Evaluation

Each machine learning model is evaluated using standard classification performance metrics including accuracy, precision, recall, F1 score, confusion matrix, and classification report. These metrics allow effective comparison of different algorithms and help determine the best-performing model.
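
A small worked example of these metrics using scikit-learn, with hand-written labels instead of real model output. The `y_true` and `y_pred` values and the convention that 1 means Dropout are assumptions for illustration.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical ground truth and predictions (1 = Dropout, 0 = Non-Dropout).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(cm)
print(classification_report(y_true, y_pred,
                            target_names=["Non-Dropout", "Dropout"]))
```

For dropout prediction, recall on the Dropout class is often the metric to watch, since a missed at-risk student (a false negative) is usually costlier than a false alarm.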


🖥️ Streamlit User Interface

A Streamlit-based web application is developed to make the project interactive and user-friendly. The interface allows users to view the dataset, visualize important attributes, select machine learning models, adjust hyperparameters, and observe performance metrics and confusion matrices in real time.
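
One way the model selection and hyperparameter controls could be wired up is a small factory that turns a model name and keyword arguments into a scikit-learn estimator. The `MODEL_FACTORIES` dictionary and `make_classifier` function are hypothetical names, and the Streamlit widget calls appear only in comments, so this is a sketch of the idea rather than the application's actual code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical registry mapping display names to estimator classes.
MODEL_FACTORIES = {
    "KNN": KNeighborsClassifier,
    "Logistic Regression": LogisticRegression,
    "Random Forest": RandomForestClassifier,
}


def make_classifier(name, **hyperparams):
    """Build an unfitted estimator from a model name and hyperparameters."""
    if name not in MODEL_FACTORIES:
        raise ValueError(f"Unknown model: {name!r}")
    return MODEL_FACTORIES[name](**hyperparams)


# In a Streamlit script the inputs could come from sidebar widgets, e.g.:
#   name = st.sidebar.selectbox("Model", list(MODEL_FACTORIES))
#   k = st.sidebar.slider("n_neighbors", 1, 30, 5)
# Here they are hard-coded so the sketch runs without Streamlit.
clf = make_classifier("KNN", n_neighbors=7)
print(clf)
```

Because Streamlit reruns the script whenever a widget value changes, rebuilding the estimator from the current selections on each run is what makes the displayed metrics update as the user adjusts the controls.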


⚙️ How to Run the Project

To run the project, install the required Python libraries (listed under Technologies Used), make sure the dataset is available in the directory the application expects, and start the application with the `streamlit run` command followed by the path to the application script. Streamlit serves the app locally and opens it in the default web browser.


📦 Technologies Used

The project is implemented using Python and leverages popular data science and machine learning libraries including NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, and Streamlit.


🏫 Academic Relevance

This project aligns well with academic syllabi covering Predictive Analytics, Machine Learning, Supervised Learning, Model Evaluation Techniques, Data Visualization, and ML Deployment. It is suitable for mini-projects, final-year projects, lab submissions, and viva demonstrations.


🚀 Future Enhancements

Future improvements may include adding individual student prediction forms, integrating ROC and Precision-Recall curves, performing advanced feature selection, saving trained models for reuse, and deploying the application on cloud platforms.


📄 Declaration

This project is developed purely for academic and educational purposes to demonstrate the application of machine learning techniques in real-world scenarios.


✅ Conclusion

The Student Dropout Prediction System demonstrates a complete machine learning pipeline, from data preprocessing and visualization through model training and evaluation to deployment. It highlights the importance of predictive analytics in improving decision-making within the education sector.
