GitHub - IsaacSantous/Diabetes-Prediction-Leveraging-Python-and-Machine-Learning: Diabetes is a life-threatening disease; early detection helps with proper management and improves outcomes. This project seeks to build a robust prediction model. Eight models were compared, and the XGB Classifier outperformed the others.

Project Overview

The Diabetes Prediction using Machine Learning project aims to develop a robust model capable of identifying patients who are positive for diabetes. By leveraging machine learning techniques, the project seeks to improve the accuracy of diabetes prediction, allowing for timely and targeted preventive measures.

Project Objective

The primary objective of this project is to build and train a robust machine learning model that can accurately predict the presence or absence of diabetes among patients.

Data Sources

The dataset used in this project was provided by 10Alytics, a company I have worked with for the past 6 months. It contains features extracted from patients' medical histories, including smoking history, BMI and blood glucose level.

Data Preprocessing

Before feeding the data into the machine learning model, extensive data preprocessing was performed. This included handling missing or null values, checking for duplicates, data normalisation and standardisation. Additionally, feature engineering techniques were applied to extract relevant information from the raw data.
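The preprocessing steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's actual code: the column names `bmi` and `blood_glucose_level`, the toy values, and the choice of median imputation are all hypothetical.

```python
from statistics import mean, median, pstdev

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_median(rows, column):
    """Replace missing (None) values in a column with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def standardise(values):
    """Z-score scaling: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalise(values):
    """Min-max scaling to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Toy records with made-up values (not taken from the project dataset).
rows = [
    {"bmi": 22.0, "blood_glucose_level": 90},
    {"bmi": 22.0, "blood_glucose_level": 90},    # exact duplicate
    {"bmi": None, "blood_glucose_level": 150},   # missing BMI
    {"bmi": 30.0, "blood_glucose_level": 200},
]
rows = drop_duplicates(rows)
rows = impute_median(rows, "bmi")
bmi_scaled = normalise([r["bmi"] for r in rows])
glucose_z = standardise([r["blood_glucose_level"] for r in rows])
```

In practice the scaling statistics (mean, standard deviation, minimum, maximum) would be computed on the training split only and then reused on the test split, so that no information leaks from the test data.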

Machine Learning Model

The Stark Hospital diabetes prediction model was built using a supervised machine learning approach. The data was split 80:20 into training and test sets. Several classification algorithms were tried, including but not limited to:

Logistic Regression
Random Forest
K-Nearest Neighbors
Support Vector Machine
XGB Classifier
Decision Tree

After extensive experimentation and hyperparameter tuning, the models were compared on prediction accuracy, alongside other performance metrics such as precision, recall and ROC score, and the final model was selected based on its performance and generalisation capabilities.
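The 80:20 split described above can be sketched as follows. This is a minimal standard-library illustration; the function name `split_80_20`, the seed, and the toy data are assumptions, and the README does not say how the notebook performs the split.

```python
import random

def split_80_20(features, labels, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly, then hold out `test_fraction` for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

# Toy data: 100 hypothetical patients with a single feature each.
features = [[float(i)] for i in range(100)]
labels = [1 if i % 10 == 0 else 0 for i in range(100)]

x_train, x_test, y_train, y_test = split_80_20(features, labels)
print(len(x_train), len(x_test))
```

With scikit-learn, `train_test_split(X, y, test_size=0.2, stratify=y)` gives an equivalent split and additionally keeps the class proportions the same in both parts, which is usually worth doing when the target is imbalanced.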

Evaluation Metrics

To assess the performance of the machine learning model, the following evaluation metrics were used:

Precision: The proportion of patients classified as diabetic who actually have diabetes.
Recall: The proportion of patients who actually have diabetes that were correctly classified as positive.
Accuracy: The overall proportion of correctly classified patients (both positive and negative).
ROC: The trade-off between the true positive rate and the false positive rate as the classification threshold varies.
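The four metrics above can be computed by hand from true labels, predicted labels and predicted scores. The sketch below uses made-up toy values (not project results); the ROC score is computed as the area under the ROC curve via its pairwise-ranking interpretation.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 means diabetic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 diabetic and 7 non-diabetic patients (made-up values).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("precision:", precision(tp, fp))
print("recall:   ", recall(tp, fn))
print("accuracy: ", accuracy(tp, fp, fn, tn))
print("ROC AUC:  ", roc_auc(y_true, scores))
```

scikit-learn exposes the same quantities as `precision_score`, `recall_score`, `accuracy_score` and `roc_auc_score` in `sklearn.metrics`.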

Key Insights

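After cross-validation, the model with the highest accuracy will be deployed. Accuracy is the primary evaluation metric in this project; because the target classes are significantly imbalanced, it should be read alongside precision and recall, since accuracy alone can be inflated by the majority class.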
The confusion matrices for two of the models (Random Forest and Logistic Regression) show each model's errors as False Positives (patients predicted to have diabetes who do not have it) and False Negatives (patients predicted not to have diabetes who do have it).
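The cross-validation step mentioned above can be sketched with a plain k-fold loop. Everything here is illustrative: the number of folds, the 10% positive rate, and the `majority_class_model` stand-in (a "model" that always predicts the most common training label) are assumptions, not details from the notebook.

```python
from collections import Counter
from statistics import mean

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def majority_class_model(train_labels):
    """Hypothetical baseline: always predict the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda n: [majority] * n

def cross_validated_accuracy(labels, k=5):
    """Mean accuracy of the baseline across k folds."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        predict = majority_class_model([labels[i] for i in train_idx])
        y_test = [labels[i] for i in test_idx]
        y_pred = predict(len(y_test))
        fold_scores.append(mean(int(t == p) for t, p in zip(y_test, y_pred)))
    return mean(fold_scores)

# Toy imbalanced target: 10% positive (made-up proportion).
labels = ([0] * 9 + [1]) * 10
baseline = cross_validated_accuracy(labels, k=5)
print("baseline accuracy:", baseline)
```

A baseline like this never flags a diabetic patient, so every positive case becomes a False Negative, yet its accuracy equals the share of the majority class. Comparing each candidate model against it is one way to judge how much of a reported accuracy reflects real predictive skill.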

Conclusion

The primary objective of this project was to apply different machine learning algorithms to predict the presence or absence of diabetes. Eight machine learning models were compared to determine the most effective one by evaluating their prediction accuracy, alongside other performance metrics such as precision, recall and ROC score. Of the models investigated, the XGB Classifier significantly outperformed the others, achieving an accuracy of 97.16%.

