This repository contains a data science project for predicting chronic kidney disease (CKD) from clinical and laboratory data. It covers the full workflow: preprocessing, exploratory data analysis, imputation, statistical modeling, machine learning, causal inference, and evaluation.
The project is organized as a notebook-based pipeline and compares multiple predictive approaches, including decision trees, random forests, gradient boosting, logistic regression, k-nearest neighbors, support vector machines, and neural networks.
Chronic kidney disease is a major global health issue associated with increased morbidity and mortality. The project notes describe CKD as a progressive loss of kidney function, commonly diagnosed using indicators such as creatinine-related measurements and albumin in urine. They also highlight several clinical risk factors and biomarkers: hypertension, diabetes mellitus, blood urea, serum creatinine, hemoglobin, and specific gravity.
This repository investigates how well CKD can be predicted from patient features and which variables are most informative for classification.
datascience-ckd/
├── assets/ # Images and figures used in the project
├── data/ # Raw data
├── processed/ # Processed datasets
├── plots/ # Exported visualizations
├── results/ # Model outputs and final results
├── util/ # Helper functions and utilities
├── 1_preprocessing.ipynb
├── 2_eda.ipynb
├── 3_imputation.ipynb
├── 4_statistical_modeling.ipynb
├── 5_learning.ipynb
├── 5.1_decision_tree.ipynb
├── 5.2_random_forests.ipynb
├── 5.2a_random_forests_without_diabetes.ipynb
├── 5.3_gradient_boosting.ipynb
├── 5.3a_gradient_boosting_without_diabetes.ipynb
├── 5.4_logistic_regression.ipynb
├── 5.4a_logistic_regression_without_diabetes.ipynb
├── 5.5_knn.ipynb
├── 5.6_svm.ipynb
├── 5.7_neural_networks.ipynb
├── 5.7a_neural_networks_without_diabetes.ipynb
├── 5.8_causal_inference.ipynb
├── 6_evaluation.ipynb
├── 6_evaluation_luisa.ipynb
├── 6_evaluation_without_diabetes.ipynb
├── metrics_dtree.csv
├── metrics_knn.csv
├── metrics_logreg.csv
├── metrics_rf.csv
├── metrics_svm.csv
├── data.json
└── research.md
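
The per-model `metrics_*.csv` files in the repository root can be gathered into one list for a side-by-side look. The following is a minimal sketch, not part of the repository: the helper name `collect_metrics` is hypothetical, and because the column layout of these files is not documented here, each row is kept as a plain dictionary with only a `model` key added from the file name.

```python
import csv
from pathlib import Path


def collect_metrics(repo_root="."):
    """Gather rows from every metrics_<model>.csv file directly under repo_root.

    Assumption: each file is a regular CSV with a header row. The actual
    column names are unknown, so they are passed through unchanged.
    """
    rows = []
    for path in sorted(Path(repo_root).glob("metrics_*.csv")):
        # Derive the model label from the file name, e.g. "metrics_rf" -> "rf".
        model = path.stem[len("metrics_"):]
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # The label from the file name overrides any "model" column.
                rows.append({**row, "model": model})
    return rows


if __name__ == "__main__":
    # Run from the repository root to print every metrics row with its model label.
    for row in collect_metrics("."):
        print(row)
```

All values are returned as strings, exactly as `csv.DictReader` reads them, so any numeric conversion would have to follow the real column layout of the files.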