A machine learning classification project focused on predicting whether a breast tumor is malignant or benign using the Breast Cancer Wisconsin dataset from scikit-learn.
The objective of this project is to build and evaluate multiple machine learning classification models while understanding:
- Classification techniques
- Evaluation metrics
- ROC Curve & AUC analysis
- Handling imbalanced data
- Model comparison
- Feature importance interpretation

Dataset details:
- Dataset: Breast Cancer Wisconsin Dataset
- Source: scikit-learn built-in dataset (`sklearn.datasets.load_breast_cancer`)
- Total Samples: 569 (212 malignant, 357 benign)
- Total Features: 30

The target variable is encoded as:

| Value | Meaning |
|---|---|
| 0 | Malignant |
| 1 | Benign |
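
The dataset facts above can be checked directly against scikit-learn. This is a minimal sketch, assuming pandas is installed so that `as_frame=True` returns a DataFrame. The variable names are illustrative and not taken from the notebook.

```python
from sklearn.datasets import load_breast_cancer

# Load features as a DataFrame and the target as a Series.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

print(X.shape)  # (569, 30)

# The position in target_names is the encoded label: 0 = malignant, 1 = benign.
label_map = {i: str(name) for i, name in enumerate(data.target_names)}
print(label_map)

# Class counts show the moderate imbalance between the two classes.
print(y.value_counts().sort_index())
```

Note that malignant is encoded as 0, so metrics that default to treating label 1 as the positive class will score the benign class unless told otherwise.
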

Tools and libraries used:
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
The following classification models were implemented:
| Model | Purpose |
|---|---|
| Logistic Regression | Baseline classification |
| Decision Tree | Rule-based classification |
| Random Forest | Ensemble learning |
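
One way the three models might be set up in scikit-learn is sketched below. The hyperparameters (`max_iter`, `n_estimators`, `random_state`) are assumptions chosen for illustration, not the settings used in the notebook. Logistic Regression is wrapped in a pipeline with `StandardScaler` because it is sensitive to feature scale, while the tree-based models are not.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical model registry; all hyperparameters are illustrative assumptions.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    print(name, "->", type(model).__name__)
```
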
The models were evaluated using:
- Accuracy
- Precision
- Recall
- F1-score
- Confusion Matrix
- ROC Curve
- AUC Score
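
To make the threshold-based metrics above concrete, here is a small worked example on made-up labels (not project results). Because malignant is encoded as 0, `pos_label=0` is passed so that precision, recall, and F1 describe the malignant class. scikit-learn's default is `pos_label=1`, which would score the benign class instead.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Toy labels for illustration only (0 = malignant, 1 = benign).
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes, both ordered [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)  # [[2 1]
           #  [1 4]]

accuracy = accuracy_score(y_true, y_pred)                  # 6 of 8 correct = 0.75
precision = precision_score(y_true, y_pred, pos_label=0)   # 2 of 3 predicted malignant
recall = recall_score(y_true, y_pred, pos_label=0)         # 2 of 3 actual malignant
f1 = f1_score(y_true, y_pred, pos_label=0)                 # harmonic mean of the two

print(accuracy, precision, recall, f1)
```
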

The project workflow covered the following steps:
- Data Loading
- Exploratory Data Analysis (EDA)
- Missing Value Analysis
- Feature Correlation Heatmap
- Train-Test Split
- Feature Scaling
- Logistic Regression Modeling
- ROC-AUC Evaluation
- Handling Imbalanced Data
- Decision Tree Classification
- Random Forest Classification
- Model Comparison
- Feature Importance Analysis
- Final Insights & Conclusion
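
The core of this workflow (split, scale, fit Logistic Regression, ROC-AUC, imbalance handling) can be sketched as follows. The 80/20 stratified split, `random_state=42`, and the use of `class_weight="balanced"` are assumptions for illustration. The notebook may use a different split or a different imbalance technique, such as resampling.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 80/20, stratified so both sets keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler sits inside the pipeline, so it is fit on training data only.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# predict_proba columns follow the sorted labels [0, 1]; column 1 is P(benign).
p_benign = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, p_benign)
fpr, tpr, thresholds = roc_curve(y_test, p_benign)  # points for plotting the ROC curve
print(f"ROC-AUC: {auc:.4f}")

# One possible imbalance treatment: weight classes inversely to their frequency.
weighted = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, class_weight="balanced")
)
weighted.fit(X_train, y_train)

# Compare recall on the malignant class (label 0) with and without weighting.
for name, model in [("unweighted", clf), ("balanced", weighted)]:
    rec = recall_score(y_test, model.predict(X_test), pos_label=0)
    print(f"{name}: malignant recall = {rec:.4f}")
```
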

Accuracy of each model:

| Model | Accuracy |
|---|---|
| Logistic Regression | 98.24% |
| Decision Tree | 91.22% |
| Random Forest | 95.61% |
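
A comparison table like the one above could be produced with a loop along these lines. The split and hyperparameters are assumptions, so the printed accuracies are not expected to match the reported figures exactly.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Assumed 80/20 stratified split with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative settings, not necessarily those used in the notebook.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the same split and record its test accuracy.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    rows.append({"Model": name, "Accuracy": acc})

results = pd.DataFrame(rows).sort_values("Accuracy", ascending=False)
print(results.to_string(index=False))
```
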

Logistic Regression achieved the strongest overall balance of:
- Accuracy
- Stability
- Interpretability
- ROC-AUC performance
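
Key insights:
- Recall on the malignant class matters most in medical diagnosis, because a false negative is a cancer case that goes undetected.
- ROC-AUC summarizes performance across all decision thresholds, so it gives a fuller picture than accuracy at a single threshold.
- Random Forest outperformed the single Decision Tree (95.61% vs. 91.22% accuracy), consistent with the variance reduction that ensembling many trees provides.
- Feature importance analysis identified influential medical indicators.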
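
Feature importance for a Random Forest is commonly read from the fitted model's `feature_importances_` attribute (impurity-based importance). The sketch below assumes that approach and illustrative hyperparameters. The notebook may rank features differently, for example with permutation importance.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer(as_frame=True)

# Illustrative settings; fit on the full dataset only to rank features.
forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(data.data, data.target)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(
    forest.feature_importances_, index=data.data.columns
).sort_values(ascending=False)

print(importances.head(5))
```
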

Project structure:

    AI_ML_Task4_Classification_Project/
    │
    ├── AI_ML_Task4_Classification.ipynb
    ├── AI_ML_Task4_Classification.pdf
    └── README.md

This project demonstrates how machine learning classification techniques can be applied to real-world medical diagnosis problems using proper evaluation metrics and model comparison techniques.

Author: Sahil Bhatti