🩺 Breast Cancer Classification using Machine Learning

A machine learning classification project focused on predicting whether a breast tumor is malignant or benign using the Breast Cancer Wisconsin dataset from scikit-learn.


📌 Project Objective

The objective of this project is to build and evaluate multiple machine learning classification models while understanding:

  • Classification techniques
  • Evaluation metrics
  • ROC Curve & AUC analysis
  • Handling imbalanced data
  • Model comparison
  • Feature importance interpretation

📂 Dataset Information

  • Dataset: Breast Cancer Wisconsin Dataset
  • Source: scikit-learn built-in dataset
  • Total Samples: 569
  • Total Features: 30

🎯 Target Classes

| Value | Meaning   |
|-------|-----------|
| 0     | Malignant |
| 1     | Benign    |
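
As a quick check of the sample count, feature count, and class encoding listed above, here is a minimal sketch that loads the dataset with scikit-learn's `load_breast_cancer` and inspects it. The variable names are illustrative and not taken from the project notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the built-in Breast Cancer Wisconsin dataset.
data = load_breast_cancer()
X, y = data.data, data.target

n_samples, n_features = X.shape
class_names = [str(name) for name in data.target_names]
class_counts = np.bincount(y)  # index 0 = malignant, index 1 = benign

print(f"Samples: {n_samples}, features: {n_features}")
for label, (name, count) in enumerate(zip(class_names, class_counts)):
    print(f"{label} -> {name}: {count} samples")
```

The classes are moderately imbalanced (212 malignant vs. 357 benign samples), which is worth keeping in mind when reading accuracy figures.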

⚙️ Technologies Used

  • Python
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn

🤖 Machine Learning Models

The following classification models were implemented:

| Model               | Purpose                   |
|---------------------|---------------------------|
| Logistic Regression | Baseline classification   |
| Decision Tree       | Rule-based classification |
| Random Forest       | Ensemble learning         |

📊 Evaluation Metrics

The models were evaluated using:

  • Accuracy
  • Precision
  • Recall
  • F1-score
  • Confusion Matrix
  • ROC Curve
  • AUC Score
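
The metrics listed above are all available in `sklearn.metrics`. The sketch below wraps them in a hypothetical helper, `evaluate`, and applies it to a handful of made-up labels and scores so the arithmetic is easy to follow by hand. It is an illustration, not the notebook's evaluation code, and it does not plot the ROC curve.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(y_true, y_pred, y_score):
    """Return the listed metrics for binary labels.

    y_score holds the predicted probability of class 1.
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
        "auc": roc_auc_score(y_true, y_score),
    }


# Toy labels and scores, made up purely for illustration.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2]

metrics = evaluate(y_true, y_pred, y_score)
for name, value in metrics.items():
    print(name, value)
```

By default scikit-learn treats label 1 as the positive class, which is benign in this dataset. Passing `pos_label=0` to `precision_score`, `recall_score`, and `f1_score` reports the metrics for the malignant class instead.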

📈 Project Workflow

  1. Data Loading
  2. Exploratory Data Analysis (EDA)
  3. Missing Value Analysis
  4. Feature Correlation Heatmap
  5. Train-Test Split
  6. Feature Scaling
  7. Logistic Regression Modeling
  8. ROC-AUC Evaluation
  9. Handling Imbalanced Data
  10. Decision Tree Classification
  11. Random Forest Classification
  12. Model Comparison
  13. Feature Importance Analysis
  14. Final Insights & Conclusion
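
The core of this workflow (steps 1, 3, 5, 6, 7, 8, and 9) can be sketched in a few lines. The split ratio, `random_state`, `max_iter`, and the use of `class_weight="balanced"` for imbalance handling are assumptions made for this sketch; the README does not state which settings or imbalance technique the notebook uses.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 1: data loading.
X, y = load_breast_cancer(return_X_y=True)

# Step 3: missing value analysis (count NaN entries).
n_missing = int(np.isnan(X).sum())

# Step 5: stratified train-test split (test_size and random_state assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 6: feature scaling, fitted on the training data only to avoid leakage.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Steps 7 and 9: logistic regression; class_weight="balanced" is one
# possible way to handle class imbalance (an assumption in this sketch).
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train_scaled, y_train)

# Step 8: ROC-AUC evaluation using the predicted probability of class 1.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
auc = roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1])

print(f"Missing values: {n_missing}")
print(f"Test accuracy: {accuracy:.4f}, ROC-AUC: {auc:.4f}")
```

Scaling matters for logistic regression because the 30 features span very different numeric ranges; the tree-based models in later steps do not need it.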

🏆 Results Summary

| Model               | Accuracy |
|---------------------|----------|
| Logistic Regression | 98.24%   |
| Decision Tree       | 91.22%   |
| Random Forest       | 95.61%   |
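
A comparison of this kind can be sketched as a loop over the three models on a shared hold-out split. The split and hyperparameters below are assumptions, since the README does not state them, so the printed accuracies will not necessarily match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters are illustrative defaults, not the notebook's settings.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the same training data and score it on the same test data.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {acc:.2%}")
```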

✅ Best Performing Model

Logistic Regression achieved the strongest overall balance of:

  • Accuracy
  • Stability
  • Interpretability
  • ROC-AUC performance
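
One way to put a number on "stability" is the spread of scores across cross-validation folds. The sketch below uses 5-fold stratified cross-validation with ROC-AUC scoring. The fold count, seeds, and hyperparameters are assumptions, and the README does not say whether the notebook measures stability this way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative model settings; the pipeline re-fits the scaler inside each fold.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Mean ROC-AUC shows typical performance; the standard deviation across
# folds is a rough proxy for how stable that performance is.
cv_summary = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    cv_summary[name] = (scores.mean(), scores.std())

for name, (mean_auc, std_auc) in cv_summary.items():
    print(f"{name:<20} AUC = {mean_auc:.4f} +/- {std_auc:.4f}")
```

A smaller standard deviation means the model's score depends less on which samples land in the test fold.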

🔍 Key Insights

  • Recall is highly important in medical diagnosis because false negatives may lead to undetected cancer cases.
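  • ROC-AUC summarizes performance across all decision thresholds, so it gives a fuller picture than accuracy measured at a single threshold.
  • Random Forest, an ensemble of decision trees, outperformed the single Decision Tree (95.61% vs. 91.22% accuracy).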
  • Feature importance analysis identified influential medical indicators.
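
Two of these insights can be checked directly in code: recall for the malignant class and the ranking of features by importance. The sketch below fits a Random Forest with assumed settings (the split, seed, and tree count are not from the README) and uses its impurity-based `feature_importances_`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, stratify=data.target, random_state=42
)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Malignant is encoded as 0, so it must be named explicitly as the
# positive label; the default (pos_label=1) would report benign recall.
malignant_recall = recall_score(y_test, forest.predict(X_test), pos_label=0)

# Rank features from most to least important and keep the top five.
importances = forest.feature_importances_
order = np.argsort(importances)[::-1]
top_features = [
    (str(data.feature_names[i]), float(importances[i])) for i in order[:5]
]

print(f"Recall on malignant cases: {malignant_recall:.4f}")
for name, score in top_features:
    print(f"{name:<25} {score:.4f}")
```

Impurity-based importances can be skewed when features are strongly correlated, as many are in this dataset, so `sklearn.inspection.permutation_importance` is a common cross-check.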

📁 Repository Structure

AI_ML_Task4_Classification_Project/
│
├── AI_ML_Task4_Classification.ipynb
├── AI_ML_Task4_Classification.pdf
└── README.md

🚀 Conclusion

This project demonstrates how machine learning classification can be applied to a real-world medical diagnosis problem, using appropriate evaluation metrics and side-by-side model comparison.


👨‍💻 Author

Sahil Bhatti