🚀 Production-style ML monitoring system with Streamlit dashboard and Evidently AI integration.
This project simulates a production ML monitoring system that detects data drift, prediction drift, and model performance degradation in newly uploaded datasets.
The system compares new data distributions against baseline training statistics and flags potential risks to model reliability.
- Detects Feature Drift using baseline statistics
- Detects Prediction Drift using prediction distribution comparison
- Detects Target Drift & Model Performance Drift using Evidently AI
- Interactive Streamlit Dashboard
- Supports real-time CSV upload
- Shows visual HTML reports for detailed drift analysis
- Provides Model Health Status (STABLE / MONITOR / HIGH RISK)
- Python
- Scikit-learn
- Pandas
- NumPy
- Streamlit
- Joblib
- Evidently AI
User Uploads CSV
↓
Load Saved Pipeline
↓
Generate Predictions
↓
Drift Detection
├─ Baseline Math
└─ Evidently AI
↓
Streamlit Dashboard
- Custom Feature Drift Detection (mean shift vs baseline standard deviation)
- Custom Prediction Drift Detection (positive-rate distribution change)
- Evidently AI-based:
  - Data Drift Detection
  - Target Drift Detection
  - Classification Performance Drift
- ML Pipeline with ColumnTransformer
- Baseline statistics & prediction distribution persistence
- Model health status classification (STABLE / MONITOR / HIGH RISK)
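
The two custom checks listed above can be sketched as follows. This is a minimal illustration, not the code in `src/drift_utils.py`: the function names, the `threshold` of 2 baseline standard deviations, and the `tolerance` of 0.10 on the positive rate are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def feature_drift(baseline_mean, baseline_std, new_values, threshold=2.0):
    """Measure the mean shift of a feature in units of the baseline std.

    Returns (shift, drifted). The threshold is an assumed value.
    """
    new_mean = mean(new_values)
    if baseline_std == 0:
        # Constant baseline feature: any change in the mean counts as drift.
        shift = 0.0 if new_mean == baseline_mean else float("inf")
    else:
        shift = abs(new_mean - baseline_mean) / baseline_std
    return shift, shift > threshold


def prediction_drift(baseline_positive_rate, new_predictions, tolerance=0.10):
    """Compare the share of positive predictions against the baseline rate.

    Returns (new_rate, drifted). The tolerance is an assumed value.
    """
    new_rate = sum(new_predictions) / len(new_predictions)
    return new_rate, abs(new_rate - baseline_positive_rate) > tolerance


# Illustrative numbers only, not taken from the project's dataset.
baseline_values = [10, 12, 11, 13, 9]
shift, feature_drifted = feature_drift(
    mean(baseline_values), stdev(baseline_values), [15, 16, 14, 17]
)
new_rate, prediction_drifted = prediction_drift(0.25, [1, 0, 1, 1, 0, 1, 0, 1])
print(f"mean shift: {shift:.2f} std, drifted: {feature_drifted}")
print(f"positive rate: {new_rate:.3f}, drifted: {prediction_drifted}")
```

Expressing the shift in baseline standard deviations keeps a single threshold meaningful across features with different scales.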
- Train the ML model and save:
  - Model pipeline
  - Baseline feature statistics
  - Baseline prediction distribution
- Upload a new dataset via Streamlit
- The system computes:
  - Custom Monitoring:
    - Feature-level drift
    - Prediction drift
  - Evidently AI Monitoring:
    - Dataset Drift Report
    - Target Drift Report
    - Classification Performance Report
- Model health status is reported as:
  - STABLE
  - MONITOR
  - HIGH RISK
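
One way the drift signals could be combined into the three health statuses is sketched below. The README does not state the actual rules, so the function name `model_health` and both cut-offs (20% and 50% of features drifted) are hypothetical and would need tuning for a real model.

```python
def model_health(drifted_features, total_features, prediction_drifted,
                 monitor_share=0.2, high_risk_share=0.5):
    """Map drift signals to STABLE / MONITOR / HIGH RISK.

    The share cut-offs are assumed values for illustration only.
    """
    if total_features <= 0:
        raise ValueError("total_features must be positive")
    share = drifted_features / total_features
    # Widespread feature drift, or moderate feature drift together with
    # prediction drift, is treated as the most severe case.
    if share >= high_risk_share or (prediction_drifted and share >= monitor_share):
        return "HIGH RISK"
    # Either signal on its own only warrants closer monitoring.
    if share >= monitor_share or prediction_drifted:
        return "MONITOR"
    return "STABLE"


print(model_health(0, 10, False))
print(model_health(2, 10, False))
print(model_health(3, 10, True))
```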
The system saves the following artifacts after training:
- model_pipeline.pkl → trained ML pipeline
- baseline_stats.pkl → baseline feature statistics
- baseline_positive_rate.pkl → baseline prediction distribution
These are used for real-time drift comparison.
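
Saving and reloading the baseline artifacts with Joblib might look like the sketch below. The helper names, the dictionary layout of the statistics, the feature name, and the numeric values are assumptions for illustration; only the file names come from the list above. A trained pipeline could be written to `model_pipeline.pkl` with the same `joblib.dump` call.

```python
import tempfile
from pathlib import Path

import joblib


def save_baseline(artifact_dir, stats, positive_rate):
    """Persist baseline statistics and the baseline positive rate."""
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    joblib.dump(stats, artifact_dir / "baseline_stats.pkl")
    joblib.dump(positive_rate, artifact_dir / "baseline_positive_rate.pkl")


def load_baseline(artifact_dir):
    """Load the two baseline artifacts back for drift comparison."""
    artifact_dir = Path(artifact_dir)
    stats = joblib.load(artifact_dir / "baseline_stats.pkl")
    positive_rate = joblib.load(artifact_dir / "baseline_positive_rate.pkl")
    return stats, positive_rate


# Hypothetical feature name and values; a temporary directory stands in
# for the project's artifacts/ folder.
stats = {"feature_a": {"mean": 11.0, "std": 1.58}}
with tempfile.TemporaryDirectory() as tmp:
    save_baseline(tmp, stats, 0.27)
    loaded_stats, loaded_rate = load_baseline(tmp)

print(loaded_stats, loaded_rate)
```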
ML-DATA-DRIFT-MONITORING-PROJECT/
│
├── app.py
├── requirements.txt
├── readme.md
├── .gitignore
│
├── artifacts/
│ ├── model_pipeline.pkl
│ ├── baseline_stats.pkl
│ └── baseline_positive_rate.pkl
│
├── src/
│ ├── __init__.py
│ ├── baseline_statistics.py
│ ├── drift_utils.py
│ ├── model_utils.py
│ └── evidently_reports.py
│
├── reports/
│ ├── data_drift.html
│ ├── target_drift.html
│ └── classification_drift.html
│
├── data/
│ └── Telco_Customer_churn.csv
│
├── notebooks/
│ └── model_training.ipynb
pip install -r requirements.txt
streamlit run app.py
- Python 3.11
- Streamlit
- Scikit-learn
- Pandas
- NumPy
- Evidently AI
- Joblib

