🛡️ NetSentinel

A Production-Grade MLOps Pipeline for Network Intrusion Detection

XGBoost multi-class classifier · MLflow experiment tracking · DVC data versioning · FastAPI serving · Evidently AI monitoring · Docker orchestration


Data → Train → Serve → Monitor → Retrain

End-to-end MLOps pipeline with environment parity, type-safe APIs, and automated drift detection.



🔍 Overview

NetSentinel is a production-grade MLOps pipeline for network intrusion detection. It trains an XGBoost multi-class classifier on the NSL-KDD dataset, tracks experiments via MLflow, versions data with DVC, serves predictions through a FastAPI REST API, and visualizes data drift using Evidently AI — all orchestrated in Docker Compose.

The project is intentionally designed to reflect real SDE and MLOps practices: reproducible pipelines, type-safe APIs, containerized environments, and monitoring-driven retraining decisions.

Classification targets:

| Label | Description |
| --- | --- |
| Normal | Legitimate network traffic |
| DoS | Denial of Service attacks |
| Probe | Surveillance and port scanning |
| R2L | Remote-to-Local unauthorized access |
| U2R | User-to-Root privilege escalation |

NSL-KDD's test set contains novel attack subtypes not present in training — so the model is evaluated on Macro F1-Score and Recall for minority classes (U2R, R2L), not just accuracy.
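
The five targets group NSL-KDD's raw attack names into broader categories. A minimal sketch of that grouping follows; the dictionary is a partial, illustrative mapping and `to_category` is a hypothetical helper, so the repository's own preprocessing may differ. Unmapped names raise an error rather than defaulting silently, which matters because the test set contains subtypes absent from training.

```python
# Partial, illustrative mapping from raw NSL-KDD attack names to the five
# target classes. Assumption: the project's real mapping may differ.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probe (surveillance and port scanning)
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote-to-Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "phf": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User-to-Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def to_category(attack_name: str) -> str:
    """Map a raw attack label to its class, failing loudly on unknown names."""
    key = attack_name.strip().lower()
    if key not in ATTACK_CATEGORY:
        raise KeyError(f"unmapped attack label: {attack_name!r}")
    return ATTACK_CATEGORY[key]
```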

[ KDDTrain+.txt / KDDTest+.txt ]
          │
          ▼  (DVC versioned)
  [ Preprocessing Layer ]
    Feature engineering
          │
          ▼  (MLflow tracked)
  [ Training Layer ]
    XGBoost classifier
    Params + Metrics + Artifacts logged
          │
          ▼  (Dynamic model load via glob)
  [ FastAPI Serving Layer ]
    POST /predict  ·  Pydantic validated
          │
          ▼  (Docker internal network)
  [ Streamlit Dashboard ]
    Predictions + Evidently AI drift report

✨ Features

| Feature | Description |
| --- | --- |
| 🧠 XGBoost Multi-Class Classifier | Trained on NSL-KDD with Macro F1 optimization; handles class imbalance across Normal, DoS, Probe, R2L, U2R |
| 📦 DVC Data Versioning | Dataset and artifacts tracked with DVC for full reproducibility across environments |
| 📊 MLflow Experiment Tracking | Every run logs hyperparameters, metrics, and model artifacts to the MLflow registry |
| FastAPI REST API | Type-safe prediction endpoint with Pydantic schema validation and NumPy-safe JSON serialization |
| 🔍 Evidently AI Drift Monitoring | Detects data drift between train and test distributions; triggers retraining decisions automatically |
| 🖥️ Streamlit Dashboard | Renders live predictions and Evidently HTML drift reports via st.components.v1.html |
| 🐳 Docker Compose Orchestration | Single command spins up API + Dashboard with full environment parity across Mac, Linux, and Cloud |
| 🔒 Type Safety Throughout | Pydantic validates every incoming network feature payload before it reaches the model |

🏗️ System Architecture

NetSentinel is organized into five clearly separated layers:

┌──────────────────┐
│   DATA LAYER     │  KDDTrain+.txt · KDDTest+.txt
│   (DVC)          │  Versioned, reproducible, shareable
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  TRAINING LAYER  │  src/models/train.py
│  (MLflow)        │  XGBoost · Params · Metrics · Artifacts
└────────┬─────────┘
         │  model.pkl (loaded dynamically via glob)
         ▼
┌──────────────────┐
│  SERVING LAYER   │  api/main.py
│  (FastAPI)       │  POST /predict · Pydantic · NumPy-safe JSON
└────────┬─────────┘
         │  Docker internal network
         ▼
┌──────────────────┐     ┌──────────────────────────┐
│  MONITORING      │────▶│  UI LAYER                │
│  (Evidently AI)  │     │  (Streamlit)             │
│  Drift reports   │     │  Dashboard + drift report│
└──────────────────┘     └──────────────────────────┘

Example prediction request / response:

// POST /predict
{
  "duration": 0,
  "protocol_type": "tcp",
  "service": "http",
  "flag": "SF",
  "src_bytes": 215,
  "dst_bytes": 45076
}

// Response
{
  "prediction": "Normal",
  "confidence": 0.97
}
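
The request and response shapes above could be expressed as Pydantic models along these lines. This is a sketch only: the class names, the `is_valid` helper, and the value constraints are assumptions, and it covers just the six fields shown in the example rather than the full NSL-KDD feature set.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class PredictRequest(BaseModel):
    """Hypothetical request schema covering only the fields in the example."""
    duration: int = Field(..., ge=0)
    protocol_type: Literal["tcp", "udp", "icmp"]
    service: str
    flag: str
    src_bytes: int = Field(..., ge=0)
    dst_bytes: int = Field(..., ge=0)


class PredictResponse(BaseModel):
    """Hypothetical response schema: one of the five classes plus a probability."""
    prediction: Literal["Normal", "DoS", "Probe", "R2L", "U2R"]
    confidence: float = Field(..., ge=0.0, le=1.0)


def is_valid(payload: dict) -> bool:
    """Return True if the payload passes schema validation."""
    try:
        PredictRequest(**payload)
        return True
    except ValidationError:
        return False
```

With a schema like this, a negative byte count or an unknown protocol is rejected before the model is ever called.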

🧰 Tech Stack

| Layer | Technology |
| --- | --- |
| ML / Training | XGBoost, LightGBM, Scikit-Learn, Pandas, NumPy < 2.0 |
| MLOps | MLflow (experiment tracking + model registry), DVC (data versioning) |
| API | FastAPI, Uvicorn, Pydantic |
| Monitoring | Evidently AI ≥ 0.7.0 (data drift + model monitoring) |
| Frontend | Streamlit (st.components.v1.html for Evidently reports) |
| DevOps | Docker, Docker Compose |
| Utilities | Loguru, Pytest, HTTPx, Joblib |
| Dataset | NSL-KDD (KDDTrain+.txt · KDDTest+.txt) |

📁 Project Structure

NetSentinel/
│
├── api/
│   ├── main.py              # FastAPI app + dynamic model loader (glob)
│   └── schemas.py           # Pydantic request/response models
│
├── dashboard/
│   ├── app.py               # Streamlit dashboard
│   └── Dockerfile           # UI container
│
├── src/
│   ├── data/
│   │   └── preprocessing.py # Feature engineering for NSL-KDD
│   ├── models/
│   │   └── train.py         # XGBoost training + MLflow logging
│   └── monitoring/
│       └── drift.py         # Evidently AI drift report generation
│
├── data/                    # DVC-tracked (KDDTrain+.txt, KDDTest+.txt)
├── mlruns/                  # MLflow experiment store
├── docker-compose.yml       # Service orchestrator (API + Dashboard)
├── Dockerfile               # API container
├── requirements.txt
└── README.md

🚀 Quick Start

This is a three-step setup. Docker handles the runtime dependencies, so no virtual environment or manual package installs are needed; only step 2 requires the DVC CLI on the host.

1 · Clone the repository

git clone https://github.com/sosush/NetSentinel.git
cd NetSentinel

2 · Pull the dataset via DVC

dvc pull

Or manually place KDDTrain+.txt and KDDTest+.txt in the data/ directory.

3 · Build and run the full stack

docker-compose up --build

| Service | URL |
| --- | --- |
| Streamlit Dashboard | http://localhost:8501 |
| FastAPI Swagger Docs | http://localhost:8000/docs |
| MLflow UI | http://localhost:5000 |

That's it. The API, dashboard, and monitoring stack all come up together.


If you want to retrain the model manually:

# Inside the container or local venv
python src/models/train.py

MLflow logs the run automatically. The API reloads the latest model artifact on the next request via glob.
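
A glob-based loader of this kind might look like the sketch below. `find_latest_model` is a hypothetical name, and treating "latest" as "most recently modified file" is an assumption: `glob` itself returns matches in no guaranteed order, so the newest artifact has to be selected explicitly.

```python
import glob
import os
from typing import Optional


def find_latest_model(root: str = "mlruns", filename: str = "model.pkl") -> Optional[str]:
    """Return the most recently modified model file under root, or None."""
    # "**" with recursive=True descends through experiment and run folders.
    pattern = os.path.join(root, "**", filename)
    candidates = glob.glob(pattern, recursive=True)
    if not candidates:
        return None
    # Assumption: the newest file on disk is the model that should be served.
    return max(candidates, key=os.path.getmtime)
```

The returned path can then be passed to a loader such as `joblib.load`. Because the path is resolved relative to the working directory at call time, it does not depend on absolute host paths recorded during training.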


🗄️ Dataset (NSL-KDD)

NSL-KDD is the standard benchmark for network intrusion detection research, addressing key shortcomings of the original KDD Cup 1999 dataset.

Files:

| File | Purpose |
| --- | --- |
| KDDTrain+.txt | Training set: 41 features + attack label |
| KDDTest+.txt | Test set: contains novel attack subtypes not in training |

Why Macro F1 over accuracy? U2R and R2L are rare classes, and the test set adds attack subtypes the model never saw in training. A model that favors the majority classes can still post a respectable accuracy while missing those attacks entirely. NetSentinel therefore optimizes for Macro F1-Score and per-class Recall, so that minority attack types are actually detected.
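
The gap between accuracy and Macro F1 can be illustrated with a small pure-Python sketch. The label counts are made up for illustration and are not NSL-KDD statistics; in practice scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den, r_den = tp[label] + fp[label], tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, imbalanced labels and a degenerate "always Normal" classifier.
y_true = ["Normal"] * 90 + ["DoS"] * 6 + ["R2L"] * 2 + ["U2R"] * 2
y_pred = ["Normal"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.900
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")  # macro F1 = 0.237
```

On this toy data the degenerate classifier looks strong on accuracy, while Macro F1 and the zero recall on DoS, R2L, and U2R expose that it detects no attacks at all.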

Download the dataset from the GitHub Releases page and place files under data/.


🌐 API Reference

POST /predict

Accepts a JSON payload of network flow features, returns a predicted class and confidence score.

curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "duration": 0,
    "protocol_type": "tcp",
    "service": "http",
    "flag": "SF",
    "src_bytes": 215,
    "dst_bytes": 45076
  }'

All fields are validated by Pydantic before reaching the model. NumPy types (int64, float32) are cast to standard Python types before serialization, which avoids the common TypeError: Object of type int64 is not JSON serializable failure.
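
One way to do that cast is sketched below. `to_native` is a hypothetical helper, not necessarily what api/main.py does; it relies on the fact that NumPy arrays and NumPy scalars both expose `tolist()`, which returns plain Python values, so the function itself needs no NumPy import.

```python
import json
from typing import Any


def to_native(value: Any) -> Any:
    """Recursively convert NumPy scalars and arrays to plain Python types."""
    if isinstance(value, dict):
        return {str(key): to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    if hasattr(value, "tolist"):
        # np.ndarray.tolist() gives nested lists; np.int64(3).tolist() gives 3.
        return value.tolist()
    return value


def to_json(payload: Any) -> str:
    """Serialize a response payload after stripping NumPy types."""
    return json.dumps(to_native(payload))

# Example call (assumes NumPy is installed and imported as np):
# to_json({"prediction": "Normal", "confidence": np.float32(0.97)})
```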

GET /health

curl http://localhost:8000/health
# {"status": "ok", "model_loaded": true}
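
For callers that prefer Python over curl, the two endpoints can be reached with the standard library alone. The function names below are hypothetical, and the base URL assumes the default port mapping from the Quick Start section.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default port mapping


def build_predict_request(features: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /predict."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(features).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Send the features to the API and return the decoded JSON response."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if /health answers with status "ok"."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8")).get("status") == "ok"

# Example (requires the stack to be running):
# if is_healthy():
#     print(predict({"duration": 0, "protocol_type": "tcp", "service": "http",
#                    "flag": "SF", "src_bytes": 215, "dst_bytes": 45076}))
```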

🔧 Troubleshooting

These are real issues encountered during development, not hypotheticals.

| Issue | Root Cause | Fix |
| --- | --- | --- |
| TypeError: Object of type int64 is not JSON serializable | NumPy types don't serialize to JSON natively | Cast outputs to int(x) or str(x) in api/main.py before returning the response |
| ValueError: numpy.dtype size changed | NumPy 2.0 broke binary compatibility with compiled extensions (XGBoost, Evidently) | Pin numpy<2.0.0 in requirements.txt or upgrade to evidently>=0.7.0 |
| MLflow loads wrong model / path not found | Hardcoded absolute host paths baked into MLflow artifact URIs don't resolve inside Docker containers | Use glob("mlruns/**/model.pkl", recursive=True) to locate artifacts at runtime, then select the newest match (glob does not order its results) |
| Streamlit can't reach API | Services on different Docker networks | Ensure both services are under the same network in docker-compose.yml; use the service name (e.g. http://api:8000), not localhost |
| Drift report not rendering | Evidently HTML uses inline JS that Streamlit sandboxes | Use st.components.v1.html(report_html, height=800) instead of st.markdown |

🤝 Contributing

Pull requests are welcome.

# 1. Fork the repository
# 2. Create a feature branch
git checkout -b feature/my-feature

# 3. Commit your changes
git commit -m "feat: describe your change"

# 4. Push and open a Pull Request
git push origin feature/my-feature

📜 License

This project is released under the MIT License — free to modify and distribute with attribution.

See LICENSE for full details.


Made by sosush
