USAS (UIDAI Smart Allocation System) is a unified decision-support framework designed to optimize the "Last Mile" of Aadhaar administration. By integrating Predictive Analytics, Geospatial Intelligence, and Automated Enforcement, it addresses critical inefficiencies in mobile camp allocation and fraud detection.
- 🚨 The Problem
- 💡 The Solution
- 🛠️ Tech Stack & Architecture
- 📊 Key Modules
- ⚙️ Installation & Setup
- 🚀 Usage Guide
- 📂 Project Structure
## 🚨 The Problem

Despite near-universal enrollment saturation, Aadhaar administration faces three critical operational bottlenecks:
- The "MBU" Gap: Children turning 5 or 15 often miss Mandatory Biometric Updates because centers aren't available locally when demand peaks.
- Operational Blind Spots: Illegal mass-update operations ("Fraud Factories") often go undetected until it is too late due to a lack of real-time geospatial surveillance.
- Infrastructure Instability: Seasonal surges (e.g., school admission months) cause server crashes due to a lack of predictive load forecasting.
## 💡 The Solution

We moved beyond simple dashboards to build a Command Center that prescribes action:
- Predictive Logistics: Instead of waiting for queues, we forecast demand at the Pincode Level using HistGradientBoosting and deploy mobile vans beforehand.
- Automated Vigilance: A statistical "Fraud Radar" detects anomalous spikes (>5x average) and auto-generates digital Raid Warrants.
- Field Actionability: The system outputs downloadable Field Operations Guides (CSV) for van drivers, mapping high-demand zones to specific Government Schools.
## 🛠️ Tech Stack & Architecture

| Component | Technology | Description |
|---|---|---|
| ETL Pipeline | Python (Pandas) | Custom "Platinum Merge" strategy to handle 1.5M+ rows via inner joins (sketched below). |
| ML Core | Scikit-Learn | HistGradientBoostingRegressor for O(n) speed and native NaN handling. |
| Dashboard | Streamlit | Production-grade UI with multi-page navigation and PDF generation. |
| Visualization | Plotly Mapbox | Interactive geospatial heatmaps for district-level drill-downs. |
| Reporting | FPDF | Auto-generation of official Raid Warrants. |
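
For orientation, here is a minimal sketch of what the inner-join "Platinum Merge" could look like in Pandas. The file names and join keys (`pincode`, `month`) are illustrative assumptions; the actual logic lives in `src/etl/data_prep_v2.py`.

```python
import pandas as pd

# Hypothetical raw extracts; the real inputs live under data/raw/.
enrol = pd.read_csv("data/raw/enrollment.csv")
bio = pd.read_csv("data/raw/biometric.csv")
demo = pd.read_csv("data/raw/demographic.csv")

KEYS = ["pincode", "month"]  # assumed join keys

# "Platinum Merge": chained inner joins keep only the pincode-months where all
# three signals (Enrollment, Biometric, Demographic) are present and aligned.
master = (
    enrol.merge(bio, on=KEYS, how="inner")
         .merge(demo, on=KEYS, how="inner")
)
master.to_csv("data/processed/master_data_final.csv", index=False)
```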
## 📊 Key Modules

### 1. Predictive Logistics (Camp Deployment)

- Input: Historical Enrollment (a leading indicator) + Demographics; a rough forecasting sketch follows this list.
- Output: A downloadable "Camp Deployment Schedule".
- Feature: Automatically maps high-demand pincodes to nearest Schools using geospatial data.
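
A rough sketch of the forecasting step, assuming hypothetical feature and target columns; the shipped training code is `src/modeling/train_final_optimized.py`.

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

df = pd.read_csv("data/processed/master_data_final.csv")

# Assumed feature set: lagged enrollment as the leading indicator plus demographics.
FEATURES = ["enrollment_lag_1m", "enrollment_lag_3m", "child_population_5_15"]
TARGET = "mbu_demand_next_month"

model = HistGradientBoostingRegressor()  # histogram splits, NaNs handled natively
model.fit(df[FEATURES], df[TARGET])

# Score every pincode and emit a downloadable camp deployment schedule.
df["predicted_demand"] = model.predict(df[FEATURES])
schedule = (
    df.sort_values("predicted_demand", ascending=False)
      .head(50)[["pincode", "predicted_demand"]]
)
schedule.to_csv("camp_deployment_schedule.csv", index=False)
```

HistGradientBoostingRegressor is used here for the reasons noted in the Tech Stack table: O(n) histogram-based splits and native NaN handling.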
### 2. Fraud Radar

- Logic: Calculates rolling averages for every district and flags pincodes with >500% activity spikes (see the sketch after this list).
- Action: Generates a PDF Warrant containing the operator ID, location, and evidence of the spike.
- Note: Uses a simulation engine with synthetic data to demonstrate detection capabilities on anonymized logs.
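
A simplified sketch of the spike detection and warrant generation. Column names and the input file are assumptions; the shipped logic lives in `src/dashboard/fraud_ai.py`.

```python
import pandas as pd
from fpdf import FPDF

# Hypothetical daily activity log with pincode, date, operator_id, update_count columns.
logs = pd.read_csv("data/processed/daily_update_counts.csv", parse_dates=["date"])
logs = logs.sort_values(["pincode", "date"])

# Rolling 30-day baseline per pincode, then flag >5x (i.e. >500%) spikes.
logs["rolling_avg"] = (
    logs.groupby("pincode")["update_count"]
        .transform(lambda s: s.rolling(30, min_periods=7).mean())
)
flagged = logs[logs["update_count"] > 5 * logs["rolling_avg"]]

def generate_warrant(row) -> None:
    """Render a one-page PDF 'Raid Warrant' for a flagged spike."""
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)
    pdf.multi_cell(0, 8, (
        f"RAID WARRANT (auto-generated)\n"
        f"Operator ID: {row['operator_id']}\n"
        f"Pincode: {row['pincode']}\n"
        f"Observed updates: {row['update_count']} vs 30-day baseline {row['rolling_avg']:.0f}"
    ))
    pdf.output(f"warrant_{row['operator_id']}.pdf")

for _, row in flagged.iterrows():
    generate_warrant(row)
```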
### 3. Infrastructure Load Forecasting

- Goal: Prevent server crashes during Q2 (School Admission Season).
- Visual: Time-series forecasting of `total_activity` (Enrollments + Updates) to predict server stress weeks in advance (sketched below).
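
A minimal sketch of how the load curve could be produced, assuming `date`, `enrollments`, and `updates` columns in the master data.

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

df = pd.read_csv("data/processed/master_data_final.csv", parse_dates=["date"])

# Nationwide weekly load: enrollments plus updates, summed across all pincodes.
weekly = (
    df.assign(total_activity=df["enrollments"] + df["updates"])
      .resample("W", on="date")["total_activity"].sum()
      .to_frame()
)

# Simple autoregressive features: last week's load and week-of-year seasonality.
weekly["lag_1"] = weekly["total_activity"].shift(1)
weekly["week_of_year"] = weekly.index.isocalendar().week.astype(int)
train = weekly.dropna()

model = HistGradientBoostingRegressor()
model.fit(train[["lag_1", "week_of_year"]], train["total_activity"])
# Predicting a few weeks ahead (feeding predictions back in recursively) yields
# the Q2 stress curve shown on the Infra tab.
```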
## ⚙️ Installation & Setup

Prerequisites: Python 3.8+

- Clone the Repository

  ```bash
  git clone https://github.com/vishalbarai007/UIDAI-Datathon.git
  cd UIDAI-Datathon
  ```

- Create a Virtual Environment

  ```bash
  # Windows
  python -m venv venv
  venv\Scripts\activate

  # Mac/Linux
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install Dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Run the ETL & Training Pipeline (optional; pre-trained models are included)

  ```bash
  # 1. Clean & aggregate data
  python src/etl/data_prep_v2.py

  # 2. Train the AI model
  python src/modeling/train_final_optimized.py
  ```
## 🚀 Usage Guide

Launch the Dashboard:

```bash
streamlit run src/dashboard/dashboard_v6.py
```
Navigation:
- Logistics Tab: Select "Bihar" -> "Patna". Check the "Action Plan" tab to download the Camp Schedule.
- Fraud Radar: Select "West Bengal". Look for Red dots. Click "Generate Warrant" to download the PDF.
- Infra Tab: Observe the predicted load curve for June/July.
## 📂 Project Structure

```
UIDAI-Datathon/
├── data/
│ ├── raw/ # Original UIDAI CSVs
│ ├── processed/ # Aggregated Master Data (master_data_final.csv)
│ └── models/ # Trained .pkl models (xgb_model_final.pkl)
├── src/
│ ├── dashboard/
│ │ ├── dashboard_v6.py # Main Application Entry Point
│ │ └── fraud_ai.py # Fraud Simulation & PDF Logic
│ ├── etl/
│ │ └── data_prep_v2.py # Aggregation Engine
│ └── modeling/
│ └── train_final_optimized.py # ML Training Script
├── requirements.txt
└── README.md
```
- Methodology: We utilize Log Transformations (`np.log1p`) to handle the power-law distribution of Aadhaar data (Metro vs. Village variance); see the sketch below.
- Data Integrity: Our "Platinum Merge" ensures we only train on high-confidence rows where Enrollment, Biometric, and Demographic signals align.
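
A minimal sketch of the log-transform idea using scikit-learn's `TransformedTargetRegressor`; the training script may implement it differently.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import HistGradientBoostingRegressor

# Fit on log1p(y) and invert with expm1 at prediction time, so a small village
# and a large metro ward contribute comparable gradients despite the power-law spread.
model = TransformedTargetRegressor(
    regressor=HistGradientBoostingRegressor(),
    func=np.log1p,
    inverse_func=np.expm1,
)
# model.fit(X_train, y_train); model.predict(X_new) returns values on the original scale.
```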
Built with ❤️ for the UIDAI Datathon 2026.