
STIWARTs/UIDAI_DH_2k26


UIDAI_Challenge


MBU Gap Analyzer

Predictive Gap Analysis Engine for Mandatory Biometric Updates

UIDAI Data Hackathon 2026 Submission

Problem • Solution • Features • Methodology • Visualizations • Installation • Tech Stack • Team

Python Pandas Plotly Scikit-Learn PyTorch SHAP

MAIN Deep Learning




🎯 Problem Statement

The Hidden Crisis: Children Losing Benefits Due to MBU Non-Compliance

Children enrolled in Aadhaar are mandated to update their biometrics at two critical life stages:

| Age | Update Type | Reason |
|---|---|---|
| 5 years | Mandatory Biometric Update | Fingerprints mature, facial features change |
| 15 years | Mandatory Biometric Update | Adolescent biometric changes |
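The two milestones reduce to a date check. A minimal sketch, assuming an update becomes due on the 5th and 15th birthdays; `age_in_years` and `mbu_due` are hypothetical helpers, not functions from the notebook:

```python
from datetime import date


def age_in_years(dob: date, on: date) -> int:
    """Completed years of age on the given date."""
    years = on.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (on.month, on.day) < (dob.month, dob.day):
        years -= 1
    return years


def mbu_due(dob: date, on: date) -> str:
    """Most recent MBU milestone the child has reached (hypothetical helper)."""
    age = age_in_years(dob, on)
    if age >= 15:
        return "MBU at 15"
    if age >= 5:
        return "MBU at 5"
    return "none yet"


print(mbu_due(date(2020, 3, 1), date(2026, 1, 15)))  # MBU at 5
```

A reminder campaign could run this check against enrolment dates to find children who have just crossed a milestone.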

What Happens If They Don't Update?

❌ Aadhaar becomes INACTIVE
   ├── 🚫 School admission blocked
   ├── 🚫 Scholarship disbursement fails
   ├── 🚫 Mid-day meal authentication fails
   └── 🚫 DBT (Direct Benefit Transfer) denied

The Scale of the Problem

Our analysis estimates that about 177,900 children risk losing government benefits worth over Rs. 32 crore because their Aadhaar was not updated on time, often due to lack of awareness or inaccessible update centers.


💡 Our Solution

MBU Gap Analyzer: A 3-Layer Predictive Engine

We built a system that identifies "Ghost Cohorts": children who were enrolled in Aadhaar but never completed their Mandatory Biometric Updates.

┌─────────────────────────────────────────────────────────────────┐
│                    MBU GAP ANALYZER                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Layer 1: COHORT TRACKING                                      │
│   ├── Track children from enrolment → MBU age                   │
│   ├── Calculate MBU Compliance Ratio per district               │
│   └── Identify "Ghost Cohorts" with high gap                    │
│                                                                 │
│   Layer 2: SERVICE DESERT IDENTIFICATION                        │
│   ├── K-Means clustering on district performance                │
│   ├── Identify underserved areas                                │
│   └── Priority ranking for intervention                         │
│                                                                 │
│   Layer 3: SCHOLARSHIP RISK PREDICTION                          │
│   ├── Time series analysis of update trends                     │
│   ├── Forecast "Update Crunch" periods                          │
│   └── Predict children at risk before admission season          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

✨ Key Features

📊 Cohort Tracking

  • Aggregates ~4.9M records across enrolment, biometric, and demographic data
  • Computes district-wise MBU Compliance Ratio
  • Risk classification: Green / Yellow / Red zones

🗺️ Service Desert Mapping

  • K-Means clustering identifies underserved districts
  • Silhouette analysis for optimal cluster selection
  • Priority list for Aadhaar Seva Kendra deployment
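The silhouette coefficient compares, for each district, the mean distance to the other members of its own cluster (a) with the mean distance to the nearest other cluster (b), as s = (b - a) / max(a, b). The sketch below computes the mean score without dependencies. It follows the same definition as scikit-learn's `silhouette_score` with Euclidean distance, but the function is our own illustration and is not taken from the notebook:

```python
import math
from collections import defaultdict


def silhouette(points: list[list[float]], labels: list[int]) -> float:
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters: dict[int, list[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i: int, members: list[int]) -> float:
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score defined as 0
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)  # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two tight, well-separated groups of 1-D points score close to 1.
print(round(silhouette([[0.0], [1.0], [10.0], [11.0]], [0, 0, 1, 1]), 3))  # 0.9
```

For cluster selection, this score would be computed for each candidate k and read alongside the elbow curve.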

📈 Predictive Forecasting

  • 6-month ahead predictions using Prophet/Linear Regression
  • Identifies "Update Crunch" before school admission season
  • Confidence intervals for resource planning
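The Linear Regression baseline can be illustrated with an ordinary least-squares trend fitted to a monthly series and extrapolated six steps ahead. This minimal sketch assumes evenly spaced monthly totals. It does not reproduce Prophet's seasonality or confidence intervals, and `linear_forecast` is a hypothetical name:

```python
def linear_forecast(series: list[float], horizon: int = 6) -> list[float]:
    """Fit y = intercept + slope * t by least squares and extrapolate."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    # Closed-form OLS slope: cov(t, y) / var(t)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = y_mean - slope * t_mean
    # Forecast the next `horizon` time steps: n, n + 1, ...
    return [intercept + slope * t for t in range(n, n + horizon)]


print(linear_forecast([10, 12, 14, 16]))  # [18.0, 20.0, 22.0, 24.0, 26.0, 28.0]
```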

💰 Impact Quantification

  • Calculates scholarships at risk (Rs. Crore)
  • DBT benefits potentially blocked
  • Actionable financial impact for policymakers
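The rupee figures are reported in crore (1 crore = 10^7 rupees). The sketch below shows the conversion and assumes a flat per-child benefit amount. The Rs. 1,500 value is purely illustrative and is not taken from the analysis:

```python
CRORE = 10_000_000  # 1 crore = 10 million rupees


def benefits_at_risk_crore(children_at_risk: int, benefit_per_child_rs: float) -> float:
    """Benefit exposure in Rs. crore, assuming a flat per-child amount."""
    return children_at_risk * benefit_per_child_rs / CRORE


# Illustrative only: 10,000 children x Rs. 1,500 each
print(benefits_at_risk_crore(10_000, 1_500))  # 1.5
```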

🚀 Deep Learning Edition

Branch: deep-learning | Notebook: MBU_Gap_Analyzer_DeepLearning.ipynb

We've extended the standard ML solution with four additional modules: two PyTorch deep learning models (an LSTM forecaster and an autoencoder anomaly detector), SHAP explainability for the K-Means clusters, and a Folium geospatial map.

Module Overview

| # | Module | Technology | Purpose |
|---|---|---|---|
| 1 | LSTM Forecaster | PyTorch | Time-series prediction for MBU demand during school admission rush |
| 2 | Autoencoder Anomaly Detector | PyTorch | Unsupervised detection of suspicious districts with abnormal patterns |
| 3 | SHAP Explainability | SHAP Library | Explain WHY K-Means classified districts as Service Deserts |
| 4 | Geospatial Map | Folium | Interactive India map with color-coded Service Desert markers |

🧠 Module 1: LSTM Time-Series Forecaster

Architecture: Input → LSTM (2 layers, 64 hidden units) → FC → Output

LSTM (Long Short-Term Memory) captures temporal patterns in biometric update trends to predict future MBU demand.

$$h_t = o_t \odot \tanh(C_t)$$
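The hidden-state update above is the last step of the standard LSTM cell. The full set of gate equations is given below for reference. It assumes the standard formulation that PyTorch's `nn.LSTM` documents, where σ is the logistic sigmoid, ⊙ is element-wise multiplication, and $x_t$ is the input at step $t$:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
\tilde{C}_t &= \tanh(W_C x_t + U_C h_{t-1} + b_C) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$

With two stacked layers, the first layer's $h_t$ sequence is the second layer's input. The FC layer then typically maps the final 64-dimensional $h_t$ to the forecast value.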

  • Training: 100 epochs with Adam optimizer + learning rate scheduling
  • Output: 6-month forecast to anticipate school admission rush (June-July 2026)
  • Use Case: UIDAI can pre-deploy mobile Seva Kendras to high-demand districts
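Training an LSTM forecaster on a single series usually means first turning the series into (lookback window, next value) pairs. The sketch below assumes a univariate daily or monthly series. `make_windows` and its `lookback` parameter are hypothetical, and the notebook's actual window length is not stated here:

```python
def make_windows(series: list[float], lookback: int) -> tuple[list[list[float]], list[float]]:
    """Split a series into input windows of length `lookback` and next-step targets."""
    if lookback < 1 or lookback >= len(series):
        raise ValueError("lookback must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


X, y = make_windows([1, 2, 3, 4, 5], lookback=3)
print(X)  # [[1, 2, 3], [2, 3, 4]]
print(y)  # [4, 5]
```

Before being fed to a PyTorch LSTM with `batch_first=True`, the windows would be reshaped to (batch, lookback, 1).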

🔍 Module 2: Autoencoder Anomaly Detector

Encoder: 5 features → 32 → 16 → 8 (bottleneck)
Decoder: 8 → 16 → 32 → 5 features (reconstruction)

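The autoencoder learns the typical pattern of district features and flags districts that it reconstructs poorly (high reconstruction error) as anomalies.

$$\text{Anomaly if } \mathcal{L}_{\text{recon}} > \mu + 2\sigma$$

where μ and σ are the mean and standard deviation of the reconstruction errors.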

  • Detection: Districts with abnormal enrolment-to-update ratios
  • Use Case: Identify data quality issues or potential fraud
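Applying the μ + 2σ rule is straightforward once per-district reconstruction errors are available. The sketch assumes those errors are already computed (for example, as the mean squared difference between each district's scaled features and the decoder output). It uses the population standard deviation, and the district keys are hypothetical:

```python
import statistics


def flag_anomalies(errors: dict[str, float], k: float = 2.0) -> list[str]:
    """Return the districts whose reconstruction error exceeds mean + k * std."""
    values = list(errors.values())
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std (an assumption)
    threshold = mu + k * sigma
    return [name for name, err in errors.items() if err > threshold]


errors = {f"district_{i}": 1.0 for i in range(9)}
errors["district_x"] = 10.0  # one district the autoencoder reconstructs poorly
print(flag_anomalies(errors))  # ['district_x']
```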

📊 Module 3: SHAP Explainable AI

SHAP (SHapley Additive exPlanations) uses game theory to explain model decisions.

  • Method: KernelSHAP for K-Means clustering
  • Output: Feature importance showing why a district is classified as Service Desert
  • Use Case: Provide transparent, auditable explanations for policy decisions
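KernelSHAP approximates Shapley values by sampling. For a handful of features, the values can instead be computed exactly by enumerating every coalition. The sketch below does this for a hypothetical scoring function `model`, replacing "absent" features with baseline values. It illustrates the game-theoretic definition only and is not the SHAP library's API:

```python
import math
from itertools import combinations
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[list[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley value of each feature of x relative to a baseline input."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Features in the coalition take x's value; the rest take the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


score = lambda f: 2 * f[0] + 3 * f[1]  # toy linear score; feature 2 is unused
print([round(v, 6) for v in shapley_values(score, [1.0, 2.0, 5.0], [0.0, 0.0, 0.0])])  # [2.0, 6.0, 0.0]
```

The values always sum to `model(x) - model(baseline)`, which is what makes them additive explanations.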

📊 Deep Learning Visualizations

Model Benchmarking: LSTM vs Baselines

[Figure: Model Benchmarking]

Key Result: LSTM reduces RMSE by 1.7% vs Linear Regression and 17% vs Moving Average
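RMSE and the relative reduction quoted in the key result can be computed as follows. The inputs are illustrative values, not the benchmark data:

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error between two equal-length series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and the same length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rmse_reduction_pct(baseline_rmse: float, model_rmse: float) -> float:
    """Percentage by which model_rmse improves on baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100


print(round(rmse([10, 20, 30], [12, 18, 30]), 3))  # 1.633
print(round(rmse_reduction_pct(100.0, 83.0), 1))  # 17.0
```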

LSTM Training Progress

[Figure: LSTM Training Loss]

Training converges smoothly over 100 epochs with learning rate scheduling

LSTM 6-Month Forecast

[Figure: LSTM Forecast]

LSTM predicts MBU demand surge during school admission season (June-July 2026)

🗺️ Module 4: Interactive Geospatial Map

Folium generates an interactive HTML map of India with:

| Marker Color | Status | Compliance |
|---|---|---|
| 🔴 Red | Service Desert | < 50% |
| 🟠 Orange | At Risk | 50-80% |
| 🟢 Green | Compliant | > 80% |

  • Popups: District name, compliance ratio, MBU gap
  • Output: analysis_outputs/service_desert_map.html

Quick Start (Deep Learning Edition)

# Switch to deep-learning branch
git checkout deep-learning

# Install additional dependencies
pip install torch shap folium

# Run the notebook
jupyter notebook MBU_Gap_Analyzer_DeepLearning.ipynb

πŸ—οΈ Architecture

Core Algorithm: MBU Compliance Ratio

$$\text{MBU Compliance Ratio} = \frac{\text{Biometric Updates (Age 5-17)}}{\text{Eligible Population (Enrolments Age 0-5 + Age 5-17)}} \times 100$$

Risk Classification Matrix

| Compliance Ratio | Risk Category | Action Required |
|---|---|---|
| ≥ 80% | 🟢 Green (Compliant) | Maintain current operations |
| 50% - 80% | 🟡 Yellow (Moderate Risk) | Awareness campaigns needed |
| < 50% | 🔴 Red (Service Desert) | Urgent intervention required |
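Below is a direct translation of the formula and the risk matrix. It assumes district totals have already been summed from the `bio_age_5_17`, `age_0_5`, and `age_5_17` columns of the datasets. The function names and sample counts are hypothetical:

```python
def compliance_ratio(bio_5_17: int, enrol_0_5: int, enrol_5_17: int) -> float:
    """MBU Compliance Ratio (%) = biometric updates (5-17) / eligible population * 100."""
    eligible = enrol_0_5 + enrol_5_17
    if eligible == 0:
        raise ValueError("district has no eligible population")
    return 100 * bio_5_17 / eligible


def risk_category(ratio: float) -> str:
    """Map a compliance ratio to the Green / Yellow / Red risk matrix."""
    if ratio >= 80:
        return "Green (Compliant)"
    if ratio >= 50:
        return "Yellow (Moderate Risk)"
    return "Red (Service Desert)"


ratio = compliance_ratio(bio_5_17=420, enrol_0_5=600, enrol_5_17=400)
print(ratio, risk_category(ratio))  # 42.0 Red (Service Desert)
```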

Machine Learning Pipeline

Input Data                    Processing                      Output
─────────────────────────────────────────────────────────────────────
                             ┌──────────────┐
[Enrolment CSV] ────────────►│              │
                             │   Merge &    │     ┌─────────────────┐
[Biometric CSV] ────────────►│  Aggregate   │────►│ Gap Analysis DF │
                             │   by Dist.   │     └────────┬────────┘
[Demographic CSV] ──────────►│              │              │
                             └──────────────┘              ▼
                                                  ┌─────────────────┐
                                                  │ StandardScaler  │
                                                  └────────┬────────┘
                                                           ▼
                                                  ┌─────────────────┐
                                                  │ K-Means (k=4)   │
                                                  └────────┬────────┘
                                                           ▼
                                              ┌────────────────────────┐
                                              │  Cluster Assignment    │
                                              │  • Service Desert      │
                                              │  • At Risk             │
                                              │  • Moderate            │
                                              │  • High Performer      │
                                              └────────────────────────┘
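The StandardScaler → K-Means stage can be sketched without dependencies: z-score each feature, then run Lloyd's algorithm. This illustration makes several assumptions: Euclidean distance, the first k points as initial centroids, and a fixed iteration cap. scikit-learn's `KMeans` uses k-means++ initialisation and multiple restarts, so its labels may differ:

```python
import math
import statistics


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column using the population std, as StandardScaler does."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard zero-variance columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's algorithm; returns a cluster label for each point."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(dim) for dim in zip(*members)]
    return labels


# Hypothetical districts: [compliance %, MBU gap]
rows = [[90, 100], [88, 120], [30, 900], [35, 950]]
print(kmeans(standardize(rows), k=2))  # [0, 0, 1, 1]
```

The toy data uses k=2 only because it has two obvious groups. The pipeline above uses k=4.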

📊 Key Findings

  • 1,070+ districts analyzed
  • 177,900 children at risk
  • 147 Service Desert districts
  • Rs. 32+ Cr in benefits at risk

District Risk Distribution

Green (Compliant)     ████████████████████████████████████████  930 districts (87%)
Yellow (Moderate)     ██                                         32 districts (3%)
Red (Service Desert)  ██████                                    147 districts (14%)

📈 Visualizations

Our analysis generates 8 interactive visualizations for the hackathon submission:

| # | Chart | Purpose |
|---|---|---|
| 1 | Problem Districts by State | Identify states with the most Service Desert + Moderate Risk districts |
| 2 | MBU Gap Treemap | Visualize children at risk by state (size = gap, color = severity) |
| 3 | MBU Gap by State | Children at risk of losing benefits per state |
| 4 | Optimal Cluster Selection | Elbow Method + Silhouette Score for K-Means tuning |
| 5 | Service Desert Scatter Plot | K-Means clustering with district-level risk categories |
| 6 | Daily Update Trend | Time series with 7-day moving average |
| 7 | Forecast Chart | Prophet-based 6-month prediction for school admission season |
| 8 | Financial Impact Bar | Scholarships & DBT benefits at risk (Rs. Crore) |
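The 7-day moving average in the daily trend chart is a trailing mean over each full 7-day window. In pandas, `Series.rolling(7).mean()` gives the same values, with NaN for the first six days. The dependency-free sketch below assumes one count per day with no missing dates, and the sample counts are made up:

```python
from collections import deque


def moving_average(values: list[float], window: int = 7) -> list[float]:
    """Trailing moving average; one output per full window."""
    if window < 1:
        raise ValueError("window must be positive")
    buf: deque[float] = deque()
    total = 0.0
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()  # drop the value that left the window
        if len(buf) == window:
            out.append(total / window)
    return out


daily = list(range(1, 11))  # hypothetical daily update counts 1..10
print(moving_average(daily))  # [4.0, 5.0, 6.0, 7.0]
```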

Chart Gallery

1. Problem Districts by State

[Figure: Problem Districts]

2. MBU Gap Treemap

[Figure: MBU Gap Treemap]

3. MBU Gap by State (Children at Risk)

[Figure: MBU Gap by State]

4. Optimal Cluster Selection (Elbow + Silhouette)

[Figure: Elbow Silhouette]

5. Service Desert Identification (K-Means Clustering)

[Figure: Service Desert Scatter]

6. Daily Biometric Update Trends

[Figure: Daily Trends]

7. MBU Forecast: Predicting the Update Crunch

[Figure: Forecast]

8. Financial Impact of MBU Non-Compliance

[Figure: Financial Impact]

📁 All visualizations are also available as interactive HTML files in analysis_outputs/

🗺️ Interactive Service Desert Map

[Figure: Interactive Service Desert Map]

🔗 Open analysis_outputs/service_desert_map.html in a browser to explore the interactive India map.

Explore the Folium-powered geospatial visualization showing Service Desert districts across India:

  • 🔴 Red markers = Service Desert districts (< 50% compliance)
  • 🟡 Yellow markers = Moderate Risk districts (50-80%)
  • 🟢 Green markers = Compliant districts (> 80%)
  • Click markers for district-level details (MBU Gap, Compliance %)

🚀 Installation

Prerequisites

  • Python 3.10 or higher
  • pip package manager
  • Git

Quick Setup

# 1. Clone the repository
git clone https://github.com/STIWARTs/UIDAI_DH_2k26.git
cd UIDAI_DH_2k26

# 2. Create virtual environment
python -m venv .venv

# 3. Activate virtual environment
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# Windows (CMD)
.venv\Scripts\activate.bat
# Linux/Mac
source .venv/bin/activate

# 4. Install all dependencies
pip install -r requirements.txt

What's Included in requirements.txt

| Category | Packages |
|---|---|
| Core Data Science | numpy, pandas, scipy |
| Machine Learning | scikit-learn, shap |
| Deep Learning | torch (PyTorch) |
| Time Series | prophet |
| Visualization | matplotlib, plotly, folium, kaleido |
| Jupyter | ipykernel, ipython, jupyter_client |

⚡ Note: The full installation may take 5-10 minutes due to the PyTorch and Prophet dependencies.

Deep Learning Edition Dependencies

pip install torch shap folium kaleido

📖 Usage

Run the Analysis

Standard ML Edition (MAIN branch)

  1. Open Jupyter Notebook

    jupyter notebook MBU_Gap_Analyzer.ipynb
  2. Execute All Cells

    • Press Shift + Enter to run cells sequentially
    • Or use Cell → Run All from the menu
  3. View Outputs

    • Interactive charts display inline
    • CSV exports saved to analysis_outputs/ folder
    • HTML charts for PDF conversion

🚀 Deep Learning Edition (deep-learning branch)

  1. Switch to deep-learning branch

    git checkout deep-learning
  2. Install Deep Learning dependencies

    pip install torch shap folium
  3. Open the Deep Learning notebook

    jupyter notebook MBU_Gap_Analyzer_DeepLearning.ipynb
  4. Execute All Cells - Includes 4 advanced AI modules:

    • Module 1: LSTM Time-Series Forecaster
    • Module 2: Autoencoder Anomaly Detector
    • Module 3: SHAP Explainable AI
    • Module 4: Folium Geospatial Map

Output Files

analysis_outputs/
├── mbu_gap_analysis_by_district.csv    # Full gap analysis
├── state_wise_compliance_summary.csv   # State-level metrics
├── service_desert_districts.csv        # Priority intervention list
├── ghost_cohort_districts.csv          # Top 20 underperforming districts
├── chart1_state_compliance.html        # Interactive visualizations
├── chart2_gap_treemap.html
├── chart3_service_desert_scatter.html
├── chart4_daily_trends.html
├── chart5_forecast.html
├── chart6_financial_impact.html
└── service_desert_map.html             # 🆕 Interactive Folium map (Deep Learning Edition)

📁 Dataset Structure

uidai_datasets/
├── api_data_aadhar_enrolment/          # ~1M records
│   ├── api_data_aadhar_enrolment_0_500000.csv
│   ├── api_data_aadhar_enrolment_500000_1000000.csv
│   └── api_data_aadhar_enrolment_1000000_1006029.csv
│
├── api_data_aadhar_biometric/          # ~1.8M records
│   ├── api_data_aadhar_biometric_0_500000.csv
│   ├── api_data_aadhar_biometric_500000_1000000.csv
│   ├── api_data_aadhar_biometric_1000000_1500000.csv
│   └── api_data_aadhar_biometric_1500000_1861108.csv
│
└── api_data_aadhar_demographic/        # ~2M records
    ├── api_data_aadhar_demographic_0_500000.csv
    ├── api_data_aadhar_demographic_500000_1000000.csv
    ├── api_data_aadhar_demographic_1000000_1500000.csv
    ├── api_data_aadhar_demographic_1500000_2000000.csv
    └── api_data_aadhar_demographic_2000000_2071700.csv

Data Schema

| Dataset | Columns |
|---|---|
| Enrolment | date, state, district, pincode, age_0_5, age_5_17, age_18_greater |
| Biometric | date, state, district, pincode, bio_age_5_17, bio_age_17_ |
| Demographic | date, state, district, pincode, demo_age_5_17, demo_age_17_ |
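The "Merge & Aggregate by Dist." step can be sketched with the standard library, assuming the column names in the schema above and integer counts. `district_totals` is a hypothetical helper. In practice each chunked CSV would be opened with `open(path, newline="")`; the small in-memory files below stand in for them and contain made-up values:

```python
import csv
import io
from collections import defaultdict
from typing import TextIO


def district_totals(enrol_csv: TextIO, bio_csv: TextIO) -> dict[tuple[str, str], dict[str, int]]:
    """Sum eligible enrolments (0-5 + 5-17) and 5-17 biometric updates per (state, district)."""
    totals: dict[tuple[str, str], dict[str, int]] = defaultdict(lambda: {"eligible": 0, "bio_5_17": 0})
    for row in csv.DictReader(enrol_csv):
        key = (row["state"], row["district"])
        totals[key]["eligible"] += int(row["age_0_5"]) + int(row["age_5_17"])
    for row in csv.DictReader(bio_csv):
        key = (row["state"], row["district"])
        totals[key]["bio_5_17"] += int(row["bio_age_5_17"])
    return dict(totals)


enrol = io.StringIO(
    "date,state,district,pincode,age_0_5,age_5_17,age_18_greater\n"
    "01-03-2025,StateA,DistX,100001,30,20,5\n"
    "02-03-2025,StateA,DistX,100002,10,40,2\n"
)
bio = io.StringIO(
    "date,state,district,pincode,bio_age_5_17,bio_age_17_\n"
    "01-03-2025,StateA,DistX,100001,45,9\n"
)
print(district_totals(enrol, bio))  # {('StateA', 'DistX'): {'eligible': 100, 'bio_5_17': 45}}
```

The compliance ratio for each district is then `100 * bio_5_17 / eligible`.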

🛠️ Tech Stack

| Technology | Purpose |
|---|---|
| Python 3.10+ | Core Language |
| Pandas | Data Processing |
| NumPy | Numerical Computing |
| Plotly | Visualizations |
| Scikit-Learn | ML Clustering |
| PyTorch | Deep Learning |
| SHAP | Explainable AI |
| Folium | Geospatial Maps |
| Prophet | Time Series |
| Kaleido | Chart Export |
| Jupyter Notebook | Interactive Development |

📋 Policy Recommendations

Based on our analysis, we recommend the following interventions:

| # | Recommendation | Impact |
|---|---|---|
| 1 | Deploy Mobile Aadhaar Seva Kendras in Service Desert districts | Reach underserved rural/tribal areas |
| 2 | Launch SMS/IVR Reminder Campaigns for children turning 5/15 | Proactive awareness |
| 3 | Integrate MBU Status Check with school admission portals | Flag inactive Aadhaar early |
| 4 | Extend Fee Waiver beyond Oct 2025 for low-compliance states | Remove financial barrier |
| 5 | Deploy Real-time Dashboard for District Collectors | Weekly progress monitoring |

👥 Team

Team OMEGA

UIDAI Data Hackathon 2026

| Name | Role |
|---|---|
| Piyush Verma | Team Leader |
| Stiwart Stance Saxena | Team Member |

📄 License

This project is submitted as part of the UIDAI Data Hackathon 2026. All rights reserved.


Built with ❤️ for Digital India

Making Aadhaar work for every child
