Official repository for Team UIDAI 4732. A comprehensive data analysis solution for the UIDAI Data Hackathon 2026.
UIDAI Aadhaar Data Analysis Project

Team ID: UIDAI 4732 | Hackathon 2026


📊 Dashboard Preview

🔗 Live Power BI Dashboard: View Live Interactive Dashboard

👥 Team Members

Birla Institute of Technology, Mesra | Department of Quantitative Economics and Data Science

Name          | ID
Rounak Kumar  | IED/10026/22
Dhruv         | IED/10017/22
Apurva Mishra | IED/10024/22

📖 Project Overview

This project processes approximately 44 lakh (4.4 million) records of Aadhaar Enrolment and Update datasets provided by the National Informatics Centre (NIC).

The raw data was fragmented across multiple split CSVs with inconsistent schemas, noisy geographic identifiers, and duplicates. Our solution consolidates this into a single, analysis-ready source of truth and visualizes it via an interactive Power BI dashboard to identify operational gaps, regional disparities, and lifecycle transition trends.


🚩 Problem Statement

The raw Aadhaar operational datasets present several challenges that hinder direct analysis:

  1. Schema Inconsistencies: Split files (Biometric, Demographic, Enrolment) have different column structures.
  2. Noisy Geography: State and District names contain spelling variants (e.g., "Orissa" vs "Odisha", "Cuddapah" vs "YSR"), special characters, and casing issues.
  3. Lack of Metrics: Raw data provides counts but lacks performance indicators like "Growth Rate" or "Transition Continuity."
  4. Duplication: Repeated records exist at identical reporting granularities.

Objective: Construct a unified pipeline to clean, standardize, and enrich the data for district-level decision-making.


โš™๏ธ The Approach: End-to-End Pipeline

We implemented a 7-step ETL (Extract, Transform, Load) pipeline using Python (Pandas) and Power Query.

graph TD
    A[Raw CSV Ingestion] -->|Concat Split Files| B(Schema Alignment)
    B -->|Standardize Age Cols| C{Consolidation}
    C --> D[Geographic Cleaning]
    D -->|Regex & Mapping| E[Aggregation & Deduping]
    E --> F[Feature Engineering]
    F --> G[Final Dashboard Model]

Pipeline Steps

1. Ingestion

  • Read split CSVs from:
    • Biometric folder
    • Demographic folder
    • Enrolment folder
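The ingestion step can be sketched as follows; the folder layout and file names here are illustrative, not the repository's actual paths. Each source folder's split CSVs are read and stacked into one DataFrame:

```python
# Sketch of the ingestion step: read every split CSV in a folder and
# stack the parts into one DataFrame (folder names are hypothetical).
import glob
import os
import tempfile

import pandas as pd

def load_split_csvs(folder: str) -> pd.DataFrame:
    """Concatenate all CSV parts found in `folder` into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    parts = [pd.read_csv(p) for p in paths]
    return pd.concat(parts, ignore_index=True)

# Demo on two synthetic split files standing in for e.g. the Biometric folder.
tmp = tempfile.mkdtemp()
pd.DataFrame({"state": ["Odisha"], "count": [10]}).to_csv(
    os.path.join(tmp, "part_1.csv"), index=False)
pd.DataFrame({"state": ["Kerala"], "count": [20]}).to_csv(
    os.path.join(tmp, "part_2.csv"), index=False)

biometric = load_split_csvs(tmp)
```

The same helper would be called once per source folder (Biometric, Demographic, Enrolment).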

2. Schema Alignment

Columns renamed to canonical formats:

  • age_0_5 (Bal Aadhaar)
  • age_5_17 (Mandatory Biometric Updates)
  • age_18_greater (Adult Updates)
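One way to perform the renaming is a single mapping applied to each source file; the pre-rename column headers below are hypothetical stand-ins for the inconsistent originals:

```python
# Align differing source headers onto the canonical age-band columns.
# The left-hand (source) names are illustrative, not the real headers.
import pandas as pd

CANONICAL = {
    "age 0-5": "age_0_5",          # Bal Aadhaar
    "age 5-17": "age_5_17",        # mandatory biometric updates
    "age 18+": "age_18_greater",   # adult updates
}

df = pd.DataFrame({"age 0-5": [3], "age 5-17": [7], "age 18+": [12]})
df = df.rename(columns=CANONICAL)
```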

3. Consolidation

  • Merged all sources into a master staging table.

4. Geographic Cleaning

State Normalization

  • Removed numeric junk.
  • Fixed casing issues with "and".
  • Mapped common variants (example: & to and).

District Normalization

  • Applied a comprehensive correction dictionary to map legacy names to current administrative districts.
  • Example: Gurgaon to Gurugram.

5. Aggregation

  • Grouped by:
    • Date
    • State
    • District
    • Pincode
  • Purpose: remove duplicate records.
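A minimal sketch of this dedup-by-aggregation on toy data: grouping on the four keys and summing collapses repeated rows at the same granularity into one record.

```python
# Collapse duplicate rows at the same reporting granularity by
# grouping on the four keys and summing the counts (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "date": ["2026-01-01", "2026-01-01"],
    "state": ["odisha", "odisha"],
    "district": ["khordha", "khordha"],
    "pincode": [751001, 751001],
    "age_0_5": [2, 3],
})

agg = (
    df.groupby(["date", "state", "district", "pincode"], as_index=False)
      .sum(numeric_only=True)
)
```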

6. Metric Engineering

  • Calculated daily growth metrics.
  • Derived lifecycle and transition ratios.
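The daily growth metric can be sketched as a per-district percentage change over date-ordered totals (toy numbers; the exact formula used in the pipeline may differ):

```python
# Daily growth rate per district: percentage change in total updates,
# computed within each district after sorting by date.
import pandas as pd

df = pd.DataFrame({
    "district": ["khordha"] * 3,
    "date": pd.to_datetime(["2026-01-01", "2026-01-02", "2026-01-03"]),
    "total_updates": [100, 110, 99],
})

df = df.sort_values(["district", "date"])
df["growth_rate"] = df.groupby("district")["total_updates"].pct_change()
```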

7. Enrichment

  • Merged district-level performance bands back into the daily aggregated view.
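A sketch of the enrichment join, assuming a district-level band table (column names hypothetical); a left join keeps every daily record even when a district has no band assigned:

```python
# Merge district-level performance bands back onto the daily rows.
import pandas as pd

daily = pd.DataFrame({
    "district": ["khordha", "cuttack"],
    "total_updates": [150, 90],
})
bands = pd.DataFrame({
    "district": ["khordha"],
    "performance_band": ["high"],
})

enriched = daily.merge(bands, on="district", how="left")
```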

🧹 Data Cleaning

A significant portion of execution focused on cleaning dirty text fields using vectorized string operations.

Example: Standardizing State Names

df["state"] = (
    df["state"]
    .astype("string")                       # nullable string dtype
    .str.strip()                            # trim surrounding whitespace
    .str.replace(r"\s+", " ", regex=True)   # collapse runs of whitespace
    .str.replace("&", "and", regex=False)   # normalize ampersands
    .str.lower()                            # case-fold for matching
)

Correction Mapping (Snippet)

correction_map = {
    "orissa": "odisha",
    "pondicherry": "puducherry",
    "allahabad": "prayagraj",
    "gurgaon": "gurugram",
    "cuddapah": "ysr"
}

df["district"] = df["district"].replace(correction_map)

🧮 Feature Engineering

Derived KPIs to evaluate district-level performance:

Metric             | Formula                                         | Purpose
Total Updates      | age_0_5 + age_5_17 + age_18_greater             | Primary workload measure
Zero Activity Flag | IF(total_updates == 0, 1, 0)                    | Identifies service interruptions
Transition Ratio   | total_adult_updates / (total_child_updates + 1) | Measures lifecycle continuity
Priority Index     | Transition Ratio < 1.5 OR Zero Days > 5         | Flags districts needing attention
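These KPIs can be sketched in pandas as below, treating age_18_greater as the adult updates and the two younger bands as child updates, and assuming a precomputed zero_days column (both assumptions, not confirmed by the pipeline code):

```python
# KPI formulas from the table above, sketched on toy data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age_0_5": [0, 5],
    "age_5_17": [0, 10],
    "age_18_greater": [0, 24],
    "zero_days": [7, 1],   # assumed precomputed count of zero-activity days
})

df["total_updates"] = df["age_0_5"] + df["age_5_17"] + df["age_18_greater"]
df["zero_activity_flag"] = np.where(df["total_updates"] == 0, 1, 0)
child_updates = df["age_0_5"] + df["age_5_17"]
df["transition_ratio"] = df["age_18_greater"] / (child_updates + 1)
df["priority"] = (
    (df["transition_ratio"] < 1.5) | (df["zero_days"] > 5)
).astype(int)
```

The first toy row is flagged both for zero activity and priority; the second clears both thresholds.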

📊 Dashboard Architecture

The Power BI solution is divided into two analytical views.

1. Executive Summary and Activity

Goal
High-level monitoring of national and state trends.

Visuals

  • Choropleth map of update activity by state
  • Monthly growth rate trends
  • Activity status distribution (Increasing vs Declining)

2. District Performance and Priority Analysis

Goal
Deep dive into district-level operational gaps.

Visuals

  • Priority list of districts flagged as High Priority
  • Transition band donut chart (Low, Moderate, High continuity)
  • Zero activity tracker for frequent zero-reporting districts

🚀 Future Scope

Real-time API

  • Replace CSV dumps with direct UIDAI API integration.

Anomaly Detection

  • Use Isolation Forest to detect sudden drops in enrolment packets.
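A minimal sketch of what that check could look like with scikit-learn's IsolationForest on one district's daily counts (scikit-learn is an assumed extra dependency, not part of the current stack; the numbers are synthetic):

```python
# Flag a sudden drop in daily update counts with an Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
counts = rng.normal(loc=1000, scale=30, size=60)  # typical days
counts[-1] = 50                                   # simulated sudden drop

model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(counts.reshape(-1, 1))  # -1 marks anomalies
```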

Census Overlay

  • Correlate Aadhaar saturation with Census 2021 and 2026 population data to estimate remaining demand.

๐Ÿ› ๏ธ Tech Stack

  • Language: Python 3.10+
  • Libraries: pandas, numpy, regex
  • Visualization: Microsoft Power BI
  • Source Control: GitHub
