
HeatTransfer/pan-number-validation


🆔 PAN Number Validation Project

A Python-based data cleaning and validation project that ensures the accuracy and integrity of Permanent Account Numbers (PAN) for Indian nationals.
The goal is to check that each PAN follows the official format and to categorize it as Valid or Invalid.


📌 Project Overview

This project takes an input dataset of PAN numbers (from a CSV/Excel file), cleans and preprocesses it, validates each PAN against the official rules, and outputs:

  • A list of PAN numbers, each marked Valid or Invalid
  • The invalid-format category of each invalid PAN
  • A summary report with counts

🛠 Features

  1. Data Cleaning & Preprocessing

    • Handles missing values (removal or imputation).
    • Removes duplicate PAN numbers.
    • Strips leading/trailing spaces.
    • Converts all PAN numbers to uppercase.
  2. PAN Format Validation Rules

    • Exactly 10 characters long.
    • Format: AAAAA1234A
      • First 5 characters: uppercase letters.
      • No two adjacent identical letters (e.g., AABCD ❌).
      • Not a sequential series of letters (e.g., ABCDE ❌).
      • Next 4 characters: digits.
      • No two adjacent identical digits (e.g., 1123 ❌).
      • Not a sequential series of digits (e.g., 1234 ❌).
      • Last character: uppercase letter.
  3. Categorization

    • Valid: Meets all format rules.
    • Invalid: Violates any rule or contains non-alphanumeric characters.
    • Observation: the category of invalid format the PAN falls into (blank if valid).
  4. Reporting

    • Total records processed.
    • Total valid PANs.
    • Total invalid PANs.
    • Total missing/incomplete PANs.
    • Categorization of invalid PANs.
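The format rules and categorization above can be sketched as a single function. This is a minimal sketch, not the code in analysis_final.ipynb: the names `classify_pan`, `has_adjacent_repeat` and `is_full_sequence` and the observation labels are hypothetical, the input is assumed to be already stripped and uppercased, and "sequential" is interpreted as the whole 5-letter (or 4-digit) block forming one ascending run, so ABCDX passes while ABCDE fails. Checks run in a fixed order, so each invalid PAN gets exactly one observation.

```python
import re

# Structural shape of a PAN: five letters, four digits, one letter (AAAAA1234A).
PAN_SHAPE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")


def has_adjacent_repeat(chunk):
    """True if two neighbouring characters are identical, e.g. 'AABCD' or '1123'."""
    return any(a == b for a, b in zip(chunk, chunk[1:]))


def is_full_sequence(chunk):
    """True if every character is exactly one step above the previous one,
    e.g. 'ABCDE' or '1234'. Works for letters and digits alike via ord()."""
    return all(ord(b) - ord(a) == 1 for a, b in zip(chunk, chunk[1:]))


def classify_pan(pan):
    """Return (status, observation) for a cleaned, uppercased PAN.

    The observation strings are hypothetical labels, not the project's wording.
    """
    if len(pan) != 10:
        return "Invalid", "Not 10 characters long"
    if not pan.isalnum():
        return "Invalid", "Contains non-alphanumeric characters"
    if not PAN_SHAPE.fullmatch(pan):
        return "Invalid", "Does not match AAAAA1234A"
    letters, digits = pan[:5], pan[5:9]
    if has_adjacent_repeat(letters):
        return "Invalid", "Adjacent identical letters"
    if is_full_sequence(letters):
        return "Invalid", "Sequential letters"
    if has_adjacent_repeat(digits):
        return "Invalid", "Adjacent identical digits"
    if is_full_sequence(digits):
        return "Invalid", "Sequential digits"
    return "Valid", ""
```

Applied to a cleaned pandas column, `Series.map(classify_pan)` yields one (status, observation) tuple per row, which can then be split into two columns.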

📂 Project Structure

.
├── resources/
│   ├── PAN Number Validation Dataset.csv               # Input dataset (csv)
│   ├── PAN Number Validation Dataset.xlsx              # Input dataset (xlsx)
│   └── PAN Number Validation - Problem Statement.pdf   # Problem statement (pdf)
├── analysis_raw.ipynb                                  # Exploratory (raw) analysis notebook
├── analysis_final.ipynb                                # Final, ready-to-run notebook
├── README.md                                           # Project documentation
└── output/
    ├── PAN_Validation_Results.xlsx
    ├── PAN_Validation_Summary.xlsx
    └── Valid_Invalid_Category.xlsx
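The reporting step that fills output/ might look roughly like this. It is a hedged sketch: `build_summary`, `invalid_breakdown` and `write_outputs` are hypothetical helpers, the results table is assumed to have `status` and `observation` columns, the raw record count and missing count are assumed to be measured before cleaning, and which table lands in which of the three workbooks is a guess based only on the file names.

```python
from pathlib import Path

import pandas as pd


def build_summary(results, total_records, missing):
    """Headline counts. `results` holds cleaned rows with a hypothetical
    'status' column ('Valid'/'Invalid'); `total_records` and `missing`
    are counted on the raw data before cleaning."""
    status = results["status"]
    return pd.DataFrame(
        {
            "metric": [
                "Total records processed",
                "Total valid PANs",
                "Total invalid PANs",
                "Total missing/incomplete PANs",
            ],
            "count": [
                total_records,
                int((status == "Valid").sum()),
                int((status == "Invalid").sum()),
                missing,
            ],
        }
    )


def invalid_breakdown(results):
    """Number of invalid PANs per (hypothetical) 'observation' label."""
    invalid = results.loc[results["status"] == "Invalid", "observation"]
    return invalid.value_counts().rename_axis("observation").reset_index(name="count")


def write_outputs(results, summary, breakdown, out_dir="output"):
    """Write the three workbooks; the file-to-table mapping is an assumption.
    Requires an Excel writer engine such as openpyxl."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results.to_excel(out / "PAN_Validation_Results.xlsx", index=False)
    summary.to_excel(out / "PAN_Validation_Summary.xlsx", index=False)
    breakdown.to_excel(out / "Valid_Invalid_Category.xlsx", index=False)
```

Keeping the counting separate from the file writing makes the summary easy to inspect in the notebook before anything is written to disk.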

🚀 Getting Started

1️⃣ Clone the Repository

    git clone https://github.com/<your-username>/pan-number-validation.git

2️⃣ Place Your Dataset

  • Put your PAN Number Validation Dataset.csv file inside the resources/ folder.
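Once the file is in resources/, loading and cleaning it could be sketched with pandas as below. This is an illustration rather than the notebook's code: `clean_pans` is a hypothetical helper, missing values are simply dropped (imputation is not shown), and the column name in the commented usage line is an assumption about the dataset.

```python
import pandas as pd


def clean_pans(df, column):
    """Strip spaces, uppercase, and drop missing/blank values and duplicates in `column`."""
    # Nullable string dtype keeps missing cells as <NA> instead of the text "nan".
    pans = df[column].astype("string").str.strip().str.upper()
    # Cells that are missing or empty after stripping are dropped.
    keep = pans.fillna("") != ""
    out = df.copy()
    out[column] = pans
    out = out.loc[keep]
    # Keep only the first occurrence of each PAN.
    out = out.drop_duplicates(subset=[column], keep="first")
    return out.reset_index(drop=True)


# Hypothetical usage; "Pan_Numbers" is an assumed column name.
# raw = pd.read_csv("resources/PAN Number Validation Dataset.csv")
# cleaned = clean_pans(raw, "Pan_Numbers")
```

Uppercasing before de-duplicating means " ahgve1276f " and "AHGVE1276F" are treated as the same PAN.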

3️⃣ Run the Notebook

  • Open analysis_final.ipynb (for example in Jupyter) and run all cells.

🧰 Tech Stack

  • Python (pandas, re)
  • Excel/CSV for input/output

✍ Author

Shreyajyoti Dutta 🔗 LinkedIn Profile 📫 Open to opportunities in Data Analytics, Data Engineering, ETL and BI


🏷️ Tags

Python pandas Data Cleaning Data Preprocessing Data Transformation Business Insights Data Analytics

About

A Python data cleaning and preprocessing project that validates PAN numbers of Indian nationals collected from a source dataset, separates them into valid and invalid categories, and further sorts the invalid ones into sub-categories according to government rules.
