This repository serves as a comprehensive, structured guide to mastering Pandas for data manipulation and analysis. It is designed as a "Zero to Hero" roadmap, covering everything from basic Series creation to memory optimization and Exploratory Data Analysis (EDA) on real-world datasets.
The core of this project is the `notebooks/pandas_practice.ipynb` Jupyter Notebook, which is organized into logical, progressive modules.
Data Engineering and Data Science interviews often focus heavily on data manipulation skills. While many tutorials exist, few focus on industry best practices, such as:
- Vectorized operations over loops.
- Explicit indexing (`.loc` vs `.iloc`).
- Proper handling of missing data (`NaN`).
- Memory optimization strategies (see the sketch below).
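As one example, here is a minimal sketch of the memory-optimization idea; the DataFrame, column names, and values are invented purely for demonstration:

```python
import pandas as pd

# Illustrative DataFrame; columns and values are made up for this sketch
df = pd.DataFrame({
    "user_id": range(1_000),
    "country": ["IN", "US", "DE", "FR"] * 250,
    "score": [1.5] * 1_000,
})

print(df.memory_usage(deep=True).sum())  # baseline footprint in bytes

# Downcast numeric columns and store low-cardinality strings as 'category'
df["user_id"] = pd.to_numeric(df["user_id"], downcast="unsigned")
df["score"] = pd.to_numeric(df["score"], downcast="float")
df["country"] = df["country"].astype("category")

print(df.memory_usage(deep=True).sum())  # noticeably smaller footprint
```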
This repository bridges the gap between basic syntax and professional application.
By working through this repository, you will master:
- Core Structures: Deep dive into Series and DataFrames.
- Data Cleaning: Handling missing values, duplicates, and string manipulation.
- Advanced Selection: Boolean masking, querying, and conditional logic.
- Aggregation: GroupBy, pivoting, and statistical summaries.
- Performance: Writing efficient, vectorized Pandas code.
- EDA: Applying these skills to analyze a real Spam/Ham dataset.
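As a quick taste of the topics above, here is a minimal, hedged sketch; the data is invented purely for illustration:

```python
import pandas as pd
import numpy as np

# Core structures: a Series and a DataFrame (toy data for illustration only)
scores = pd.Series([10, 20, 30], name="scores")
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Ben", "Cara"],
    "age": [25, np.nan, 32, 41],
})

# Data cleaning: drop duplicate names and fill missing ages with the median
clean = (
    df.drop_duplicates(subset="name")
      .assign(age=lambda d: d["age"].fillna(d["age"].median()))
)

# Advanced selection: boolean masking
over_30 = clean[clean["age"] > 30]

# Aggregation: a simple statistical summary
print(clean["age"].describe())
```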
The repository is laid out as follows:

```
pandas-practice/
│
├── notebooks/
│   └── pandas_practice.ipynb   # Main learning notebook
│
├── data/
│   └── spam.csv                # Real-world dataset for EDA
│
├── README.md                   # Project documentation
├── requirements.txt            # Python dependencies
└── .gitignore                  # Git configuration
```
To get started, you will need:
- Python 3.8+
- Git
1. Clone the repository

   ```bash
   git clone https://github.com/<username>/pandas-practice.git
   cd pandas-practice
   ```

2. Install dependencies (it is recommended to use a virtual environment)

   ```bash
   pip install -r requirements.txt
   ```

3. Launch Jupyter Notebook

   ```bash
   jupyter notebook
   ```

   Open `notebooks/pandas_practice.ipynb` to begin.
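For the virtual environment recommended in step 2, a minimal setup might look like the following sketch (assuming Python's built-in `venv` module and a POSIX shell):

```bash
# Create and activate an isolated environment, then install the dependencies into it
python -m venv .venv
source .venv/bin/activate        # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```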
The project includes `data/spam.csv`, a classic dataset for text classification.
- Content: SMS messages labelled as 'spam' or 'ham' (legitimate).
- Usage: Used in Sections 12 and 13 to demonstrate file reading, cleaning, and Exploratory Data Analysis.
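A minimal first look at the file might resemble the sketch below. The `latin-1` encoding and the `v1` (label) / `v2` (message text) column names are assumptions based on common copies of this dataset, so adjust them to whatever `data/spam.csv` actually contains:

```python
import pandas as pd

# Load the SMS dataset; encoding and column names are assumptions, not guarantees
df = pd.read_csv("data/spam.csv", encoding="latin-1")

# First rows and the spam/ham class balance
print(df.head())
print(df["v1"].value_counts(normalize=True))
```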
- Avoid Loops: Always look for a vectorized solution first.
- Be Explicit: Use `.loc` and `.iloc` instead of relying on ambiguous `[]` indexing.
- Chain Methods: Use method chaining (e.g., `df.query().groupby().mean()`) for readable code, but don't overdo it.
- Copy vs View: Be aware of `SettingWithCopyWarning`. Use `.copy()` when creating a new DataFrame from a subset. (All four points are illustrated in the sketch below.)
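A minimal sketch tying these practices together, using data invented purely for illustration:

```python
import pandas as pd

# Toy data invented for this sketch
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "sales": [120, 300, 150, 90],
})

# Be explicit: label-based selection with .loc instead of ambiguous [] chaining
pune_sales = df.loc[df["city"] == "Pune", "sales"]

# Avoid loops: vectorized arithmetic over the whole column
df["sales_with_tax"] = df["sales"] * 1.18

# Chain methods (in moderation) for a readable aggregation
summary = (
    df.query("sales > 100")
      .groupby("city")["sales"]
      .mean()
)

# Copy vs view: take an explicit copy of a subset before modifying it
pune_df = df[df["city"] == "Pune"].copy()
pune_df["sales"] = pune_df["sales"] + 10
```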