This project performs an inventory data analysis on product listings from Zepto using Python. The analysis includes data cleaning, exploratory data analysis (EDA), and visualizations to uncover insights related to product pricing, discounts, and stock availability.
The dataset contains detailed product-level data, including:
- Category
- Product name
- MRP and Discounted Price (in paise)
- Discount Percent
- Available Quantity
- Weight (in grams)
- Stock Status (In Stock or Out of Stock)
- Quantity per pack
Key steps in data cleaning:
- Checked shape, info, and null values using .shape, .info(), .isnull().sum()
- Removed rows with MRP = 0 and weight = 0
- Converted price columns from paise to rupees by dividing mrp and discountedSellingPrice by 100
- Identified and removed duplicate categories, such as:
  - Personal Care = Paan Corner
  - Cooking Essentials = Munchies
  - Ice Cream & Desserts = Chocolates & Candies
  - Dairy, Bread & Butter = Beverages
- Dropped exact duplicates using .drop_duplicates()
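The cleaning steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: only `mrp` and `discountedSellingPrice` are named in this README, so the other column names (`Category`, `name`, `weightInGms`), the helper `clean_inventory`, and the tiny sample frame are assumptions. The zero-value filter is read here as dropping a row when either value is zero, and duplicate categories are handled by dropping rows that match in every column except the category.

```python
import pandas as pd

# Made-up stand-in for the real dataset (prices in paise).
# Column names other than mrp / discountedSellingPrice are assumptions.
raw = pd.DataFrame({
    "Category": ["Munchies", "Cooking Essentials", "Personal Care", "Munchies", "Beverages"],
    "name": ["Wafers", "Wafers", "Face Wash", "Wafers", "Cola"],
    "mrp": [2000, 2000, 15000, 2000, 0],
    "discountedSellingPrice": [1500, 1500, 12000, 1500, 0],
    "weightInGms": [50, 50, 100, 50, 300],
})


def clean_inventory(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mirrors the cleaning steps listed above."""
    # Quick inspection: dimensions, dtypes, and null counts per column.
    print(df.shape)
    df.info()
    print(df.isnull().sum())

    # Drop rows where MRP or weight is zero (assumed reading of the step).
    df = df[(df["mrp"] > 0) & (df["weightInGms"] > 0)].copy()

    # Convert price columns from paise to rupees.
    for col in ["mrp", "discountedSellingPrice"]:
        df[col] = df[col] / 100

    # Drop rows that are identical in every column.
    df = df.drop_duplicates()

    # Drop products repeated under a second category: rows that match in
    # every column except Category, keeping the first occurrence.
    product_cols = [c for c in df.columns if c != "Category"]
    df = df.drop_duplicates(subset=product_cols, keep="first")

    return df.reset_index(drop=True)


cleaned = clean_inventory(raw)
print(cleaned)
```

With the real data, `raw` would instead come from `pd.read_csv(...)` on the dataset file.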
- Bar chart showing product names vs. discount %
- Pie chart showing count of products that are in stock vs out of stock
- Bar chart of price per gram, identifying luxury/small items (like cosmetics)
- Bar chart of final price (in ₹)
- Line chart showing trend between discount % and average final price
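Two of the charts listed above (discount % per product and the stock-status pie) could be drawn along these lines with Matplotlib. The column names `name`, `discountPercent`, and `outOfStock` are assumptions, and the four rows are made-up values used only to make the sketch runnable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows; column names are assumptions.
products = pd.DataFrame({
    "name": ["Wafers", "Liquid Masala", "Saffron", "Face Cream"],
    "discountPercent": [50, 45, 10, 15],
    "outOfStock": [False, False, True, False],
})

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: product names vs. discount %, highest discount first.
top = products.sort_values("discountPercent", ascending=False)
ax_bar.bar(top["name"], top["discountPercent"])
ax_bar.set_ylabel("Discount (%)")
ax_bar.set_title("Discount by product")
ax_bar.tick_params(axis="x", rotation=45)

# Pie chart: count of products in stock vs. out of stock.
stock_counts = (
    products["outOfStock"]
    .map({False: "In Stock", True: "Out of Stock"})
    .value_counts()
)
ax_pie.pie(stock_counts.values, labels=stock_counts.index, autopct="%1.0f%%")
ax_pie.set_title("Stock status")

fig.tight_layout()
```

In a notebook the figure renders inline once the cell finishes; in a script, `fig.savefig(...)` or `plt.show()` would be needed.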
The analysis shows that while most products are in stock, heavy discounts are mostly offered on low-cost everyday ready-to-eat items like wafers and liquid masalas to attract more customers. Luxury or premium products like saffron and skincare items, on the other hand, have a significantly higher price per gram, indicating niche value. The relationship between discount percentage and final price is not linear: a high discount percentage does not always mean large savings in rupees, because such discounts are often applied to lower-priced products.
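The two quantities behind these findings, price per gram and average final price per discount level, can be computed in a few lines of Pandas. This is a sketch under assumptions: price per gram is taken as the discounted price divided by weight, the column names `weightInGms` and `discountPercent` are assumed, and the numbers are illustrative rather than taken from the dataset.

```python
import pandas as pd

# Illustrative values only (prices already in rupees); not real dataset rows.
sample = pd.DataFrame({
    "name": ["Wafers", "Liquid Masala", "Saffron", "Face Serum"],
    "discountedSellingPrice": [15.0, 40.0, 300.0, 450.0],
    "weightInGms": [50, 200, 1, 30],
    "discountPercent": [50, 50, 10, 10],
})

# Price per gram highlights small, expensive items.
sample["pricePerGram"] = sample["discountedSellingPrice"] / sample["weightInGms"]
by_price_per_gram = sample.sort_values("pricePerGram", ascending=False)

# Average final price at each discount level (the data behind the line chart).
avg_price_by_discount = (
    sample.groupby("discountPercent")["discountedSellingPrice"].mean()
)

print(by_price_per_gram[["name", "pricePerGram"]])
print(avg_price_by_discount)
```

In this toy sample the 50% discount group has a much lower average final price than the 10% group, which is the pattern described above: the largest percentage discounts sit on the cheapest items.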
- Python
- Pandas – data cleaning and transformation
- Matplotlib – data visualization
- Jupyter Notebook – code and analysis
This project helped explore:
- Discount patterns
- Inventory stock status
- Product pricing behavior
- Data cleaning on real-world messy data
- Clone this repo
- Open the Jupyter Notebook
- Install required packages:
  pip install pandas matplotlib
- Run all cells to see the full analysis