Why This Project Stands Out: A complete end-to-end data analytics solution integrating Python, SQL, and Power BI to uncover actionable insights into retail shopping patterns.
It transforms raw customer transaction data into a visually interactive dashboard revealing spending habits, loyalty behavior, and revenue opportunities.
This project analyzes customer shopping behavior using transactional data from 3,900 purchases across multiple product categories.
The goal is to identify spending patterns, customer segments, and product preferences while providing recommendations to increase subscriptions, loyalty, and revenue.
The project demonstrates the entire analytics lifecycle:
- Data cleaning and transformation in Python
- Business analysis and querying in PostgreSQL
- Interactive visualization and storytelling in Power BI
- Perform data preparation, cleaning, and feature engineering using Python.
- Execute SQL queries for business analysis and segmentation.
- Design an interactive Power BI dashboard for decision-making insights.
- Provide data-driven recommendations to improve marketing, customer retention, and profitability.
| Dataset/Table | Description |
|---|---|
Customer |
Customer demographics — Age, Gender, Location, Subscription Status |
Purchase |
Product information — Item, Category, Season, Size, Color, Price |
Transaction |
Purchase details — Discount Applied, Promo Code, Frequency, Review Rating, Shipping Type |
Rows: 3,900 Columns: 18 Missing Data: 37 null values in Review Rating
- Demographics: Age, Gender, Location, Subscription Status
- Purchase Details: Item, Category, Amount, Season, Size, Color
- Behavioral Data: Discount usage, Promo codes, Purchase frequency, Shipping preference
- Data Loading & Cleaning: Imported dataset using
pandasand explored using.info()and.describe(). - Missing Value Treatment: Imputed missing
Review Ratingusing median per product category. - Feature Engineering:
- Created
age_groupfrom customer ages. - Derived
purchase_frequency_daysfrom purchase history.
- Created
- Column Standardization: Converted to
snake_casefor clarity. - Database Integration: Loaded cleaned data into PostgreSQL for SQL analysis.
- SQL Analysis: Derived insights on revenue, segments, and behavior.
- Visualization: Built an interactive Power BI dashboard to present results visually.
| Analysis | Objective |
|---|---|
| Revenue by Gender | Compare total revenue generated by male vs. female customers |
| High-Spending Discount Users | Identify customers using discounts but spending above average |
| Top 5 Products by Rating | Discover products with the highest review ratings |
| Shipping Type Comparison | Compare spending between Standard and Express shipping |
| Subscribers vs. Non-Subscribers | Measure revenue and average spend by subscription status |
| Discount-Dependent Products | Identify products most purchased with discounts |
| Customer Segmentation | Categorize customers as New, Returning, or Loyal |
| Top Products per Category | List top-selling items within each category |
| Repeat Buyers & Subscriptions | Analyze relationship between frequent buyers and subscription status |
| Revenue by Age Group | Evaluate revenue contribution across age groups |
An interactive Customer Behavior Dashboard was developed in Power BI to showcase major KPIs, including revenue, subscription trends, and customer demographics.
- 3.9K Customers analyzed
- $59.76 average purchase amount
- 3.75 average review rating
- Subscription Status: Yes – 27% | No – 73%
- Visual comparisons by Category, Age Group, Gender, and Shipping Type
- Subscribers spend more per purchase compared to non-subscribers.
- Clothing and Accessories drive the highest sales and revenue.
- Young Adults (25–35) are the most active shoppers.
- Express Shipping correlates with higher purchase amounts.
- Top-rated products show stronger repeat purchase behavior.
| Focus Area | Recommendation |
|---|---|
| Subscription Growth | Offer exclusive rewards and early access to boost subscription rate. |
| Customer Loyalty | Introduce points or referral-based loyalty programs. |
| Discount Policy | Review discount strategy to sustain profit margins. |
| Product Positioning | Promote top-rated and best-selling items in marketing campaigns. |
| Targeted Marketing | Focus on high-spending age groups and express-shipping users. |
| Challenge | Solution |
|---|---|
| Missing review ratings | Imputed using median rating per product category |
| Redundant data | Dropped promo_code_used after redundancy check |
| SQL performance | Used CTEs and aggregation for optimized execution |
| Visual consistency | Applied unified color palette and structured Power BI layout |
| Tool / Technology | Purpose |
|---|---|
| Python (pandas, numpy, matplotlib, seaborn) | Data cleaning and exploration |
| PostgreSQL | Database management and SQL analysis |
| Power BI Desktop | Dashboard visualization and reporting |
| Jupyter Notebook | Code execution and documentation |
This project highlights how Python, SQL, and Power BI can work together to transform raw retail data into meaningful business intelligence.
It demonstrates practical skills in data engineering, analytics, and visualization, ultimately driving informed decisions around marketing, loyalty, and product performance.
