The objective is to gain a holistic overview of solving a business problem through analytics and to build the foundational skills required to work with data, using Python.
- Overview
- Case Studies
- Skills Covered
- Technologies Used
- Project Structure
- Getting Started
- Reference Material
This repository contains hands-on case studies and projects designed to build a strong foundation in Python for data analysis. Each case study focuses on real-world business scenarios, helping you develop practical skills in:
- Data manipulation and cleaning
- Exploratory Data Analysis (EDA)
- Statistical analysis
- Data visualization
- Deriving actionable business insights
Location: case-studies/tips/
Analyze tipping patterns at Chef's Kitchen restaurant in San Diego to understand customer behavior and identify trends in revenue and tips across different demographics.
| Aspect | Details |
|---|---|
| Business Domain | Restaurant / Hospitality |
| Dataset Size | 244 records, 8 features |
| Key Variables | total_bill, tip, day, time, size, smoker, sex |
| Analysis Focus | Tipping behavior, customer demographics, time-based patterns |
Key Questions Answered:
- What is the relationship between bill amount and tip?
- How do tips vary by day of the week and time of day?
- Does gender or smoking status affect tipping behavior?
- How does group size impact tipping patterns?
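A minimal pandas sketch of how the questions above might be approached. It assumes a DataFrame with the columns listed under Key Variables, for example loaded with `pd.read_csv("case-studies/tips/tips.csv")`. The function name `summarize_tips` and the derived `tip_pct` column are illustrative, not taken from the notebook.

```python
import pandas as pd


def summarize_tips(df: pd.DataFrame) -> dict:
    """Illustrative summary for the tips questions (hypothetical helper)."""
    # Tip as a percentage of the bill, so generosity is comparable across bill sizes
    df = df.assign(tip_pct=df["tip"] / df["total_bill"] * 100)
    return {
        # Pearson correlation between bill amount and tip
        "bill_tip_corr": df["total_bill"].corr(df["tip"]),
        # Mean tip and tip percentage for each (day, time) combination
        "by_day_time": df.groupby(["day", "time"], observed=True)[["tip", "tip_pct"]].mean(),
        # Mean tip percentage split by gender (rows) and smoking status (columns)
        "by_sex_smoker": df.pivot_table(
            values="tip_pct", index="sex", columns="smoker", aggfunc="mean", observed=True
        ),
        # Mean tip percentage for each party size
        "pct_by_size": df.groupby("size")["tip_pct"].mean(),
    }
```

Working with `tip_pct` rather than the raw tip separates tipping behavior from bill size, since larger bills mechanically produce larger tips.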
Location: case-studies/food-hub/
Analyze order data from FoodHub, a food aggregator company in New York, to understand restaurant demand and enhance customer experience.
| Aspect | Details |
|---|---|
| Business Domain | Food Delivery / E-commerce |
| Dataset Size | Large-scale order data |
| Key Variables | order_id, restaurant_name, cuisine_type, cost, rating, delivery_time |
| Analysis Focus | Restaurant performance, delivery efficiency, customer satisfaction |
Key Questions Answered:
- Which restaurants and cuisine types are most popular?
- How do order costs vary across different restaurants?
- What factors affect delivery time and customer ratings?
- Are there patterns in weekday vs weekend orders?
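A sketch of the FoodHub questions in pandas, under stated assumptions: the column names `restaurant_name`, `cuisine_type`, `cost`, `rating`, `delivery_time` and `order_id` follow the Key Variables row above and may need adjusting to match `foodhub_order.csv`. The weekday/weekend column is not named in this README, so `day_col="day_of_the_week"` is a hypothetical default, and `summarize_orders` is an illustrative name.

```python
import pandas as pd


def summarize_orders(df: pd.DataFrame, day_col: str = "day_of_the_week") -> dict:
    """Illustrative FoodHub summary (hypothetical helper and day column name)."""
    # Ratings may contain non-numeric placeholders; coerce them to NaN so they are skipped
    df = df.assign(rating=pd.to_numeric(df["rating"], errors="coerce"))
    return {
        # Most frequently ordered-from restaurants and cuisines
        "top_restaurants": df["restaurant_name"].value_counts().head(5),
        "cuisine_popularity": df["cuisine_type"].value_counts(),
        # Spread of order cost per restaurant
        "cost_by_restaurant": df.groupby("restaurant_name")["cost"].agg(["count", "mean", "median"]),
        # Does slower delivery go with lower ratings? (NaN ratings are dropped pairwise)
        "delivery_vs_rating": df["delivery_time"].corr(df["rating"]),
        # Order volume and average delivery time for weekday vs weekend
        "by_day_type": df.groupby(day_col).agg(
            orders=("order_id", "count"), mean_delivery=("delivery_time", "mean")
        ),
    }
```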
Location: case-studies/honey-production/
Explore the decline of honey production in the United States from 1998 to 2016, investigating the impact of Colony Collapse Disorder (CCD) and analyzing trends in production, pricing, and state-level performance.
| Aspect | Details |
|---|---|
| Business Domain | Agriculture / Environmental Science |
| Dataset Size | 786 records, 8 features (19 years: 1998-2016) |
| Key Variables | state, numcol, yieldpercol, totalprod, stocks, priceperlb, prodvalue, year |
| Analysis Focus | Production trends, colony decline, pricing dynamics, state-level patterns |
Key Questions Answered:
- How has honey production yield changed from 1998 to 2016?
- What are the major production trends across states over time?
- Are there patterns between total honey production and value of production each year?
- Which states are the largest honey producers and which produce the most expensive honey?
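One way the year-level and state-level questions could be sketched with pandas, assuming the columns listed under Key Variables (`state`, `year`, `totalprod`, `prodvalue`, `priceperlb`). The helper name `honey_trends` is hypothetical, and averaging `priceperlb` across years is one simple choice of "most expensive"; a production-weighted price would be another.

```python
import pandas as pd


def honey_trends(df: pd.DataFrame, top_n: int = 5) -> dict:
    """Illustrative yearly and state-level summary (hypothetical helper)."""
    # National totals per year
    yearly = df.groupby("year")[["totalprod", "prodvalue"]].sum()
    # Implied national value per pound, to compare volume against value over time
    yearly["value_per_lb"] = yearly["prodvalue"] / yearly["totalprod"]
    # Per-state total production and simple average price per pound
    by_state = df.groupby("state").agg(
        total=("totalprod", "sum"), avg_price=("priceperlb", "mean")
    )
    return {
        "yearly": yearly,
        "prod_value_corr": yearly["totalprod"].corr(yearly["prodvalue"]),
        "top_producers": by_state["total"].nlargest(top_n).index.tolist(),
        "costliest_state": by_state["avg_price"].idxmax(),
        "cheapest_state": by_state["avg_price"].idxmin(),
    }
```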
Key Findings:
- Overall honey production in the US has been decreasing over the years
- The decline is attributed to both a decreasing number of colonies and decreasing yield per colony
- Top producers: North Dakota, California, South Dakota, Florida, Montana
- Virginia produces the costliest honey; Oklahoma produces the cheapest
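The second finding (decline driven by both fewer colonies and lower yield) can be checked with a log decomposition. If national yield is defined as total production divided by total colonies, then production = colonies × yield holds exactly, so the log change in production splits into a colony term plus a yield term. This is a sketch assuming the `year`, `totalprod` and `numcol` columns; `decompose_decline` is a hypothetical name and the method is a swapped-in technique, not necessarily what the notebook does.

```python
import numpy as np
import pandas as pd


def decompose_decline(df: pd.DataFrame) -> dict:
    """Split the first-to-last-year change in production into colony and yield effects."""
    # National totals per year (groupby sorts years ascending)
    yearly = df.groupby("year")[["totalprod", "numcol"]].sum()
    # National average yield per colony, defined so totalprod == numcol * yield exactly
    yearly["yield"] = yearly["totalprod"] / yearly["numcol"]
    first, last = yearly.iloc[0], yearly.iloc[-1]
    # Log changes are additive: ln(P1/P0) = ln(N1/N0) + ln(Y1/Y0)
    colony_effect = np.log(last["numcol"] / first["numcol"])
    yield_effect = np.log(last["yield"] / first["yield"])
    return {
        "production_log_change": np.log(last["totalprod"] / first["totalprod"]),
        "colony_effect": colony_effect,
        "yield_effect": yield_effect,
        # Fraction of the total log change attributable to colony count
        "colony_share": colony_effect / (colony_effect + yield_effect),
    }
```

Two negative effects would support the finding that both factors contributed; `colony_share` is undefined when production is unchanged between the two end years.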
Location: case-studies/google-play-store/
Analyze Google Play Store data for Zoom Ads, an advertising agency that wants to identify trending Android applications on which to run targeted advertisements and maximize profit.
| Aspect | Details |
|---|---|
| Business Domain | Digital Advertising / Mobile App Market |
| Dataset Size | App store data with 12 features |
| Key Variables | App, Category, Rating, Reviews, Size, Installs, Price, Content Rating, Ad Supported |
| Analysis Focus | App trends, market analysis, advertising opportunities, user engagement patterns |
Context:
Android is Google's mobile operating system, with about 69% of the worldwide market share. The Google Play Store is the Android app store used to install Android apps. Zoom Ads wants to understand app trends so it can focus its advertising on applications that are trending and can lead to maximum profit.
Key Questions Answered:
- Which app categories are most popular on the Google Play Store?
- What is the relationship between app ratings and number of installs?
- How do free vs paid apps compare in terms of user engagement?
- Which apps support advertisements and have high user engagement?
- What content ratings attract the most users?
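A pandas sketch for the questions above, assuming the feature names in the Data Features table. Two assumptions are labeled in the code: `Installs` may be stored as text such as `"10,000+"` (so `,` and `+` are stripped before conversion), and `Reviews` may need coercion to numbers. The helper name `play_store_summary` is hypothetical. Spearman correlation is used because install counts are heavily skewed.

```python
import pandas as pd


def play_store_summary(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Illustrative answers to the Play Store questions (hypothetical helper)."""
    # Assumption: Installs may be text like "10,000+"; strip "," and "+" before converting
    installs = pd.to_numeric(
        df["Installs"].astype(str).str.replace(r"[,+]", "", regex=True), errors="coerce"
    )
    # Assumption: Reviews may also be stored as text
    reviews = pd.to_numeric(df["Reviews"], errors="coerce")
    df = df.assign(Installs=installs, Reviews=reviews)
    ad_apps = df[df["Ad Supported"] == "Yes"]
    return {
        # Category popularity measured by total installs
        "category_installs": df.groupby("Category")["Installs"].sum().sort_values(ascending=False),
        # Rank correlation between rating and installs (pairwise NaNs dropped)
        "rating_installs_corr": df[["Rating", "Installs"]].corr(method="spearman").loc["Rating", "Installs"],
        # Median engagement for paid vs free apps
        "paid_vs_free": df.groupby("Paid/Free")[["Rating", "Reviews", "Installs"]].median(),
        # Most-installed apps that support ads: candidate placements for Zoom Ads
        "top_ad_supported": ad_apps.nlargest(top_n, "Installs")[["App", "Category", "Installs"]],
        "content_rating_installs": df.groupby("Content Rating")["Installs"].sum().sort_values(ascending=False),
    }
```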
Analysis Guidelines:
- Univariate analysis to understand individual variable distributions
- Bivariate analysis to explore correlations between variables
- Visualizations to extract actionable insights for advertising strategy
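The univariate and bivariate steps in these guidelines could be sketched as a small reusable helper. It assumes the numeric columns have already been converted to numbers; `describe_variables` and its parameters are hypothetical names chosen for illustration.

```python
import pandas as pd


def describe_variables(df: pd.DataFrame, numeric_cols: list, cat_col: str) -> dict:
    """Illustrative univariate and bivariate summaries (hypothetical helper)."""
    return {
        # Univariate: count, mean, std, min, quartiles and max for each numeric column
        "numeric_summary": df[numeric_cols].describe(),
        # Univariate: share of rows in each level of a categorical column
        "category_share": df[cat_col].value_counts(normalize=True),
        # Bivariate: pairwise Spearman rank correlations, robust to skewed counts
        "correlations": df[numeric_cols].corr(method="spearman"),
    }
```

The outputs map directly onto plots: `sns.histplot` for each numeric distribution and `sns.heatmap` on the correlation matrix.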
Data Features:
| Feature | Description |
|---|---|
| App | Application name |
| Category | Category the app belongs to |
| Rating | Overall user rating of the app |
| Reviews | Number of user reviews for the app |
| Size | Size of the app in kilobytes |
| Installs | Number of user downloads/installs for the app |
| Price | Price of an app in dollars |
| Paid/Free | Whether an app is paid or free (Yes/No) |
| Content Rating | Age group the app is targeted at |
| Ad Supported | Whether an app supports ads or not (Yes/No) |
| In App Purchases | Whether the app contains an in-app purchase feature (Yes/No) |
| Editors Choice | Whether rated as Editor's Choice (Yes/No) |
Location: case-studies/austo/
Analyze customer data for Austo, a UK-based automobile company looking to expand into the US market by understanding buyer profiles and car purchase behavior.
| Aspect | Details |
|---|---|
| Business Domain | Automobile / Market Research |
| Dataset Size | Customer data with 14 features |
| Key Variables | Age, Gender, Profession, Salary, Total_salary, Price, Make, Personal_loan, etc. |
| Analysis Focus | Customer profiling, purchase behavior, market segmentation, demographic analysis |
Context:
In the 21st century, cars are essential for personal mobility. Research shows more than 76% of people limit their travel when they don't have a car. Austo has successfully established itself in the European market and now aims to understand US customer preferences for three major car types: Hatchback, Sedan, and SUV.
Key Questions Answered:
- What are the demographics of buyers for each car type?
- How do income levels (personal and household) influence car purchase decisions?
- What is the relationship between loan behavior and car pricing?
- How does profession (Salaried vs Business) affect car preferences?
- What customer profiles emerge for Hatchback, Sedan, and SUV buyers?
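A pandas sketch of how buyer profiles per car type might be built, assuming the feature names in the Data Features table below (`Make`, `Age`, `Salary`, `Total_salary`, `Price`, `Profession`, `Personal_loan`, `Gender`). The helper name `buyer_profiles` is hypothetical.

```python
import pandas as pd


def buyer_profiles(df: pd.DataFrame) -> dict:
    """Illustrative customer profiling for Austo (hypothetical helper)."""
    return {
        # Average age, personal and household income, and price paid per car type
        "profile_by_make": df.groupby("Make")[["Age", "Salary", "Total_salary", "Price"]].mean(),
        # Share of each car type within each profession (each row sums to 1)
        "make_by_profession": pd.crosstab(df["Profession"], df["Make"], normalize="index"),
        # Average car price for customers with and without a personal loan
        "price_by_personal_loan": df.groupby("Personal_loan")["Price"].mean(),
        # Gender mix within each car type (each row sums to 1)
        "gender_by_make": pd.crosstab(df["Make"], df["Gender"], normalize="index"),
    }
```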
Data Features:
| Feature | Description |
|---|---|
| Age | Age of the customer |
| Gender | Gender of the customer |
| Profession | Salaried or Business person |
| Marital_status | Marital status (Single/Married) |
| Education | Highest education level (Graduate/Post Graduate) |
| No_of_Dependents | Number of dependents |
| Personal_loan | Whether customer availed a personal loan (Yes/No) |
| House_loan | Whether customer availed a house loan (Yes/No) |
| Partner_working | Whether partner is working (Yes/No) |
| Salary | Annual salary of the customer |
| Partner_salary | Annual salary of partner |
| Total_salary | Annual household income |
| Price | Price of the car purchased |
| Make | Car type: Hatchback, Sedan, or SUV |
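Because `Total_salary` is described as household income, a useful cleaning check is whether it equals `Salary + Partner_salary`. That identity is an assumption, as is filling a missing `Partner_salary` with 0 when `Partner_working` is "No"; `check_household_income` is a hypothetical helper name. The sketch returns the rows that break the identity or cannot be checked.

```python
import pandas as pd


def check_household_income(df: pd.DataFrame) -> pd.DataFrame:
    """Rows where Total_salary != Salary + Partner_salary (assumed identity)."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Assumption: a partner who is not working earns 0, so a missing salary can be filled with 0
    not_working = df["Partner_working"].eq("No") & df["Partner_salary"].isna()
    df.loc[not_working, "Partner_salary"] = 0
    expected = df["Salary"] + df["Partner_salary"]
    # Flag mismatches, plus rows where the sum is still missing and cannot be verified
    mismatch = (expected - df["Total_salary"]).abs() > 1e-6
    return df[mismatch | expected.isna()]
```

Rows returned here are candidates for imputation (for example, recomputing `Partner_salary` as `Total_salary - Salary`) or for closer inspection.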
| Skill Category | Topics |
|---|---|
| Python Basics | Data types, loops, conditionals, functions, list comprehensions |
| Data Manipulation | Pandas DataFrames, data cleaning, filtering, grouping, aggregation |
| Data Visualization | Matplotlib, Seaborn (histograms, boxplots, scatter plots, heatmaps) |
| Statistical Analysis | Descriptive statistics, correlation, distribution analysis |
| Business Analytics | Deriving insights, identifying patterns, making recommendations |
| Technology | Purpose |
|---|---|
| Python 3.x | Core programming language |
| Pandas | Data manipulation and analysis |
| NumPy | Numerical computations |
| Matplotlib | Basic plotting and visualization |
| Seaborn | Statistical data visualization |
| Jupyter Notebook | Interactive development environment |
python-foundations/
├── README.md # This file
└── case-studies/
    ├── tips/
    │   ├── README.md # Tips case study documentation
    │   ├── Tips_Case_Study.ipynb # Jupyter notebook with analysis
    │   └── tips.csv # Dataset
    ├── food-hub/
    │   ├── README.md # FoodHub case study documentation
    │   ├── foodhub.ipynb # Jupyter notebook with analysis
    │   └── foodhub_order.csv # Dataset
    ├── honey-production/
    │   ├── README.md # Honey production case study documentation
    │   ├── Session_Notebook_Honey_Production_Case_Study.ipynb # Jupyter notebook
    │   └── honeyproduction1998-2016.csv # Dataset (1998-2016)
    ├── google-play-store/
    │   ├── README.md # Google Play Store case study documentation
    │   ├── Google_Play_Store_Case_Study.ipynb # Jupyter notebook with analysis
    │   └── google_play_store.csv # Google Play Store dataset
    └── austo/
        ├── README.md # Austo case study documentation
        ├── austo_project.ipynb # Jupyter notebook with analysis
        └── austo_automobile.csv # Customer and car purchase dataset
Make sure you have Python 3.x installed along with the following libraries:
`pip install pandas numpy matplotlib seaborn jupyter`
- Clone the repository: `git clone <repository-url>`, then `cd python-foundations`
- Launch Jupyter Notebook: `jupyter notebook`
- Navigate to a case study and open the `.ipynb` file
- Run all cells to reproduce the analysis
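Before opening a notebook, it can help to confirm that the libraries from the `pip install` line are present. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); the script and the `check_environment` name are illustrative, not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions installed by the pip command above
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn", "jupyter")


def check_environment(packages=REQUIRED) -> dict:
    """Map each package to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in check_environment().items():
        print(f"{name:<12} {ver or 'NOT INSTALLED'}")
```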
Here is a curated list of resources to deepen your Python data analysis skills:
- Data Visualization - How to Pick the Right Chart Type?
- Explore more with the functionalities of the Seaborn library
- When Should You Delete Outliers from a Data Set?
- How to Handle Missing Data with Python
- Selecting the appropriate outlier treatment
- Why 1.5 Is Used in the IQR Rule for Outlier Detection
- Why Should We Use NumPy?
- The pandas DataFrame: Make Working With Data Delightful
- Guidelines for working with external data in Google Colab
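The IQR rule referenced above treats values outside [Q1 - k·IQR, Q3 + k·IQR] as outliers, with k = 1.5 as the conventional choice. A minimal pandas sketch showing two treatments, flagging and capping; the function names are hypothetical, and whether to flag, cap, or delete depends on the case study.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple:
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking values outside the fences."""
    low, high = iqr_bounds(s, k)
    return (s < low) | (s > high)


def cap_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the fences instead of dropping them."""
    low, high = iqr_bounds(s, k)
    return s.clip(lower=low, upper=high)
```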
This project is for educational purposes.
Contributions are welcome! Feel free to:
- Add new case studies
- Improve existing documentation
- Fix bugs or enhance code quality
- Add additional reference materials