Turn the GA4 public sample dataset into business insights, Power BI dashboards, a trained ML model, and a FastAPI prediction service running in Docker.
1. Channel Performance & Conversion

2. E-Commerce Performance Trends

Use GA4 e-commerce events to improve funnel conversion and channel ROI.
Key questions
- Which channels convert best from sessions → purchases?
- How do sessions, revenue, and conversion change by day/week/month?
- Where are the biggest drop-offs in the funnel?
- (ML) Can we predict purchase propensity from simple session features?
Source (BigQuery public dataset)
bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*
Star schema
- dim_date (PK = date_pk)
- dim_channel (PK = channel_key)
- fact_funnel_by_date_channel
  - FK: date_pk → dim_date.date_pk
  - FK: channel_key → dim_channel.channel_key
  - Metrics: sessions, add_to_cart, purchases, revenue, conversion_rate, aov
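The two derived metrics in the fact table can be illustrated with a short sketch. It assumes the common definitions conversion_rate = purchases / sessions and aov = revenue / purchases; the SQL model itself is not shown here, so it may compute them differently. The FunnelRow class, the helper names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunnelRow:
    """Metric columns of one fact_funnel_by_date_channel row (hypothetical type)."""
    sessions: int
    add_to_cart: int
    purchases: int
    revenue: float


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_metrics(row: FunnelRow) -> dict:
    # Assumed definitions; the repository's SQL may differ.
    return {
        "conversion_rate": safe_div(row.purchases, row.sessions),
        "aov": safe_div(row.revenue, row.purchases),
    }


# Illustrative numbers only, not taken from the GA4 sample dataset.
example = FunnelRow(sessions=200, add_to_cart=40, purchases=10, revenue=500.0)
metrics = derive_metrics(example)
print(metrics)  # {'conversion_rate': 0.05, 'aov': 50.0}
```

Guarding the division keeps days or channels with zero sessions or zero purchases from raising an error.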
SQL models (/sql)
- 10_dim_date.sql
- 20_dim_channel.sql
- 30_fact_funnel_by_date_channel.sql
Visual ERD
See images/erd.png (rendered above).
Flow: GA4 → BigQuery → SQL models → Python notebook (exports + model) → Power BI → FastAPI (Docker)
- BigQuery builds the star schema.
- Notebook creates exports/*.csv for BI and trains a purchase propensity model.
- Power BI reads exports for dashboards.
- FastAPI serves a /predict endpoint using the trained model.
- Docker packages the API for consistent local runs.
Diagram: images/architecture.png (rendered above).
Notebook: notebooks/01_eda_kpis.ipynb
Covers
- Channel performance (sessions, purchases, revenue, conversion rate)
- Time trends (weekly/monthly)
- Funnel analysis (Sessions → Add to Cart → Purchases)
- Trains a RandomForestClassifier for purchase propensity
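As a rough illustration of the funnel analysis step, the sketch below computes the drop-off rate between consecutive funnel stages. The function name and the stage counts are hypothetical and are not taken from the notebook or the dataset.

```python
def funnel_dropoffs(stages):
    """Given ordered (name, count) pairs, return the drop-off rate per transition."""
    results = []
    for (prev_name, prev_count), (next_name, next_count) in zip(stages, stages[1:]):
        # Drop-off is the share of the previous stage that did not reach the next one.
        rate = 1 - next_count / prev_count if prev_count else 0.0
        results.append((f"{prev_name} -> {next_name}", rate))
    return results


# Illustrative counts only.
stages = [("Sessions", 1000), ("Add to Cart", 200), ("Purchases", 50)]
dropoffs = funnel_dropoffs(stages)
for transition, rate in dropoffs:
    print(f"{transition}: {rate:.0%} drop-off")
# Sessions -> Add to Cart: 80% drop-off
# Add to Cart -> Purchases: 75% drop-off
```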
Exports produced (used by BI)
- exports/channel_summary.csv
- exports/time_summary.csv
- exports/funnel_summary.csv
- PBIX (download): dashboard/powerbi/GA4 Dashboard.pbix
- Slides (overview): Google Slides
Model: RandomForestClassifier for purchase propensity.
Example features
- Numeric: sessions, add_to_cart
- Categorical one-hots: channel_group, day_name, month
Artifacts (/models)
- purchase_rf.joblib
- expected_cols.json ← columns used at inference
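One plausible way expected_cols.json is used at inference is to align a raw request to the one-hot columns seen during training. The sketch below assumes the file holds a JSON list of column names and that dummy columns follow the feature_value naming convention (for example channel_group_Referral); both are assumptions, and the helper name and the column list are hypothetical.

```python
def build_feature_row(payload, expected_cols):
    """Map a raw request dict onto the training column order, one-hot encoding categoricals."""
    row = {col: 0 for col in expected_cols}
    for key, value in payload.items():
        if isinstance(value, (int, float)) and key in row:
            # Numeric feature that exists as its own column: pass it through.
            row[key] = value
        else:
            # Categorical feature: set the matching dummy column if training saw it.
            dummy = f"{key}_{value}"
            if dummy in row:
                row[dummy] = 1
    return [row[col] for col in expected_cols]


# Hypothetical column list; in the project it would come from models/expected_cols.json.
expected_cols = [
    "sessions", "add_to_cart",
    "channel_group_Organic Search", "channel_group_Referral",
    "day_name_Saturday", "day_name_Sunday",
    "month_11", "month_12",
]
payload = {"sessions": 3, "add_to_cart": 1, "channel_group": "Referral",
           "day_name": "Saturday", "month": 12}
features = build_feature_row(payload, expected_cols)
print(features)  # [3, 1, 0, 1, 1, 0, 0, 1]
```

Categories not seen during training simply leave every dummy column at 0, so the row always has the same width the model expects.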
Code: src/api.py
python -m venv .venv
# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate
pip install -r requirements.txt
uvicorn src.api:app --host 127.0.0.1 --port 8000 --reload
Health check
curl http://127.0.0.1:8000/health
curl -X POST http://127.0.0.1:8000/predict \
-H "Content-Type: application/json" \
-d '{"sessions":3,"add_to_cart":1,"channel_group":"Referral","day_name":"Saturday","month":12}'

docker build -t ga4-api .
docker run -p 8000:8000 ga4-api
curl http://127.0.0.1:8000/health
Endpoints
- GET /health → service status
- POST /predict → { "prediction": 0|1, "probability_purchase": float }
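For callers who prefer Python over curl, the request can be sketched with the standard library alone. The URL and payload mirror the curl example; the function names and the timeout are hypothetical, and the response shape is taken from the endpoint description above.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict"


def build_predict_request(features, url=API_URL):
    """Build a JSON POST request for /predict without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, url=API_URL, timeout=5.0):
    """Send the request and return the decoded JSON response (requires a running API)."""
    with urllib.request.urlopen(build_predict_request(features, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sample = {"sessions": 3, "add_to_cart": 1, "channel_group": "Referral",
          "day_name": "Saturday", "month": 12}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)
```

With the service running, predict(sample) should return a dict with the prediction and probability_purchase keys.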
- Identified top-performing channels by conversion rate and revenue.
- Highlighted the largest funnel drop-offs (Sessions → Add to Cart, Cart → Purchase).
- Used the purchase propensity model to flag high-propensity sessions for remarketing and CRO (conversion rate optimization).
- Delivered a full end-to-end pipeline (SQL → Python → BI → ML → API → Docker) showing both technical depth and business value.
Clone & setup
git clone https://github.com/Egbe34/ga4-ecommerce-analytics.git
cd ga4-ecommerce-analytics
python -m venv .venv
# Windows (PowerShell)
.\.venv\Scripts\Activate.ps1
# macOS/Linux
source .venv/bin/activate
pip install -r requirements.txt
Run notebook → generate exports & models
Open notebooks/01_eda_kpis.ipynb
Run all cells to create exports/* and models/*
Start API
uvicorn src.api:app --host 127.0.0.1 --port 8000 --reload

ga4-ecommerce-analytics/
├─ sql/
├─ notebooks/
├─ exports/
├─ models/
├─ src/
├─ dashboard/
│ └─ powerbi/
├─ docs/
├─ images/
├─ Dockerfile
└─ README.md
**License:** MIT


