rishabhg8/analytics_engineering_project

Create a Customer Insights Pipeline with DBT and Airflow

🎯 Our Goal: Building a Production-Ready Data Pipeline

Welcome! In this notebook, we'll build a complete, production-ready ELT pipeline from scratch. Here’s a brief overview of our project:

  • The Dataset: We'll use the "Jaffle Shop," a fictional e-commerce store. Our raw data is split across three CSV files: raw_customers, raw_orders, and raw_payments. These tables are logically linked by shared id columns, which we'll use to join them, as shown in the schema diagram below.

[Jaffle Shop schema diagram]

  • The Tasks: We will build an end-to-end pipeline. This includes Loading the data (using dbt seed), Transforming it with a 3-layer dbt model (staging → intermediate → marts), Testing our models for data quality (like uniqueness and relationships), and finally, Orchestrating the entire process into an automated, scheduled job with Airflow.

  • The Audience: This pipeline is for any business that wants to answer the critical question, "Who are my most valuable customers?" Our final product will be a clean, reliable, and analytics-ready table (dim_customers) that a BI tool (like Tableau or Power BI) can connect to for analysis.
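To make the shared-id joins concrete, here is a minimal, self-contained sketch of how the three raw tables link together and roll up into a per-customer summary. It uses an in-memory SQLite database; the column names (user_id, order_id, amount, etc.) follow the usual Jaffle Shop layout but are assumptions here, not taken from this repo's seed files.

```python
import sqlite3

# Tiny in-memory stand-in for the three seed files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_customers (id INTEGER, first_name TEXT);
CREATE TABLE raw_orders    (id INTEGER, user_id INTEGER, order_date TEXT);
CREATE TABLE raw_payments  (id INTEGER, order_id INTEGER, amount REAL);

INSERT INTO raw_customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO raw_orders    VALUES (10, 1, '2024-01-05'), (11, 1, '2024-02-01');
INSERT INTO raw_payments  VALUES (100, 10, 25.0), (101, 11, 40.0);
""")

# The joins mirror the shared-id links in the schema diagram:
# customers.id -> orders.user_id, and orders.id -> payments.order_id.
rows = conn.execute("""
SELECT c.first_name,
       COUNT(DISTINCT o.id) AS order_count,
       SUM(p.amount)        AS lifetime_value
FROM raw_customers c
LEFT JOIN raw_orders o   ON o.user_id  = c.id
LEFT JOIN raw_payments p ON p.order_id = o.id
GROUP BY c.id
ORDER BY lifetime_value DESC
""").fetchall()
print(rows)  # [('Ada', 2, 65.0), ('Grace', 0, None)]
```

The final dim_customers model performs essentially this aggregation, so a BI tool gets one row per customer with their order history already summarized.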

Follow the notebook

🏁 Project Complete!

We have successfully built and orchestrated a full data pipeline. It created the final analytical table, and the output directly answers the core business question: "Who are our most valuable customers?"

What We Did:

  • Built Models (dbt): We used dbt to load seed data, run transformations (staging → intermediate → marts), and test our data quality.
  • Orchestrated Pipeline (Airflow): We wrote an Airflow DAG and used the airflow command to run our entire dbt pipeline (seed, run, and test) automatically, in the correct sequence.
  • The Answer: The final output is the single, reliable dim_customers table, which a BI tool (like Tableau or Power BI) could connect to for analysis.
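For reference, the data-quality checks mentioned above (uniqueness and relationships) are declared in dbt as YAML on the model's columns. A minimal sketch of that convention, with model and column names assumed for illustration rather than taken from this repo:

```yaml
# models/staging/schema.yml -- illustrative names, not the repo's actual file
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique        # every order appears exactly once
          - not_null
      - name: customer_id
        tests:
          - relationships:          # every order points to a real customer
              to: ref('stg_customers')
              field: customer_id
```

Running `dbt test` compiles each of these declarations into a SQL query that fails if any violating rows exist.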

This is the core workflow of a modern data pipeline!
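In the project, the seed → run → test ordering is enforced by the Airflow DAG (each dbt step typically becomes an operator, chained so a failure stops the pipeline). As a dependency-free sketch of that same ordering, with function names and the --project-dir flag used illustratively and neither Airflow nor dbt imported here:

```python
import subprocess

# The dbt steps in the order the DAG enforces.
DBT_STEPS = ["seed", "run", "test"]

def build_commands(project_dir="."):
    """Return the shell commands the pipeline runs, in order."""
    return [f"dbt {step} --project-dir {project_dir}" for step in DBT_STEPS]

def run_pipeline(project_dir=".", dry_run=True):
    """Run each dbt step in sequence, stopping on the first failure.

    dry_run=True only prints the commands, so this sketch works even
    where dbt is not installed.
    """
    commands = build_commands(project_dir)
    for cmd in commands:
        if dry_run:
            print(f"would run: {cmd}")
        else:
            # check=True raises on a non-zero exit, halting the sequence
            subprocess.run(cmd.split(), check=True)
    return commands
```

In the real DAG, Airflow's scheduler triggers this sequence on the configured schedule instead of a manual call.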

About

Analytics engineering project built in GitHub Codespaces.


Generated from github/codespaces-jupyter