💻 Data Engineer | Snowflake | ELT / ETL | AI-Driven Data Solutions
Snowflake Certified - Data Engineering Professional Certificate
Microsoft (Professional Certificate) - Career Essentials in Generative AI
I design and build data pipelines, cloud warehouses, and AI-driven analytics solutions that help businesses move from raw data to reliable decisions.
Over three years, I've worked at both ends of the data stack: engineering production-grade ELT pipelines in Snowflake for a Mercedes-Benz (USA & Canada) client project at Infosys, and owning the full BI function at Troy Consultancy, where I built dashboards used for daily decision-making.
My focus areas:
- ELT pipeline development in Snowflake - multi-layered ingestion through Bronze → Silver → Gold using ADF, Snowpipe, and structured/semi-structured formats (CSV, JSON, Parquet)
- Dimensional data modeling - Star Schema design with SCD Type 2 for historical tracking across large-scale analytical datasets
- CDC & automation using Snowflake Streams and Tasks for incremental load orchestration and change tracking across multi-layered tables
- AI-assisted development using Snowflake Cortex Code for SQL generation, DDL scripting, and pipeline development, reducing manual coding effort
- Natural language querying via Cortex Analyst with a YAML-based semantic model, enabling business users to query 86K+ records without writing SQL
- Semantic layer development - defining measures, dimensions, and relationships in Cortex Analyst's semantic model for governed, business-friendly data access
- Cloud storage & integration - ingesting data from ADLS and AWS S3 into Snowflake across structured and semi-structured formats
- BI & reporting with Power BI, integrating SQL Server, Excel, and web sources into interactive dashboards for HR and management teams
- Query & warehouse optimization - using Snowflake Account Usage views to monitor query execution and resource utilization, and to reduce compute costs
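
As an illustration of the SCD Type 2 pattern mentioned above, here is a minimal Python sketch, not the code from any of the projects. The `DimRow` fields (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) and the `apply_scd2` helper are hypothetical names chosen for the example. The idea is that a changed attribute closes the current row and appends a new one, so history is preserved instead of overwritten.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    # Hypothetical dimension row: one version of one customer.
    customer_id: int                  # business key
    city: str                         # tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"
    is_current: bool = True


def apply_scd2(dim: list[DimRow], customer_id: int, city: str, as_of: date) -> None:
    """Apply one incoming record to the dimension, keeping history."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.is_current), None
    )
    if current is not None and current.city == city:
        return  # attribute unchanged: nothing to do
    if current is not None:
        # Close the old version instead of overwriting it.
        current.valid_to = as_of
        current.is_current = False
    # Open a new current version (also covers brand-new keys).
    dim.append(DimRow(customer_id, city, valid_from=as_of))


dim: list[DimRow] = []
apply_scd2(dim, 1, "Austin", date(2024, 1, 1))
apply_scd2(dim, 1, "Austin", date(2024, 2, 1))    # no change, no new row
apply_scd2(dim, 1, "Toronto", date(2024, 3, 1))   # change, new version
```

After these three calls the dimension holds two rows for customer 1: a closed "Austin" version and a current "Toronto" version. In a warehouse the same logic is usually expressed as a `MERGE` or an update-then-insert pair rather than row-by-row Python.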
Built an end-to-end Snowflake data platform integrating AI-assisted development and semantic analytics. Implemented Medallion Architecture (Bronze → Silver → Gold) to process 13+ source files into an analytical Star Schema with SCD Type 2. Leveraged Snowflake Cortex Code for AI-assisted SQL generation, query optimization, and pipeline development, reducing manual effort. Built a semantic model using Cortex Analyst to enable natural language-to-SQL querying on 86K+ sales records, allowing business users to interact with data without writing SQL.
Full enterprise-style data warehouse built using Microsoft SQL Server, implementing a Bronze → Silver → Gold layered architecture with CRM and ERP source integration, stored procedure-based ETL, star schema modeling, and a Sales Data Mart with dim_customers, dim_products, and fact_sales.
Production-style Snowflake pipeline modeled on a food delivery platform, covering initial & delta loads, CDC using Streams, SCD Type 2 dimensions, a star schema fact table at order-item granularity, data governance with Tags & Masking Policies, and full automation via Stored Procedures and Tasks.
Enterprise-scale retail analytics solution for a 5M+ customer ecommerce company spanning 15 countries. Built on Snowflake with ADLS as the external stage, ingesting CSV, JSON, and Parquet data. Implements Bronze → Silver → Gold layers, CDC with Streams, data quality pipelines, Tasks, and Gold-layer views for sales performance, customer segmentation, and product analytics.
Change Data Capture implementation (INSERT / UPDATE / DELETE) using Snowflake Streams with AWS S3 integration.
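
A rough Python sketch of how change records like these can be applied to a target table. The record keys `action` and `is_update` are hypothetical stand-ins modeled on the `METADATA$ACTION` and `METADATA$ISUPDATE` columns a Snowflake stream exposes, where an update appears as a DELETE/INSERT pair flagged as an update. The `apply_changes` helper and the sample data are assumptions for illustration only.

```python
def apply_changes(target: dict[int, dict], changes: list[dict]) -> None:
    """Apply stream-style change records to an in-memory target keyed by id."""
    for ch in changes:
        if ch["action"] == "INSERT":
            # Covers both new rows and the INSERT half of an update pair.
            target[ch["id"]] = ch["row"]
        elif not ch["is_update"]:
            # A DELETE that is not part of an update pair is a true delete.
            target.pop(ch["id"], None)
        # The DELETE half of an update pair is skipped: its paired
        # INSERT overwrites the row.


target = {1: {"name": "a"}, 2: {"name": "b"}}
changes = [
    {"id": 1, "action": "DELETE", "is_update": True, "row": {"name": "a"}},
    {"id": 1, "action": "INSERT", "is_update": True, "row": {"name": "a2"}},
    {"id": 2, "action": "DELETE", "is_update": False, "row": {"name": "b"}},
    {"id": 3, "action": "INSERT", "is_update": False, "row": {"name": "c"}},
]
apply_changes(target, changes)
```

Here row 1 is updated, row 2 is deleted, and row 3 is inserted. In Snowflake itself this would typically be a `MERGE` that reads from the stream.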
End-to-end automated data ingestion pipeline using Snowpipe — setup, configuration, and event-based triggering.
Querying and extracting nested JSON data in Snowflake using VARIANT data type and FLATTEN function.
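
For readers unfamiliar with flattening, here is a small Python analogue of what `FLATTEN` does to a nested array: one output row per array element, with the parent fields repeated. The sample document and the `flatten_items` helper are made up for illustration and are not taken from the project.

```python
import json

# Hypothetical order document with a nested array of items.
raw = """
{"order_id": 101,
 "customer": {"name": "Ana"},
 "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
"""


def flatten_items(doc: dict) -> list[dict]:
    """Return one flat row per element of doc["items"]."""
    return [
        {
            "order_id": doc["order_id"],
            "customer_name": doc["customer"]["name"],
            "item_index": i,          # position within the array
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for i, item in enumerate(doc.get("items", []))
    ]


rows = flatten_items(json.loads(raw))
```

The two items become two rows that share `order_id` and `customer_name`. A document with an empty `items` array produces no rows, which mirrors the default (non-`OUTER`) behavior of `FLATTEN`.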
Real-world SQL data cleaning — handling nulls, duplicates, standardization, and data type corrections.
Advanced SQL analytics on MLB player, team, and school data — window functions, aggregations, and performance insights.
Analyzing restaurant menu and order data to surface popular dishes, pricing trends, and customer spending patterns.
Real-world Airbnb dataset cleaned using Pandas — handling missing values, outliers, type conversions, and column normalization.
Amazon product data cleaned and preprocessed using Pandas — structured for downstream analytics or ML use.
Interactive HR dashboard covering employee headcount, attrition analysis, departmental performance, and workforce KPIs.
Visual analysis of personality survey data with dynamic slicers, trait distributions, and behavioral pattern breakdowns.
- 📧 Email: debashisdash1999@gmail.com
- 💼 LinkedIn: Choudhury Debashis Dash
✨ Always learning, always building — data tells the story, I make it clear.