Hi 👋 I'm Yogesh

🚀 Databricks Data Engineer passionate about building scalable data pipelines using Spark, Delta Lake, and modern Lakehouse architecture.


👨‍💻 About Me

  • 🔭 I'm currently looking for new opportunities in the data engineering field
  • 🌱 I’m currently learning more about Apache frameworks
  • 😄 Feel free to take a look at my projects!
  • 💻 5+ years of experience in software development
  • 🔥 Building real-world data pipelines using Databricks & Spark
  • 📊 Focused on batch + streaming data processing

🛠 Tech Stack

Data Engineering

  • Apache Spark
  • Spark SQL
  • Delta Lake
  • Databricks

Languages

  • Python
  • SQL

Data Architecture

  • Medallion Architecture
  • Data Lakehouse
  • ETL / ELT Pipelines
  • Streaming Data Pipelines

I design and build scalable data pipelines using Apache Spark, Delta Lake, and the Databricks Lakehouse Platform.

My focus areas:

  • Batch ETL pipelines
  • Real-time streaming pipelines
  • Lakehouse architecture
  • Data modeling for analytics

🚀 Featured Projects

1. Real-Time Wikipedia Streaming Pipeline

Real-time streaming data pipeline using Spark Structured Streaming.
🔗 View Project

2. Brazilian E-commerce Data Lake ETL Pipeline

End-to-end ETL pipeline using Medallion Architecture.
🔗 View Project

3. Clinical Trials Analytics

Healthcare analytics pipeline built with Spark SQL and Delta Lake.
🔗 View Project

4. Soil Health Nutrient Monitoring Lakehouse

Delta Lakehouse architecture project for agricultural analytics.
🔗 View Project

📫 Connect With Me

Pinned

  1. ongoing-clinical-trials-analytics (Public)

    End-to-end data pipeline for ongoing clinical trials using Databricks. Ingests data from ClinicalTrials.gov API into Delta Lake (Bronze → Silver → Gold) and prepares analytics-ready datasets for sp…

    Python

  2. SoilHealthNutrientMonitoringDeltaLakehouseLab (Public)

    This project is a hands-on Delta Lake learning lab designed to master Delta Lake internals, performance optimization, and governance using a public-sector soil health & agriculture analytics use case.

    Jupyter Notebook

  3. real-time-wikipedia-streaming-pipeline (Public)

    A real-time data engineering project that ingests live Wikipedia edit events and processes them using the Databricks Lakehouse Medallion Architecture.

    Jupyter Notebook

  4. brazilian_ecommerce_data_lake_etl_pipeline (Public)

    End-to-end data engineering pipeline using Databricks, Spark, and Delta Lake on Brazilian e-commerce data.