Sam-24-dev/README.md

👋 About Me

I don't just analyze data. I build the systems that make analysis possible.

Junior Data Engineer & Analyst | Computer Engineering Student (7th Semester), ESPOL — Ecuador

I design and deliver production-style data systems that turn raw data into decision-ready products.
My focus is the full lifecycle: ETL/ELT pipelines, data quality contracts, analytics modeling, and stakeholder-facing BI/ML outputs.

I combine Data Engineering (automation, testing, CI/CD, architecture) with Data Analytics (KPI storytelling, dashboarding, business impact analysis) to create solutions that are both technically robust and useful for decision-making.


💼 Core Value I Bring

  • Production Data Engineering: I build reproducible ETL/ELT pipelines with Pandera validation gates, automated testing, and CI workflows (133+ tests in production-style projects).
  • Database & Analytics Engineering: I design reliable SQL transformation layers (3NF modeling, query tuning, indexing), achieving up to 40% performance improvements in analytical workloads.
  • Business Intelligence & Decision Support: I create KPI-driven dashboards and executive-ready narratives, including identification of $16K+ performance gaps for action planning.
  • Applied ML for Real Products: I develop explainable predictive systems (e.g., dynamic pricing with 1.2M+ records) and deliver artifacts ready for web/product integration.
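As an illustration of what a validation gate does, here is a minimal stdlib-only sketch. It is not the Pandera code from these projects: the column names (`ride_id`, `distance_km`, `fare_usd`) and the rules are hypothetical, chosen only to show a pipeline step that refuses to pass rows that break a declared contract.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ColumnRule:
    """One rule of a hypothetical data contract: expected type plus a value check."""
    dtype: type
    check: Callable[[Any], bool]
    description: str


# Hypothetical contract; with Pandera this would be declared as a schema instead.
CONTRACT = {
    "ride_id": ColumnRule(str, lambda v: len(v) > 0, "non-empty id"),
    "distance_km": ColumnRule(float, lambda v: v > 0, "positive distance"),
    "fare_usd": ColumnRule(float, lambda v: v >= 0, "non-negative fare"),
}


def validate_rows(rows):
    """Return a list of human-readable violations; an empty list means the gate passes."""
    errors = []
    for i, row in enumerate(rows):
        for name, rule in CONTRACT.items():
            if name not in row:
                errors.append(f"row {i}: missing column '{name}'")
                continue
            value = row[name]
            if not isinstance(value, rule.dtype):
                errors.append(f"row {i}: '{name}' expected {rule.dtype.__name__}")
            elif not rule.check(value):
                errors.append(f"row {i}: '{name}' violates: {rule.description}")
    return errors


def quality_gate(rows):
    """Raise if any row breaks the contract, otherwise hand the rows on unchanged."""
    errors = validate_rows(rows)
    if errors:
        raise ValueError("data contract violated:\n" + "\n".join(errors))
    return rows
```

Placing `quality_gate` between ingestion and transformation means a bad batch stops the run early instead of silently reaching a dashboard or model.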

📌 Business Impact Snapshot

  • Reduced analytical query runtime by up to 40% through SQL optimization and indexing strategy.
  • Identified $16K+ performance/revenue opportunity gaps through BI analysis and KPI storytelling.
  • Deployed multiple reproducible data products with 133+ automated tests and CI-backed validation workflows.
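The indexing point can be sketched with SQLite, which is part of the stack listed in this README. The `sales` table, its columns, and the generated rows are hypothetical, and the example only shows how the query plan changes once an index exists; it does not reproduce the 40% figure, which depends on the real workload.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan detail strings SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the readable detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, seller TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (seller, amount) VALUES (?, ?)",
    [(f"seller_{i % 23}", float(i)) for i in range(10_000)],
)

sql = "SELECT SUM(amount) FROM sales WHERE seller = ?"

# Without an index the filter forces a full table scan.
plan_before = query_plan(conn, sql, ("seller_7",))

# With an index on the filtered column SQLite can search instead of scan.
conn.execute("CREATE INDEX idx_sales_seller ON sales (seller)")
plan_after = query_plan(conn, sql, ("seller_7",))

print(plan_before)
print(plan_after)
```

Comparing the two plans before and after a change is a cheap way to confirm that an index is actually used by the query it was built for.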

🎯 Target Roles

🛠️ Data Engineer

  • ETL/ELT pipeline design and automation

  • Data contracts and quality gates (Pandera)

  • Analytics engineering with SQL, dbt, DuckDB

  • Workflow reliability, testing, and CI/CD

  • Reproducible data products from raw inputs to validated marts

📊 Data Analyst / BI Analyst

  • KPI modeling and business performance tracking

  • Dashboard design for stakeholders and executive reporting

  • Insight generation and decision-focused storytelling

  • Exploratory data analysis (EDA) and trend interpretation

  • Translating technical outputs into clear business actions


🔭 Current Focus

📘 Certification Track: PL-300 (Microsoft Power BI Data Analyst). Strengthening advanced modeling, DAX, and business storytelling for decision-focused dashboards.
☁️ Learning Path: Cloud + dbt. Building stronger foundations in modern data stack practices, transformation workflows, and analytics engineering standards.
🧩 Career Optimization: Portfolio optimization for job applications. Refining project narratives, measurable impact, and recruiter-facing positioning for Junior Data Engineer / Data Analyst opportunities.

🚧 Current Project

RideFare-ETL-Pipeline: Project Status

RideFare is currently in active development.

This project is being rebuilt from a notebook-centered ETL demo into a complete, production-style data product for portfolio use.

It combines:

  • Data Engineering: reproducible ingestion, validation, transformation, and analytics modeling
  • Machine Learning: documented training, evaluation, explainability, and exportable artifacts
  • Frontend Product: a polished public web app in Spanish with typed data contracts
  • Automation & Delivery: CI workflows, data refresh pipelines, and deployment-ready structure
  • Documentation Quality: architecture docs, model documentation, ADRs, and runbooks

🌎 Spoken Languages

      
Actively preparing for C1 certification

🏆 Certifications & Awards

| 🎖️ Certification / Award | 🏢 Issuer | 📅 Status / Date | 🔗 Link |
| --- | --- | --- | --- |
| 📗 Microsoft Office Specialist: Excel Associate (Microsoft 365 Apps) | Microsoft | Issued: Mar 2026 | 📄 Credential |
| 📊 Data Analyst Associate | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🛠️ ETL y ELT en Python | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🌍 Galactic Problem Solver (Global Nominee) | NASA Space Apps Challenge | Oct 2025 | 📄 View |
| 🤖 Desarrollo con IA: de 0 a Producción | BIG school | Issued: Mar 2026 | 📜 Credential |
| 📊 Data-Driven Decision Specialist (Bootcamp) | ESPOL & MINTEL | Completed (Graduation: Apr 2026) | ⭐ Top Project |

🚀 Featured Projects

End-to-End Data Engineering + ML + Analytics Product

From notebook-based analysis to a reproducible, production-style data product with public delivery.

  • Pipeline Modernization: Rebuilt legacy notebook flow into reproducible commands (ridefare ingest, transform, train, export-web) with clear operational interfaces.
  • Data Quality by Design: Implemented schema and validation controls with Pandera, stable transformations with DuckDB + dbt, and versioned public artifacts.
  • Explainable ML Delivery: Trained and exported XGBoost + SHAP artifacts for transparent model behavior and scenario exploration.
  • Public Product Interface: Delivered a Spanish-language Next.js web experience (/dashboard, /como-funciona, /escenarios) powered by deterministic exported JSON.
  • Automation & Deployment: Integrated CI validation, artifact refresh workflows, preview/prod deploy pipelines, and release automation.
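The command layout and the deterministic JSON export described above could look roughly like the sketch below. Only the program name and the four subcommand names come from this README; the `--data-dir` option, the function names, and the placeholder behavior are assumptions made for illustration, not the project's actual interface.

```python
import argparse
import json

# Stage names as listed in this README.
STAGES = ("ingest", "transform", "train", "export-web")


def build_parser():
    """Build a `ridefare <stage>` parser with one subcommand per stage."""
    parser = argparse.ArgumentParser(prog="ridefare")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for stage in STAGES:
        sub = subparsers.add_parser(stage, help=f"run the {stage} stage")
        # Hypothetical option, not taken from the project.
        sub.add_argument("--data-dir", default="data")
    return parser


def run(argv):
    """Parse arguments and report which stage would run (placeholder only)."""
    args = build_parser().parse_args(argv)
    return f"{args.command} (data_dir={args.data_dir})"


def dumps_deterministic(payload):
    """Serialize so that equal payloads always produce byte-identical JSON."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
```

Sorting keys and fixing separators keeps exported files stable across runs, so a diff in a versioned artifact reflects a change in the data and not in serialization order.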
   

End-to-End Multi-Source Data Engineering Platform

Tracking real-time developer technology trends by orchestrating data from GitHub, StackOverflow, and Reddit into a unified analytics engine.

  • 🌐 Multi-Source ETL: Consolidates developer signals from GitHub, StackOverflow, and Reddit into a canonical pipeline.
  • 🛡️ Data Quality Gates: Enforces schema and validation rules with Pandera data contracts.
  • ⚡ Modern Analytics Engine: Uses DuckDB for trend computation, ranking, and lightweight analytical workloads.
  • ✅ Production Discipline: 133+ passing tests with automated CI/CD workflows and scheduled refreshes.
  • 📱 Delivery Layer: Serves insights to a Flutter Web dashboard with stable bridge outputs for frontend consumption.
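A canonical pipeline of this kind usually starts by mapping each source's records onto one shared shape before any ranking happens. The sketch below assumes hypothetical input fields (`language`, `stars`, `tag`, `answers`, `topic`, `upvotes`) and a hypothetical `Signal` record; the real platform's schema and its DuckDB queries are not shown in this README.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """Canonical record shared by all sources (hypothetical schema)."""
    source: str
    technology: str
    weight: int


def from_github(repo):
    return Signal("github", repo["language"].lower(), repo["stars"])


def from_stackoverflow(question):
    return Signal("stackoverflow", question["tag"].lower(), question["answers"])


def from_reddit(post):
    return Signal("reddit", post["topic"].lower(), post["upvotes"])


def rank(signals, top_n=3):
    """Sum weights per technology and return the top entries."""
    totals = Counter()
    for signal in signals:
        totals[signal.technology] += signal.weight
    # Sort by total descending, then by name so ties are ordered deterministically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Raw counts from different sources are not directly comparable, so a production ranking would likely normalize weights per source before summing; that step is left out here to keep the sketch short.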
   

📁 Other Key Projects

Award: Galactic Problem Solver (Global Nominee)

  • Innovation: Built a full-stack web app analyzing 10 years of NASA satellite data across 195+ countries with <2s response time on interactive maps.
  • Impact: Developed MVP in a 48-hour hackathon, integrating real-time APIs to predict global extreme weather probabilities.
  • Tech: Python (Flask), React, TypeScript, Leaflet, Plotly.
 

End-to-end Data Engineering for Agriculture

  • Result: Engineered a Python ETL pipeline (covered by 14 unit tests) that modeled a strategic turnaround, projecting an ROI improvement from -5.58% to +15% (+20.6 pts) and a +75% boost in productivity.
  • Architecture: Built a robust MySQL -> Python -> JSON pipeline feeding a 5-page interactive dashboard for operational tracking.
  • Tech: MySQL, Python, Pandas, Pytest, JS/Bootstrap.
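The ROI figures above are a change in percentage points, not a relative change, and the arithmetic can be checked in a few lines. The `roi_pct` helper uses the conventional definition (net gain divided by cost); that definition is an assumption here, since this README does not state how the project computes ROI.

```python
def roi_pct(gain, cost):
    """Conventional ROI in percent: net gain relative to cost (assumed definition)."""
    return (gain - cost) / cost * 100


def percentage_point_change(before_pct, after_pct):
    """Difference between two percentages, expressed in percentage points."""
    return after_pct - before_pct


# Figures quoted in this README: ROI moving from -5.58% to +15%.
delta = percentage_point_change(-5.58, 15.0)
print(round(delta, 1))
```

The result rounds to the +20.6 points quoted above.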
 

Business Intelligence

  • Insight: Analyzed sales distribution across 23 active sellers ($28.4K avg), uncovering a critical $16.66K performance gap between top and bottom performers.
  • Impact: Identified "Meat" as the top revenue driver ($80.05K) and Tulsa as the premier market (20 top clients), delivering actionable KPIs for data-driven decisions.
  • Tech: Power BI, DAX, Excel.

Scientific Research & Data Modeling

  • Validation: Built an automated R pipeline to validate a Negative Binomial Distribution model (k=3, p=0.3) on 309 observations, obtaining a p-value of 0.660, which gives no evidence against the model (the fit is not rejected at the usual 0.05 level).
  • Impact: Tracked a mean serve time of 1.945s (<2s threshold) and exported JSON/PNG assets into a dynamic JS web dashboard.
  • Tech: R (Tidyverse, ggplot2), HTML/CSS/JS.
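For readers unfamiliar with the model, the sketch below evaluates a Negative Binomial distribution with the parameters quoted above. It assumes the parameterization used by R's `dnbinom` (number of failures before the k-th success) and an integer k; this README does not say which parameterization the project uses, and the code is a Python illustration, not the project's R pipeline.

```python
from math import comb


def nbinom_pmf(x, k, p):
    """P(X = x): probability of x failures before the k-th success."""
    return comb(x + k - 1, x) * p**k * (1 - p) ** x


def nbinom_mean(k, p):
    """Expected number of failures before the k-th success."""
    return k * (1 - p) / p


def expected_counts(n, k, p, max_x):
    """Expected frequency of each outcome 0..max_x among n observations."""
    return [n * nbinom_pmf(x, k, p) for x in range(max_x + 1)]


K, P, N = 3, 0.3, 309  # parameters and sample size quoted in this README
print(nbinom_mean(K, P))
print([round(c, 1) for c in expected_counts(N, K, P, max_x=5)])
```

Expected counts like these are what a goodness-of-fit test compares against the observed frequencies; a large p-value from that comparison means the data are compatible with the model.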
 

🛠️ Technical Stack

| Category | Technologies |
| --- | --- |
| 💻 Languages | Python, R, SQL, TypeScript, Dart |
| ⚙️ Data Engineering & DBs | DuckDB, MySQL, SQLite, Pandas, Jupyter |
| 🤖 Machine Learning | Scikit-Learn |
| 🧪 Testing & Quality | Pytest, Pandera |
| 📊 Visualization & BI | Power BI, Tableau, Plotly, Excel |
| 🌐 Web & Mobile | React, Flutter, Flask, Tailwind CSS, Vite, Bootstrap, Leaflet |
| 🚀 DevOps & Cloud | GitHub Actions, Vercel, Git |
| 📚 Learning | AWS, dbt |

📊 GitHub Stats


⏱️ Weekly Coding Activity

Real-time stats powered by WakaTime — tracking every line of code I write.



📈 Contribution Trend

---

🐍 Contribution Snake


🤝 Let’s Connect

Open to Junior Data Engineer / Data Analyst roles (remote/hybrid, LATAM/US).

I’m ready to contribute from day one in data pipeline automation, analytics engineering, and decision-focused BI.

Pinned

  1. Technology-trend-analysis-platform (Public)

    Data intelligence platform for technology trends across GitHub, StackOverflow, and Reddit using Python ETL, Pandera quality gates, DuckDB trend engine, and Flutter Web.

    Dart

  2. Analisis-Ping-Pong (Public)

    Automated statistical analysis pipeline using R to model ping pong serve precision with Negative Binomial distribution (309 observations). Includes interactive web dashboard.

    HTML 1

  3. Analisis-Cultivo-Arroz (Public)

    End-to-end data engineering platform for agricultural analytics. ETL pipeline (Python) + Interactive dashboard (Chart.js) with KPIs, financial analysis, and strategic insights.

    HTML

  4. easyparker-pwa (Public)

    EasyParker is a PWA for booking parking in Guayaquil | Modes: Driver and Host | Real-time chat | Events with surge pricing | Ratings, etc. | React + TypeScript + Tailwind

    TypeScript

  5. eSports-Analytics-Dashboard (Public)

    End-to-end analytics dashboard for LATAM eSports with Python ETL, data validation, web visualization, and a 2026 ML projection.

    Python

  6. RideFare-ETL-Pipeline (Public)

    Portfolio-grade pricing intelligence product for urban mobility, built with DuckDB, dbt, XGBoost, Next.js, and Vercel.

    Jupyter Notebook