Senior Data Engineer · AI Data Engineer
9+ years building production data platforms — lakehouse, governance, and applied GenAI/RAG.
Senior / AI Data Engineer with 9+ years designing and shipping end-to-end data solutions, from stakeholder requirements to production lakehouse platforms. I focus on a combination many teams struggle to staff: robust data engineering paired with applied GenAI / RAG in production (embeddings, retrieval, LLM-backed pipelines). My work is data and LLM infrastructure, not ML research.
Currently consulting as a Senior / AI Data Engineer, fully remote.
Languages & Processing — Python · PySpark · SQL
Lakehouse & Warehouse — Databricks · Unity Catalog · Snowflake · Delta Lake
Cloud — Azure (Synapse, Data Factory, ADLS) · GCP
Orchestration & Modeling — Apache Airflow 3 · dbt
Data + AI — RAG pipelines · Embeddings (sentence-transformers) · Vector DB (Qdrant) · LLM serving
Engineering & Governance — FastAPI · Docker · pytest · PostgreSQL/PostGIS · Data Quality · GDPR/LGPD
- 🚀 Scalable data platform sustaining 200% data growth
- 💸 −30% storage cost · −50% pipeline latency
- ⚡ −40% processing time migrating Pandas → PySpark
- ✅ +25% data accuracy · −40% integration errors via governance (GDPR/LGPD)
fastapi-trips: a mobility-data engineering pipeline with FastAPI ingestion (sync and async), PostgreSQL + PostGIS for spatio-temporal analysis, real-time status over WebSocket, Docker packaging, and a design built to scale (GiST indexing, AWS architecture sketch). A compact end-to-end example of how I build production data services.
Most of my work lives behind enterprise NDAs (Intel, Vale, BEES/AB InBev, Cyrela, Paschoalotto). For the full picture — projects, impact, and references — see my LinkedIn and reach out directly.
- LinkedIn: in/osilvahudson
- Open to remote Senior / AI Data Engineer roles · LATAM-friendly
