💡 Designing scalable data platforms, real-time streaming pipelines, and production AI systems
Transforming raw data → intelligent systems → business impact using
Data Engineering • Machine Learning • Cloud Infrastructure
⚡ Real-Time Data Engineering
Kafka • Spark • Airflow • Distributed Streaming Pipelines
🤖 AI & Generative AI Systems
LLMs • RAG Architectures • NLP • Deep Learning Models
☁️ Cloud & MLOps Infrastructure
AWS • Docker • Kubernetes • MLflow • CI/CD
📊 End-to-End Data Platforms
Data Ingestion → Feature Engineering → ML Pipelines → API Deployment
- ⚡ Building high-throughput streaming data systems
- 🧠 Designing production-grade ML pipelines
- 🤖 Developing LLM-powered AI applications
- ☁️ Deploying scalable cloud-native AI infrastructure
Hi, I'm Prathamesh Kulkarni, a Data Engineer and AI Developer based in Pune, India 🇮🇳.
I build production-grade data pipelines, machine learning systems, and AI applications designed to operate at scale.
My work focuses on real-time data processing, distributed systems, and deploying intelligent models into production environments.
- M.Sc Computer Science — Savitribai Phule Pune University (CGPA: 8.5)
- B.E Computer Science — Savitribai Phule Pune University (CGPA: 8.6)
| | Metric | Result | Where |
|---|---|---|---|
| ⚡ | Streaming Latency Reduced | 45% | Telphatech LLP — Kafka Architecture |
| 🤖 | Manual Effort Eliminated | 40% | CaryanamIndia — PySpark + Airflow |
| 🎯 | Production Model Accuracy | 88%+ | NullClass — TensorFlow + HuggingFace |
| 💬 | Chatbot Intent Accuracy | +32% | Telphatech LLP — Flask + PyTorch |
| 📊 | User Interactions Tracked | 10K+ | Telphatech LLP — Streamlit Dashboards |
| 🌲 | Model Training Time Cut | 40% | NullClass — PySpark Pipelines |
          ┌─────────────────────────────────────────────────┐
          │           REAL-TIME AI DATA PLATFORM            │
          └─────────────────────────────────────────────────┘
Data Sources          Ingestion           Processing           Serving
┌──────────┐         ┌─────────┐         ┌──────────┐        ┌─────────┐
│ REST APIs│────────▶│  Kafka  │────────▶│ PySpark  │───────▶│ FastAPI │
│ Databases│         │ Streams │         │Streaming │        │  Flask  │
│ Files    │         └─────────┘         └──────────┘        └─────────┘
└──────────┘              │                   │                   │
                          ▼                   ▼                   ▼
                     ┌─────────┐         ┌──────────┐        ┌─────────┐
                     │ Airflow │         │  Delta   │        │ Docker  │
                     │  DAGs   │         │   Lake   │        │   K8s   │
                     └─────────┘         └──────────┘        └─────────┘
                          │                   │                   │
                          ▼                   ▼                   ▼
                     ┌─────────┐         ┌──────────┐        ┌─────────┐
                     │   dbt   │         │    ML    │        │   AWS   │
                     │Snowflake│         │  Model   │        │ EC2·S3  │
                     └─────────┘         └──────────┘        └─────────┘
                                              │
                                  ┌───────────┴───────────┐
                                  │        MLflow         │
                                  │  Experiment Tracking  │
                                  └───────────────────────┘
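
The data flow in the diagram above can also be written down as a small dependency graph, which makes the stage ordering explicit. The sketch below is illustrative only: the node names are shorthand labels invented for this example, the edges are read literally off the arrows in the diagram (with the MLflow box assumed to hang off the ML model), and nothing here talks to Kafka, Spark, or any other real service.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the stages in its set.
# Labels are hypothetical shorthand for the boxes in the diagram.
DEPENDS_ON = {
    "kafka": {"sources"},
    "pyspark_streaming": {"kafka"},
    "serving_api": {"pyspark_streaming"},
    "airflow": {"kafka"},
    "dbt_snowflake": {"airflow"},
    "delta_lake": {"pyspark_streaming"},
    "ml_model": {"delta_lake"},
    "mlflow": {"ml_model"},  # assumed attachment point for experiment tracking
    "docker_k8s": {"serving_api"},
    "aws": {"docker_k8s"},
}


def stage_order(depends_on):
    """Return one valid order in which every stage follows its upstream stages."""
    return list(TopologicalSorter(depends_on).static_order())


def upstream(stage, depends_on):
    """Return every stage that `stage` transitively depends on."""
    seen = set()
    stack = [stage]
    while stack:
        for parent in depends_on.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(stage_order(DEPENDS_ON))
    print(sorted(upstream("mlflow", DEPENDS_ON)))
```

Calling `upstream("mlflow", DEPENDS_ON)` lists everything that has to be healthy before experiment tracking receives data, which is a quick way to sanity-check the diagram when a stage is added or moved.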
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏢 CaryanamIndia | Oct 2025 – Jan 2026
Software Development Intern — AI & Data Engineering | Pune
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Architected PySpark + Airflow automation pipelines
→ Eliminated 40% manual effort across business operations
✅ Built NLP document intelligence pipelines on AWS S3 + Lambda
→ Enabled scalable, low-latency automated workflows
✅ Delivered AI-powered Power BI decision-support dashboards
→ Directly improved operational KPIs & cross-team productivity
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏢 Telphatech LLP | Jan 2024 – Jul 2024
Full-Stack Developer Intern | Pune
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Deployed production AI chatbot (Flask + PyTorch)
→ 32% boost in intent-recognition accuracy on live traffic
✅ Engineered real-time Kafka streaming pipelines
→ 45% reduction in end-to-end system latency
✅ Built Streamlit + Tableau dashboards tracking 10K+ interactions
→ Containerized via Docker for scalable deployment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🏢 NullClass | Jan 2024 – Jun 2024
Data Science Intern | Remote
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Developed TensorFlow + HuggingFace emotion-detection models
→ 88%+ accuracy on production datasets
✅ Refactored PySpark preprocessing pipelines
→ 40% reduction in model training time
✅ Tracked experiments with MLflow
→ Production-ready ML components
| Project | Stack | Highlights |
|---|---|---|
| 🌫️ Air Quality Index Prediction | PySpark · Flask · Random Forest · Heroku | End-to-end regression pipeline · Web scraping · 6+ models benchmarked · Best RMSE: 38.85 · Live on Heroku |
| 🌿 Cotton Plant Disease Detection | TensorFlow · VGG-19 · Flask · Docker | Fine-tuned transfer learning · 94.6% accuracy · Dockerized · Real-time inference API |
| 📈 Apple Stock Price Forecasting | Stacked LSTM · Tiingo API · MLflow | 100-day lookback windows · MLflow tracking · Test RMSE: 239.6 |
| 🔍 Fraud Transaction Classification | Scikit-learn · PySpark · Python | Imbalanced data handling · Cross-validation · Random Forest: 94% accuracy |
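
For readers unfamiliar with the terms in the forecasting row, a "100-day lookback window" usually means each training sample is the previous 100 values of the series and the target is the value that follows, and RMSE is the root of the mean squared prediction error. The snippet below is a minimal pure-Python sketch of both ideas under that assumption. It is not the project's code: `make_windows` and `rmse` are hypothetical helper names, and the series is synthetic.

```python
import math


def make_windows(series, lookback=100):
    """Split a 1-D sequence into (window, next_value) training pairs."""
    if lookback <= 0:
        raise ValueError("lookback must be positive")
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(list(series[end - lookback:end]))  # previous `lookback` values
        targets.append(series[end])                      # the value to predict
    return inputs, targets


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    prices = [float(i) for i in range(105)]  # synthetic stand-in, not real prices
    X, y = make_windows(prices, lookback=100)
    print(len(X), len(X[0]), y[0])
```

A series of length `n` yields `n - lookback` samples, so short histories leave very little training data once a 100-step window is applied.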
💼 Available for: Data Engineer · AI Engineer · MLOps · Data Analyst roles
📍 Based in: Pune, India | 🌐 Open to: Remote & Hybrid roles globally
✔️ All projects in this profile follow production-grade practices used in real data platforms.
✔️ Code includes scalable data pipelines, ML workflows, and deployment-ready architectures.
✔️ Built using industry tools such as PySpark, Kafka, Airflow, AWS, Docker, and MLflow.
✔️ Every repository contains complete code, documentation, and reproducible workflows.