prathamk11

🚀 Prathamesh Kulkarni

Data Engineer • AI Engineer • MLOps Developer

💡 Designing scalable data platforms, real-time streaming pipelines, and production AI systems

Transforming raw data → intelligent systems → business impact using
Data Engineering • Machine Learning • Cloud Infrastructure

💡 Core Specializations

⚡ Real-Time Data Engineering
Kafka • Spark • Airflow • Distributed Streaming Pipelines

🤖 AI & Generative AI Systems
LLMs • RAG Architectures • NLP • Deep Learning Models

☁️ Cloud & MLOps Infrastructure
AWS • Docker • Kubernetes • MLflow • CI/CD

📊 End-to-End Data Platforms
Data Ingestion → Feature Engineering → ML Pipelines → API Deployment

🎯 Engineering Focus

⚡ Building high-throughput streaming data systems
🧠 Designing production-grade ML pipelines
🤖 Developing LLM-powered AI applications
☁️ Deploying scalable cloud-native AI infrastructure

⚡ Turning Data into Scalable Intelligent Systems

🧠 About Me

Hi, I'm Prathamesh Kulkarni, a Data Engineer and AI Developer based in Pune, India 🇮🇳.

I build production-grade data pipelines, machine learning systems, and AI applications designed to operate at scale.
My work focuses on real-time data processing, distributed systems, and deploying intelligent models into production environments.

🎓 Education

M.Sc Computer Science — Savitribai Phule Pune University (CGPA: 8.5)
B.E Computer Science — Savitribai Phule Pune University (CGPA: 8.6)

📈 Proven Business Impact

	Metric	Result	Where
⚡	Streaming Latency Reduced	45%	Telphatech LLP — Kafka Architecture
🤖	Manual Effort Eliminated	40%	CaryanamIndia — PySpark + Airflow
🎯	Production Model Accuracy	88%+	NullClass — TensorFlow + HuggingFace
💬	Chatbot Intent Accuracy	+32%	Telphatech LLP — Flask + PyTorch
📊	User Interactions Tracked	10K+	Telphatech LLP — Streamlit Dashboards
🌲	Model Training Time Cut	40%	NullClass — PySpark Pipelines

🏗️ System Architecture Expertise

                        ┌─────────────────────────────────────────────────┐
                        │          REAL-TIME AI DATA PLATFORM              │
                        └─────────────────────────────────────────────────┘

   Data Sources          Ingestion           Processing          Serving
  ┌──────────┐         ┌─────────┐         ┌──────────┐        ┌─────────┐
  │ REST APIs│────────▶│  Kafka  │────────▶│  PySpark │───────▶│ FastAPI │
  │ Databases│         │ Streams │         │Streaming │        │  Flask  │
  │  Files   │         └─────────┘         └──────────┘        └─────────┘
  └──────────┘              │                    │                   │
                            ▼                    ▼                   ▼
                       ┌─────────┐         ┌──────────┐        ┌─────────┐
                       │ Airflow │         │  Delta   │        │ Docker  │
                       │  DAGs   │         │   Lake   │        │   K8s   │
                       └─────────┘         └──────────┘        └─────────┘
                            │                    │                   │
                            ▼                    ▼                   ▼
                       ┌─────────┐         ┌──────────┐        ┌─────────┐
                       │   dbt   │         │   ML     │        │   AWS   │
                       │Snowflake│         │  Model   │        │ EC2·S3  │
                       └─────────┘         └──────────┘        └─────────┘
                                                │
                                    ┌───────────┴───────────┐
                                    │      MLflow           │
                                    │  Experiment Tracking  │
                                    └───────────────────────┘
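The ingest → process → serve flow in the diagram can be illustrated with a toy, in-memory sketch. This is only an illustration under stated assumptions: it uses the Python standard library in place of Kafka, PySpark, and FastAPI, and the function names (`ingest`, `process`, `serve`) and the sensor events are hypothetical, not taken from any project listed here.

```python
from collections import deque
from statistics import mean


def ingest(records, buffer):
    """Ingestion stage: append raw events to a buffer (stand-in for a Kafka topic)."""
    for record in records:
        buffer.append(record)


def process(buffer, window_size=3):
    """Processing stage: drain the buffer and keep a rolling mean per sensor."""
    windows = {}   # sensor -> last `window_size` values
    features = {}  # sensor -> latest rolling mean
    while buffer:
        event = buffer.popleft()
        window = windows.setdefault(event["sensor"], deque(maxlen=window_size))
        window.append(event["value"])
        features[event["sensor"]] = mean(window)
    return features


def serve(features, sensor):
    """Serving stage: return the latest feature, as an API handler might."""
    return {"sensor": sensor, "rolling_mean": features.get(sensor)}


# Hypothetical sample events, for illustration only.
events = [
    {"sensor": "a", "value": 1.0},
    {"sensor": "a", "value": 2.0},
    {"sensor": "b", "value": 10.0},
    {"sensor": "a", "value": 3.0},
    {"sensor": "a", "value": 6.0},
]

buffer = deque()
ingest(events, buffer)
features = process(buffer)
print(serve(features, "a"))
```

In a real deployment each stage would be a separate service; the sketch only shows how data moves from one stage to the next.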

⚡ Full Tech Stack

🧑‍💻 Languages

🤖 AI / ML & GenAI

📊 Big Data & Data Engineering

☁️ Cloud & DevOps

🛠️ Frameworks & Tools

💼 Work Experience

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  🏢  CaryanamIndia                              Oct 2025 – Jan 2026
      Software Development Intern — AI & Data Engineering | Pune
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✅  Architected PySpark + Airflow automation pipelines
      → Eliminated 40% manual effort across business operations
  ✅  Built NLP document intelligence pipelines on AWS S3 + Lambda
      → Enabled scalable, low-latency automated workflows
  ✅  Delivered AI-powered Power BI decision-support dashboards
      → Directly improved operational KPIs & cross-team productivity

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  🏢  Telphatech LLP                             Jan 2024 – Jul 2024
      Full-Stack Developer Intern | Pune
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✅  Deployed production AI chatbot (Flask + PyTorch)
      → 32% boost in intent-recognition accuracy on live traffic
  ✅  Engineered real-time Kafka streaming pipelines
      → 45% reduction in end-to-end system latency
  ✅  Built Streamlit + Tableau dashboards tracking 10K+ interactions
      → Containerized via Docker for scalable deployment

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  🏢  NullClass                                  Jan 2024 – Jun 2024
      Data Science Intern | Remote
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  ✅  Developed TensorFlow + HuggingFace emotion-detection models
      → 88%+ accuracy on production datasets
  ✅  Refactored PySpark preprocessing pipelines
      → 40% reduction in model training time
  ✅  Tracked experiments with MLflow
      → Production-ready ML components

🚀 Featured Projects

Project	Stack	Highlights
🌫️ Air Quality Index Prediction	PySpark · Flask · Random Forest · Heroku	End-to-end regression pipeline · Web scraping · 6+ models benchmarked · Best RMSE: 38.85 · Live on Heroku
🌿 Cotton Plant Disease Detection	TensorFlow · VGG-19 · Flask · Docker	Fine-tuned transfer learning · 94.6% accuracy · Dockerized · Real-time inference API
📈 Apple Stock Price Forecasting	Stacked LSTM · Tiingo API · MLflow	100-day lookback windows · MLflow tracking · Test RMSE: 239.6
🔍 Fraud Transaction Classification	Scikit-learn · PySpark · Python	Imbalanced data handling · Cross-validation · Random Forest: 94% accuracy
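The lookback windows and RMSE figures in the table refer to standard techniques. Here is a minimal sketch of both, assuming a plain list of prices. The helper names `make_windows` and `rmse` are hypothetical, the data is made up, and a lookback of 3 is used for brevity instead of the 100 days mentioned above. It does not reproduce any project's model or results.

```python
import math


def make_windows(series, lookback):
    """Split a series into (window, next_value) pairs for supervised forecasting."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy price series (made up), not real market data.
prices = [float(p) for p in range(1, 11)]
X, y = make_windows(prices, lookback=3)

# Naive baseline: predict that the next value equals the last value in the window.
naive_predictions = [window[-1] for window in X]

print(len(X), X[0], y[0])
print(rmse(y, naive_predictions))
```

A naive last-value baseline like this one is a common yardstick: a forecasting model's test RMSE is easier to interpret when compared against it.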

🌐 Let's Connect & Build Something Great

💼 Available for: Data Engineer · AI Engineer · MLOps · Data Analyst roles

📍 Based in: Pune, India | 🌐 Open to: Remote & Hybrid roles globally

🔒 Engineering Credibility

✔️ All projects in this profile follow production-grade practices used in real data platforms.

✔️ Code includes scalable data pipelines, ML workflows, and deployment-ready architectures.

✔️ Built using industry tools such as PySpark, Kafka, Airflow, AWS, Docker, and MLflow.

✔️ Every repository contains complete code, documentation, and reproducible workflows.

💼 Open to Data Engineer · AI Engineer · MLOps opportunities.

⭐ "Data is the new oil — I build the refineries that turn it into intelligence."
