
Hi, I'm Atharva Wagh 👋

Software Engineer · Data Engineer · Distributed Systems Enthusiast

MS Computer Science @ University of Southern California (GPA: 3.57)
2+ years building scalable pipelines, cloud-native backends, and ML-powered systems

Portfolio LinkedIn Email Resume


🧭 About Me

I'm a Computer Science graduate student at USC with a background in building high-throughput data pipelines, distributed backend systems, and AI-integrated applications. I've processed 100M+ records, reduced system latency by 80%+, and shipped containerized microservices in production environments.

I'm passionate about the intersection of systems engineering, data at scale, and applied ML — and I love building things that are both technically rigorous and practically useful.

  • 🔬 Currently: Research Assistant in Bioinformatics @ USC — building ETL pipelines for RNA-seq data
  • 🏎️ Latest project: F1 Podium Prediction Model with 82.35% true positive rate & 0.98 AUC
  • 📖 Coursework: Distributed Systems, High-Performance Computing, Deep Learning, Advanced CV
  • 🌏 Previously: Full-stack & automation engineering @ IMFS, Mumbai

🛠️ Tech Stack

Languages

Python Java JavaScript TypeScript C# C++ SQL R GraphQL Solidity

Data Engineering & ML

Apache Spark Apache Kafka Apache Hadoop Pandas NumPy Scikit-learn PyTorch TensorFlow

Cloud & DevOps

AWS GCP Azure Docker Kubernetes GitHub Actions

Databases

PostgreSQL MongoDB MySQL Cassandra Redis Snowflake Firebase

Frameworks & Tools

FastAPI Next.js React Node.js Django Spring Boot .NET LangChain

Visualization

Power BI Tableau Matplotlib Seaborn


📊 GitHub Analytics

Atharva's GitHub Stats

Top Languages

GitHub Streak


🚀 Featured Projects

🤖 LLM-Powered Interview Officer

LangChain · RAG · Apache Kafka · WebSockets · Docker · Next.js · Unsloth

A distributed, production-grade AI interview system built on an event-driven architecture. It uses Retrieval-Augmented Generation (RAG) to ground responses in retrieved context, and Apache Kafka for asynchronous messaging between containerized microservices. Designed for fault tolerance, low latency, and deployment at scale.

Repo
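
The retrieval step behind RAG can be illustrated with a minimal sketch. It is not taken from the project's repository: it assumes a toy bag-of-words cosine similarity in place of the learned embeddings and vector store that a LangChain-based system would normally use, and `retrieve` and `build_prompt` are hypothetical names. The Kafka messaging layer is out of scope here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer (a stand-in for a real embedding model)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (hypothetical helper)."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Prepend the retrieved context to the question before it is sent to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"
```

For example, asking "What Kafka experience does the candidate have?" over a few candidate notes ranks the note mentioning Kafka and experience first, and only that note is placed into the prompt.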


🧬 TCR Sequence Extraction from RNA-seq Data

Python · R · Apache Spark · Parallel Processing · Bioinformatics Pipeline

End-to-end bioinformatics pipeline that processes 100M+ RNA-seq records using parallelization and chunking strategies. Achieved 76% sequence alignment accuracy, validated through cross-tool benchmarking and profiling. Automated data download from ENA (European Nucleotide Archive), added caching layers, and wrote reproducible analysis scripts for clinical interpretation.

Repo
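
The chunking-plus-parallelization idea can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes uncompressed 4-line FASTQ input and uses GC content as a placeholder for the real per-chunk work (alignment and TCR extraction are not shown). All function names are hypothetical.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor  # threads: handy for small tests
from itertools import islice
from typing import Iterable, Iterator


def fastq_records(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (read_id, sequence) pairs from FASTQ text; each record spans 4 lines."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input (an incomplete trailing record is dropped)
            return
        header, seq, _plus, _qual = (line.rstrip("\n") for line in record)
        yield header[1:], seq  # strip the leading '@'


def chunked(records: Iterable, size: int) -> Iterator[list]:
    """Group an iterable into lists of at most `size` items."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def gc_stats(chunk: list[tuple[str, str]]) -> tuple[int, int, int]:
    """Placeholder per-chunk work: read count, G/C base count, total base count."""
    gc = sum(seq.count("G") + seq.count("C") for _, seq in chunk)
    total = sum(len(seq) for _, seq in chunk)
    return len(chunk), gc, total


def summarize(lines: Iterable[str], chunk_size: int = 10_000,
              executor_cls: type[Executor] = ProcessPoolExecutor) -> dict:
    """Process FASTQ chunks in parallel and merge the partial results.

    Processes (the default) let CPU-bound chunks run in parallel despite the GIL;
    call this under `if __name__ == "__main__":` when using a process pool.
    Executor.map submits every chunk up front, so a very large input would need
    a bound on in-flight chunks.
    """
    with executor_cls() as pool:
        parts = list(pool.map(gc_stats, chunked(fastq_records(lines), chunk_size)))
    reads = sum(p[0] for p in parts)
    gc = sum(p[1] for p in parts)
    bases = sum(p[2] for p in parts)
    return {"reads": reads, "gc_fraction": gc / bases if bases else 0.0}
```

Because each chunk is reduced independently to a small tuple, merging is a cheap sum, which is what makes this split-map-merge shape easy to scale out.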


🏎️ Formula 1 Podium Prediction Model

Python · SQL · AWS SageMaker · Scikit-learn · Power BI · statsmodels API

Pre- and post-qualifying race outcome predictor using Logistic Regression on historical F1 datasets. Achieved 82.35% true positive rate and 0.98 AUC. Built reusable data pipelines, structured data models, and an interactive Power BI dashboard for race analysis.

Repo
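
For readers less familiar with the two headline metrics, here is a minimal pure-Python sketch of how they are defined. It is illustrative only; the project may well compute them with scikit-learn (`recall_score`, `roc_auc_score`), and the 0.5 decision threshold used in the test is an assumption. Note that the true positive rate depends on that threshold, while AUC summarizes ranking quality across all thresholds.

```python
def true_positive_rate(y_true: list[int], y_pred: list[int]) -> float:
    """TPR (recall) = TP / (TP + FN): share of actual podiums that were predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```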


🎓 Predictive Analytics for College Admissions

Python · XGBoost · Random Forest · AWS SageMaker · FastAPI · Node.js

Ensemble ML pipeline (Random Forest + XGBoost) for multi-class admissions classification. Achieved ROC AUC of 0.81 (train) / 0.78 (test). Deployed via FastAPI with REST endpoints for real-time inference and cloud-based model serving on AWS SageMaker.

Repo
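
The description does not say how the Random Forest and XGBoost outputs are combined. One common option is soft voting, sketched below under that assumption: each row is one applicant's per-class probabilities, which in practice would come from each fitted model's `predict_proba`. scikit-learn's `VotingClassifier(voting="soft")` packages the same idea; `soft_vote` here is a hypothetical helper.

```python
def soft_vote(prob_rf: list[list[float]], prob_xgb: list[list[float]],
              w_rf: float = 0.5) -> list[int]:
    """Blend two models' class probabilities with weight w_rf and predict the argmax class."""
    preds = []
    for p_rf, p_xgb in zip(prob_rf, prob_xgb):
        # Weighted average of the two probability vectors for this row.
        blended = [w_rf * a + (1 - w_rf) * b for a, b in zip(p_rf, p_xgb)]
        preds.append(max(range(len(blended)), key=blended.__getitem__))
    return preds
```

Averaging probabilities rather than hard labels lets a confident model outvote an uncertain one, which is usually why soft voting is preferred for probabilistic classifiers like these two.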


⚡ Blockchain-Based Renewable Energy Trading Platform

Solidity · C++ · Truffle · Remix · Embedded Systems (Cortex-A53) · Distributed Systems

Peer-reviewed research turned into a working prototype for P2P renewable energy trading using smart contracts and decentralized architectures. Designed real-time IoT sensor integration and developed optimization algorithms for distributed energy system orchestration. Published research: "Bridging Energy Gaps: Blockchain-Enabled P2P Trading for Renewable Energy" (2024).

Repo
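
The on-chain logic for this project is written in Solidity, and the description does not specify its matching rule. Purely to illustrate how P2P energy trades might be cleared, the sketch below assumes a greedy double auction with midpoint pricing, written in Python; `Order` and `match_orders` are hypothetical and are not the contract's interface.

```python
from dataclasses import dataclass


@dataclass
class Order:
    agent: str
    kwh: float
    price: float  # hypothetical unit: cents per kWh


def match_orders(bids: list[Order], asks: list[Order]) -> list[tuple[str, str, float, float]]:
    """Greedy double auction: highest bids meet cheapest asks while bid price >= ask price.

    Returns (buyer, seller, kwh, clearing_price) trades; the clearing price is the
    midpoint of the matched bid and ask.
    """
    bids = sorted(bids, key=lambda o: o.price, reverse=True)
    asks = sorted(asks, key=lambda o: o.price)
    remaining_bid = [b.kwh for b in bids]
    remaining_ask = [a.kwh for a in asks]
    trades = []
    i = j = 0
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(remaining_bid[i], remaining_ask[j])
        price = (bids[i].price + asks[j].price) / 2
        trades.append((bids[i].agent, asks[j].agent, qty, price))
        remaining_bid[i] -= qty
        remaining_ask[j] -= qty
        if remaining_bid[i] == 0:  # buyer fully served, move to next-highest bid
            i += 1
        if remaining_ask[j] == 0:  # seller sold out, move to next-cheapest ask
            j += 1
    return trades
```

Matching stops at the first bid/ask pair that no longer crosses, so every trade leaves both parties at least as well off as their stated limit prices.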


💼 Experience Highlights

| Role | Organization | Period | Key Impact |
| --- | --- | --- | --- |
| Bioinformatics Research Assistant | USC | Sep 2025 – Mar 2026 | Processed 100M+ RNA-seq records; built scalable ETL pipelines |
| Software Developer (Full-Time) | IMFS, Mumbai | Jul 2024 – May 2025 | Reduced system latency by 83% via Docker/K8s containerization |
| Web Developer Intern | Mabella SkinCare | Jun 2023 – May 2024 | Built a CRM handling 1000+ req/hr; developed an OCR-based recommendation feature |

🎓 Education

M.S. Computer Science — University of Southern California (Aug 2025 – May 2027)
GPA: 3.57 | Algorithms · HPC · Distributed Systems · Deep Learning · Advanced CV

B.Tech Information Technology (Blockchain Honors) — University of Mumbai (Aug 2020 – May 2024)
GPA: 4.0 (Magna Cum Laude Equivalent) | DSA · OS · Cloud Computing · IoT · Blockchain


📄 Research

Bridging Energy Gaps: Blockchain-Enabled P2P Trading for Renewable Energy (2024)
Peer-reviewed research on decentralized energy trading with distributed smart contracts and IoT sensor integration. Developed optimization algorithms for real-time energy distribution system architectures.


📈 Activity Graph

Atharva's Activity Graph


🌐 Let's Connect

Portfolio LinkedIn Email


"Build systems that scale, write code that lasts."


Pinned

  1. F1_Analysis (Public)

    This repository applies machine learning to historical and live F1 data to predict Grand Prix podium finishes before and after qualifying.

    Jupyter Notebook 1

  2. AI_Interview_Prototype (Public)

    This is the prototype for an LLM-based interview agent.

    Python 1

  3. TCR-Extraction-From-RNA-Sequences (Public)

    This project explores computationally efficient ways of extracting TCR data from a given set of FASTQ RNA sequence files, and also compares the popular tools used for ext…

    Jupyter Notebook

  4. WattSwap (Public)

    A peer-to-peer renewable energy trading platform

    JavaScript 1

  5. BlockchainVlab (Public)

    A virtual simulation for learning the core elements of blockchain

    JavaScript 2

  6. Group66_Project (Public)

    This is the repository for the RNA Sequence Alignment Problem Assignment

    Python