Skip to content
View devYRPauli's full-sized avatar

Block or report devYRPauli

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don’t include any personal information such as legal names or email addresses. Markdown is supported. This note will only be visible to you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
devYRPauli/README.md

Typing SVG

Website LinkedIn X Email Resume


whoami

class YashRajPandey:
    role        = "AI Agents Architect @ UF IFAS"
    based_in    = "Gainesville, Florida"
    focus       = ["local-first LLMs", "RAG", "vector search", "AI agents"]
    also_builds = "full-stack production systems"
    philosophy  = "Understand the real problem. Ship the simplest thing that works. Measure. Iterate."

I build AI infrastructure and production software. I'm the AI Agents Architect at the University of Florida's Institute of Food and Agricultural Sciences, where I lead a function I proposed myself: self-hosted, air-gapped AI systems built on open-weight models.

I joined UF as a Software Engineer in March 2025, was promoted to Lead Software Engineer in October 2025, and moved into the AI Agents Architect role in April 2026.

What I'm building

TurboQuant on Apple Silicon

Independent evaluation of TurboQuant (arXiv 2504.19874), a near-optimal LLM quantization method, ported to run CPU-only on Apple Silicon. Fixed five blocking bugs, then benchmarked long-context retrieval across an MLX path and a llama.cpp Metal path.

Needle-in-a-haystack retrieval: 0% to 100% at 16K tokens, with a large KV cache memory reduction, all on a consumer M1 Pro with no dedicated GPU.

Python MLX llama.cpp Quantization

View the repo →

Blue Omics  ·  full-stack research data platform

A Django, React, and PostgreSQL platform I designed and built from scratch for a research lab. It grew from zero to 5M+ live records across 32 data models and 58 API endpoints, replaced years of manual spreadsheet workflows, and became the primary system for every lab submission.

  • Tuned PostgreSQL with explicit indexing and caching to hold low-millisecond median latency under concurrent access by 30+ researchers
  • Built 7 ingestion pipelines for heterogeneous formats: PDF, Excel, CSV, Word, PowerPoint
  • Automated R Markdown reporting, cutting a 2-3 hour manual process to 15-20 minutes
  • Deployed on GCP with Kubernetes and Terraform; optimized the frontend from 8s to 3s load time

Django React PostgreSQL GCP Kubernetes

ApplyScore  ·  AI resume gap-analysis Chrome extension

A published Chrome extension that scores how well a resume matches any job posting on the web, with evidence-linked gaps and no hallucinated fluff.

  • Universal scraper that pierces Shadow DOM to read postings across LinkedIn, Greenhouse, Ashby, Lever, Workday and more
  • Strict, evidence-based analysis: a confidence-weighted fit score with requirement-by-requirement matches linked to the exact resume bullets that prove them
  • Privacy-first and bring-your-own-key (OpenAI, Anthropic, or Google), so data and model choice stay with the user

JavaScript Chrome LLM

View on the Chrome Web Store →

Stack

AI / ML Open-weight LLMs · RAG · Vector search (Qdrant) · Embeddings · Reranking · Model evaluation · vLLM · Ollama
Backend Python · Django · Django REST · FastAPI · Node.js · PostgreSQL · Redis
Frontend React · TypeScript · Modern HTML/CSS
Infra Docker · Kubernetes · Terraform · GCP · AWS · CI/CD · GitHub Actions

Recognition & Education

Herbert Wertheim College of Engineering Achievement Award, University of Florida. Merit scholarship awarded for top academic standing.

  • M.S. Computer & Information Science & Engineering, University of Florida — GPA 3.8 / 4.0
  • Semester Exchange, University of Florida — GPA 3.7 / 4.0
  • B.Tech Computer Science & Engineering, Jaypee University of Engineering & Technology — GPA 9.1 / 10.0

Writing

I keep playbooks on what I learn shipping local AI, over on yashrajpandey.com:

  • Self-hosting open-weight LLMs without sending data to a cloud API
  • RAG that holds up in production: retrieval, reranking, and the evals that keep it honest
  • Evaluation-gated releases for LLM systems

Off the clock: football (I built Football Hub because I love the game), tactical FPS, story-rich RPGs, and lo-fi for flow state.

Open to good conversations on AI infrastructure and local-first LLM systems.

Pinned Loading

  1. turboquant-m1pro-evaluation turboquant-m1pro-evaluation Public

    TurboQuant KV cache compression evaluation on Apple M1 Pro 16GB. Two-round study: MLX path (100% needle at 16K) and llama.cpp Metal path. Five implementation bugs found and fixed.

    Python 1

  2. AlgoRapids-Path-Optimization-Solutions AlgoRapids-Path-Optimization-Solutions Public

    Python

  3. football-hub football-hub Public

    JavaScript

  4. NLP-JoyRide NLP-JoyRide Public

    Project Repo for CAI6307

    Jupyter Notebook 1

  5. portfolio portfolio Public

    TypeScript