Vaishnavi Srikanth Joshi

MS in Electrical and Computer Engineering at Northeastern University, Boston (expected May 2027), with a concentration in Computer Systems and Software.

I work at the boundary between hardware and software: pipeline microarchitecture, cache-aware kernel optimization, GPU acceleration, distributed systems concurrency, and embedded firmware. Most of my projects either model real hardware to make it observable, or push real hardware to its memory-bandwidth ceiling.

Currently seeking Fall 2026 co-op roles in embedded firmware, computer architecture, or hardware performance engineering.

Featured projects

| Project | Stack | Headline |
|---|---|---|
| Cache-Aware and GPU-Accelerated SpMV | CUDA, C, AVX2 | Hand-written warp-per-row kernel reaches 98-110% of A100 HBM2 peak, beating cuSPARSE by 24-56% on SuiteSparse matrices |
| NFS-Like Distributed File Server | Rust, UDP RPC | Lease-based write locking with FIFO/RR/MLFQ/CFS lock-queue policies, plus fairness-vs-latency analysis |
| 5-Stage Pipeline Microarchitecture Simulator | C | IF/ID/EX/MEM/WB simulator with load-use stall detection, EX/MEM and MEM/WB forwarding, branch flush, and direct-mapped icache/dcache |
| FairShare-WiFi | Python, SimPy | Contextual UCB schedulers (PlainUCB, LinUCB, DLinUCB, DKernelUCB) beat FIFO by 16.5% on Jain's Fairness Index |
| FPGA Middleware for Power System Simulation | Python, Vitis HLS, Vivado | Simulink-to-RTL pipeline on Zynq-7000 achieving a 5.5-6x speedup over Simulink with 1e-11 to 1e-12 numerical error |
| Mandelbrot: Serial / OpenMP / Hybrid MPI+OpenMP | C, OpenMP, MPI | 27-configuration benchmark sweep; hybrid 4-rank x 2-thread run reaches ~4.8x speedup over serial |
| Photoplethysmography Heart Rate Monitor | Arduino, ATmega328P, MAX30105 | MAX30105 sampled over I2C, live PPG waveform on an ST7735 TFT over SPI, with finger-presence buzzer alerts |
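Jain's Fairness Index, the metric behind the FairShare-WiFi comparison, is `J = (Σ x_i)^2 / (n · Σ x_i^2)` over the per-station allocations `x_i`. It equals 1 when every station gets the same share and drops to 1/n when a single station takes everything. Below is a minimal C sketch, assuming the per-station throughputs have already been collected into an array. The name `jain_index` is hypothetical and is not taken from the repository, which is written in Python.

```c
#include <stddef.h>

/* Jain's Fairness Index: J = (sum x_i)^2 / (n * sum x_i^2).
 * Returns 1.0 for a perfectly even allocation and 1/n when one
 * station receives everything. Hypothetical helper for illustration. */
double jain_index(const double *x, size_t n)
{
    double sum = 0.0, sum_sq = 0.0;

    for (size_t i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += x[i] * x[i];
    }

    /* The index is undefined for empty or all-zero input; report 0. */
    if (n == 0 || sum_sq == 0.0)
        return 0.0;

    return (sum * sum) / ((double)n * sum_sq);
}
```

For example, four stations at 10 Mbps each give J = 1.0, and one station taking all 40 Mbps gives J = 0.25. These are illustrative numbers, not results from the project.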

Tech I work with

| Area | Tools |
|---|---|
| Languages | C, C++, Rust, Python, MATLAB |
| GPU / HPC | CUDA, cuSPARSE, OpenMP, MPI, AVX2, Linux perf, Nsight Compute |
| Hardware / FPGA | Vitis HLS, Vivado, Verilog, ATmega328P, Zynq-7000 |
| Embedded | I2C, SPI, UART, ADC, Arduino, MAX30105, ST7735 |
| Systems | Linux, SLURM, UDP, RPC, distributed locking, scheduling |
| Build / DevOps | Makefile, Cargo, Git, Docker, FastAPI, Redis |

Background

| Where | What | When |
|---|---|---|
| Northeastern University | MS ECE (Computer Systems and Software), GPA 3.88 | 2025-2027 |
| Foxconn Hon Hai Technology India | Graduate Trainee Engineer (iPhone manufacturing launch) | Nov 2023 - Oct 2024 |
| Visvesvaraya Technological University | BE Electronics and Communication Engineering, GPA 3.11 | 2019-2023 |

Connect

Pinned

  1. 5-Stage-Pipeline-Microarchitecture-Simulator

    5-stage in-order pipeline simulator in C: IF/ID/EX/MEM/WB stages, custom 11-opcode ISA, load-use stall detection, EX/MEM and MEM/WB data forwarding, 2-cycle branch/jump flush. Direct-mapped icache …

    C

  2. Automated-FPGA-Middleware-for-Power-System-Simulation

    Automated Simulink-to-FPGA pipeline in Python: generates HLS-ready C from a discrete-time power system model (Emergency Diesel Generator), runs Vitis HLS synthesis on Zynq-7000 (xc7z020clg400-1) fo…

    Python

  3. Cache-Aware-and-GPU-Accelerated-Sparse-Matrix-Vector-Multiplication

    CUDA SpMV kernels (scalar, warp-per-row, ELL) on NVIDIA A100 benchmarked against cuSPARSE on SuiteSparse matrices, plus AVX2 + cache-tiled CPU baselines on Intel Xeon Gold. Vector kernel reaches 98…

    Cuda

  4. Distributed-NFS-File-Server-with-Scheduler-Based-Locking

    Concurrent NFS-like distributed file server in Rust (UDP RPC, Arc<Mutex<>> threading, lease-based write locks) that applies classic OS scheduling algorithms (FIFO, RR, MLFQ, CFS) to write-lock arbi…

    Rust

  5. Fairshare-WiFi

    SimPy-based discrete-event WiFi simulator comparing classic packet schedulers (FIFO, Priority, WFQ) against four contextual multi-armed bandit schedulers (PlainUCB, LinUCB, DLinUCB, DKernelUCB) ove…

    Python

  6. Photoplethysmography-Heart-Rate-Monitor

    Arduino Nano photoplethysmography heart rate monitor. Captures raw IR samples from a MAX30105 optical sensor over I2C at 400 kHz, runs real-time beat detection, and renders live BPM and scrolling P…

    C++
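In a classic 5-stage pipeline, the load-use stall and the EX/MEM and MEM/WB forwarding paths named in the simulator description come down to a few register-number comparisons. A load in EX produces its value only after MEM, so a dependent instruction in ID must wait one cycle. Other dependences are resolved by forwarding from the latest stage that holds the result. The C sketch below rests on several assumptions. The latch structs and field names are hypothetical and are not the simulator's actual types or its 11-opcode ISA. Register 0 is assumed to be hardwired to zero. Both source fields are conservatively treated as read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ID/EX latch fields (illustrative names only). */
typedef struct {
    bool    mem_read;   /* instruction in EX is a load        */
    bool    reg_write;  /* instruction writes a register      */
    uint8_t rd;         /* destination register               */
} ex_stage_t;

/* Hypothetical IF/ID latch fields: sources of the instruction in ID. */
typedef struct {
    uint8_t rs1, rs2;
} id_stage_t;

/* Write-back info carried by the EX/MEM and MEM/WB latches. */
typedef struct {
    bool    reg_write;
    uint8_t rd;
} wb_info_t;

/* Load-use hazard: the load's result is ready only after MEM, so a
 * dependent instruction in ID stalls one cycle (a bubble enters EX).
 * Assumes r0 is hardwired to zero and never carries a dependence. */
bool load_use_stall(const ex_stage_t *ex, const id_stage_t *id)
{
    return ex->mem_read && ex->rd != 0 &&
           (ex->rd == id->rs1 || ex->rd == id->rs2);
}

typedef enum { FWD_NONE, FWD_EX_MEM, FWD_MEM_WB } fwd_src_t;

/* Forwarding select for one EX-stage source operand. EX/MEM wins over
 * MEM/WB because it holds the more recent write to the same register. */
fwd_src_t forward_select(uint8_t src, const wb_info_t *ex_mem,
                         const wb_info_t *mem_wb)
{
    if (src == 0)
        return FWD_NONE;
    if (ex_mem->reg_write && ex_mem->rd == src)
        return FWD_EX_MEM;
    if (mem_wb->reg_write && mem_wb->rd == src)
        return FWD_MEM_WB;
    return FWD_NONE;
}
```

After the one-cycle stall, the load has moved into MEM/WB when its consumer reaches EX, so `forward_select` picks the MEM/WB path for that operand.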