GitHub - P-Slark/LLM-Step-by-Step: Building a small LLM in PyTorch from scratch - a step-by-step journey through Stanford CS336.

What worked, what didn't, and why — a small LLM, end-to-end

| Journey | CS336 (curriculum) | Issues |

👉 Two things to read

📖 JOURNEY.md

The writing. A chapter-by-chapter narrative of what I tried, what was slow, and what each round of optimization actually bought me. Each chapter maps to a commit. Most "build an LLM" tutorials show you the final code — this one shows the path.

If you only read one thing in this repo, read this.

💻 `cs336_basics/`

The code. Clean, dependency-light implementations of every piece: BPE tokenizer, RMSNorm/RoPE/SwiGLU/MHA, AdamW, cosine LR schedule, gradient clipping, training loop, sampling.

Read alongside JOURNEY.md. Each module is short and meant to be read top-to-bottom.
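
For a sense of how small each piece can be, here is a minimal sketch of a byte-level BPE training loop. It is an illustration only, not the code in `cs336_basics/bpe.py`: the function names (`count_pairs`, `merge_pair`, `train_bpe`), the pre-tokenized `words` dictionary, and the tie-breaking rule (prefer the lexicographically greater pair) are all assumptions made for the example.

```python
from collections import Counter


def count_pairs(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Return a new word table with every occurrence of `pair` fused into one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(words, num_merges):
    """Greedily merge the most frequent pair `num_merges` times."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break
        # Highest count wins; ties go to the lexicographically greater pair
        # (an assumption for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


# Toy corpus: each word is a tuple of single-byte symbols with a count.
toy = {
    (b"l", b"o", b"w"): 5,
    (b"l", b"o", b"w", b"e", b"r"): 2,
    (b"n", b"e", b"w", b"e", b"s", b"t"): 6,
    (b"w", b"i", b"d", b"e", b"s", b"t"): 3,
}
merges, _ = train_bpe(toy, 2)
print(merges)  # [(b's', b't'), (b'e', b'st')]
```

This version recounts every pair after each merge, which is the slow baseline; the parallel and incremental variants listed in the table below are the kind of optimization that replaces that recount.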

Everything else in the repo (scripts/, tests/, fixtures, the assignment PDF) is scaffolding around those two things.

The curriculum is borrowed from Stanford's CS336 ("Language Models from Scratch").

About

🧱 Built from PyTorch primitives, not framework abstractions. No `torch.nn.Transformer`, no Hugging Face `transformers` library, no pre-trained weights. Every layer, optimizer, and training step is written from scratch in this repo; autograd and tensor ops are the only things borrowed.
📖 A step-by-step journey, not just code. JOURNEY.md walks through what was tried, what was slow, and what each round of optimization actually bought — chapter by chapter, commit by commit. Most "build an LLM" repos hand you the final code. This one shows the path.

What's in here

| Part | Topic | Code | Journey chapter |
| --- | --- | --- | --- |
| I | BPE tokenizer (train + encode/decode, parallel + incremental) | `cs336_basics/bpe.py`, `tokenizer.py` | Iterations 1–4 |
| II | Transformer model (RMSNorm, RoPE, SwiGLU, MHA, tied embeddings) | `cs336_basics/model.py` | Part II |
| III | Training building blocks (cross-entropy, AdamW, cosine LR, grad clip) | `cs336_basics/optim.py`, `training.py`, `nn_utils.py` | Part III |
| IV | Training loop (data loader, checkpointing, `scripts/train.py`) | `cs336_basics/data.py`, `scripts/` | Part IV |
| V | Text generation (temperature, top-k, top-p, EOS stopping) | `cs336_basics/decoding.py`, `scripts/generate.py` | Part V |
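
As one concrete example of a Part III building block, a cosine learning-rate schedule with linear warmup can be sketched in a few lines of plain Python. The function name `cosine_lr` and its parameter names are hypothetical and chosen for this example; the version in `cs336_basics/` may differ in signature and edge-case handling. The sketch assumes `warmup_steps < cosine_steps`.

```python
import math


def cosine_lr(step, max_lr, min_lr, warmup_steps, cosine_steps):
    """Learning rate at `step`: linear warmup, cosine decay, then a flat floor.

    Assumes warmup_steps < cosine_steps (not checked here).
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to max_lr.
        return max_lr * step / warmup_steps
    if step >= cosine_steps:
        # After the decay window, hold at the minimum.
        return min_lr
    # Fraction of the decay window completed, in [0, 1).
    progress = (step - warmup_steps) / (cosine_steps - warmup_steps)
    return min_lr + 0.5 * (1.0 + math.cos(math.pi * progress)) * (max_lr - min_lr)


# Print a few points along a 100-step schedule with 10 warmup steps.
for s in (0, 5, 10, 55, 100, 200):
    print(s, round(cosine_lr(s, max_lr=1.0, min_lr=0.1, warmup_steps=10, cosine_steps=100), 4))
```

In a training loop, the returned value would typically be written into each optimizer parameter group's `lr` before the optimizer step.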

Quickstart

# Install (uses uv for env management)
pip install uv
uv sync

# Run the unit tests
uv run pytest

# Download TinyStories
mkdir -p data && cd data
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-train.txt
wget https://huggingface.co/datasets/roneneldan/TinyStories/resolve/main/TinyStoriesV2-GPT4-valid.txt
cd ..

# Train a BPE tokenizer, encode the corpus, train a small model, generate text
# (see JOURNEY.md for the full pipeline)
uv run scripts/encode_corpus.py --help
uv run scripts/train.py --help
uv run scripts/generate.py --help
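
The generation step combines temperature, top-k, and top-p, as listed under Part V. The sketch below shows one way these can fit together, written in plain Python over a list of logits so it runs without a model. It is not the implementation in `cs336_basics/decoding.py`: the names `softmax`, `filter_probs`, and `sample_next` are hypothetical, and applying top-k before top-p is a choice made for this example. `temperature` must be positive.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracts the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_probs(probs, top_k=None, top_p=None):
    """Keep the top-k tokens and/or the smallest nucleus reaching top_p, then renormalize."""
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]  # cumulative mass under the original distribution
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    out = [0.0] * len(probs)
    for i in order:
        out[i] = probs[i] / total
    return out


def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token index from the filtered distribution."""
    probs = filter_probs(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# With top_k=1 the filter leaves only the argmax, so sampling is greedy.
print(sample_next([1.0, 3.0, 2.0], top_k=1))  # 1
```

EOS stopping would sit in the loop that calls `sample_next` repeatedly: stop as soon as the sampled index equals the tokenizer's end-of-sequence id.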

Credits & honest scope

The curriculum, the spec, and the test fixtures all come from Stanford's CS336 ("Language Models from Scratch"), which generously publishes its materials online. All the implementations and the writing in JOURNEY.md are mine. If you're a current CS336 student, please respect your course's collaboration policy — this repo is a learning log, not a copy-paste solution.

The original assignment scaffolding (handout PDF, test adapters, submission script) lives under assignment/.

