Transformer Encoder from Scratch (Paper2Code)

This project implements the Transformer Encoder from the paper "Attention Is All You Need" (Vaswani et al., 2017) from scratch, using only NumPy and no deep learning frameworks.

The code closely follows the structure of the paper, with one module per component.

Project Structure

attention.py : Scaled Dot-Product Attention and Multi-Head Attention
positional_encoding.py : Sinusoidal positional encoding
feedforward.py : Two-layer Feed Forward network
encoder_block.py : One Transformer Encoder Block
transformer_encoder.py : Full Transformer Encoder (stack of blocks)
glove_loader.py : Load pre-trained GloVe embeddings
train_toy_example.py : Train Transformer + simple classifier on toy synthetic task
test_sentence.py : Pass real-world sentences through the Transformer Encoder for prediction
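
To make the first two modules concrete, here is a minimal NumPy sketch of scaled dot-product attention and sinusoidal positional encoding, written from the formulas in the paper. The function names, argument order, and mask convention are assumptions chosen for illustration and may not match what attention.py and positional_encoding.py actually define.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    q, k: arrays of shape (..., seq_len, d_k); v: (..., seq_len, d_v).
    mask: optional boolean array broadcastable to the score matrix,
          True where attention is allowed (assumed convention).
    Returns the attended values and the attention weights.
    """
    d_k = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Masked positions get a large negative score, so softmax gives ~0.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def sinusoidal_positional_encoding(seq_len, d_model):
    """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]  # (1, ceil(d_model / 2))
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even feature indices
    pe[:, 1::2] = np.cos(angles[:, : d_model // 2])  # odd feature indices
    return pe


# Usage: self-attention over 5 random "token" vectors of width 8.
rng = np.random.default_rng(0)
pe = sinusoidal_positional_encoding(5, 8)
x = rng.normal(size=(5, 8)) + pe
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape, weights.shape)
```

Each row of the attention weights sums to 1, and the output has the same shape as the input, which is what lets encoder blocks be stacked.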

Install the dependencies:

pip install numpy matplotlib

python train_toy_example.py

This trains the Transformer Encoder together with a simple classifier on a synthetic toy dataset.

python test_sentence.py

This passes a real English sentence through the Transformer Encoder and predicts a class.

Built to learn deeply from research papers.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017).
"Attention Is All You Need". Advances in Neural Information Processing Systems (NeurIPS).

This project implements the Transformer Encoder architecture described in the paper above for educational and research purposes.