A polished PyTorch implementation of the current state-of-the-art (SOTA) Transformer. Designed for clarity, reproducibility, and interoperability with HuggingFace Transformers, this repository provides a robust, fully configurable baseline for research and engineering. The codebase emphasizes readable, well-documented components so you can iterate on feed-forward, attention, and normalization blocks and other architectural variants with minimal friction.
- Fully configurable architecture (layers, heads, model dimensions, dropout, etc.)
- API aligned with HuggingFace Transformers conventions.
- Compact, easily extensible design for rapid prototyping and research experiments.
- Clear, well-documented modules to facilitate experimentation with attention, FFNs, and more.
```bash
git clone --depth=1 https://github.com/lof310/transformer
cd transformer

# Install dependencies
pip install -r requirements.txt

# Install in development mode (recommended)
pip install -e .

# Or install normally
pip install .
```

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from transformer import Transformer, TransformerConfig

# Configure the model
config = TransformerConfig(
    n_layers = 12,
    n_heads = 32,
    d_model = 1536,
    attn_qk_norm = False,
    tied_weights = False,
    seq_len = 1024,
    max_seq_len = 4096,
)

# Initialize the model
model = Transformer(config)

# Forward pass
B, N = 16, 1024
input_ids = torch.randint(low=0, high=config.vocab_size, size=(B, N))
output = model(input_ids, return_states=False)
```

The default configuration implements the latest SOTA Transformer design.
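The forward pass in the quick-start above produces next-token logits. As a hedged illustration of how a training step might be built on top of it — this README does not document the output format, so the shape `(B, N, vocab_size)` is an assumption, and a random tensor stands in for the model output to keep the snippet self-contained:

```python
import torch
import torch.nn.functional as F

B, N, vocab_size = 2, 16, 50000

# Stand-in for the model's output; with the real model this would be
# logits = model(input_ids) -- shape assumed to be (B, N, vocab_size).
logits = torch.randn(B, N, vocab_size)
input_ids = torch.randint(0, vocab_size, (B, N))

# Next-token prediction: logits at position t predict the token at t + 1,
# so drop the last logit and the first label before computing the loss.
shift_logits = logits[:, :-1, :].reshape(-1, vocab_size)
shift_labels = input_ids[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels)
```

With random logits the loss sits near `ln(vocab_size)`; in a real loop you would call `loss.backward()` and step an optimizer.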
```python
from transformer import TransformerConfig

TransformerConfig(
    n_layers = 12,
    d_model = 1536,
    n_heads = 32,
    n_kv_heads = None,   # GQA disabled
    vocab_size = 50000,
    d_ff = None,         # Chosen automatically, ratio 8/3 ≈ 2.67
    norm_design = "pre_norm",
    norm_class = "rms_norm",
    ffn_class = "SwiGLU",
    attn_class = "MHA",
    block_class = None,  # transformer.TransformerBlock
    attn_bias = False,
    ffn_bias = True,
    lm_head_bias = False,
    attn_qk_norm = True,
    attn_dropout = 0.0,
    tied_weights = False,
    seq_len = 1024,
    pos_encoding = "RoPE",
    rope_base = 10000.0,
    max_seq_len = 4096,
)
```

Full documentation is available at This Page.
Contributions are welcome!
Distributed under the Apache License 2.0. See LICENSE for more information.
If you use transformer in your research, please cite:
```bibtex
@software{transformer2026,
  author    = {Leinier Orama},
  title     = {transformer: PyTorch implementation of the current State-Of-The-Art (SOTA) Transformer},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/lof310/transformer}
}
```