
Transformer


A polished PyTorch implementation of the current state-of-the-art (SOTA) Transformer architecture. Designed for clarity, reproducibility, and interoperability with HuggingFace Transformers, this repository provides a robust, fully configurable baseline for research and engineering. The codebase emphasizes readable, well-documented components so you can iterate on feed-forward, attention, and normalization blocks and other architectural variants with minimal friction.

Features

  • Fully configurable architecture (layers, heads, model dimensions, dropout, etc.)
  • HuggingFace-compatible API.
  • Compact, easily extensible design for rapid prototyping and research experiments.
  • Clear, well-documented modules that make it easy to experiment with attention, FFNs, and more.

Download the code

git clone --depth=1 https://github.com/lof310/transformer
cd transformer

Installation

# Install dependencies
pip install -r requirements.txt

# Install in development (editable) mode (recommended)
pip install -e .

# Install normally
pip install .

Quick Start

import torch
import torch.nn as nn
import torch.nn.functional as F

from transformer import Transformer, TransformerConfig

# Configure the model
config = TransformerConfig(
    n_layers = 12,
    n_heads = 32,
    d_model = 1536,
    attn_qk_norm = False,     
    tied_weights = False,
    seq_len = 1024,
    max_seq_len = 4096,
)

# Initialize model
model = Transformer(config)

# Forward pass
B, N = 16, 1024
input_ids = torch.randint(low=0, high=config.vocab_size, size=(B, N))
output = model(input_ids, return_states=False)
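Assuming the forward pass returns a logits tensor of shape (B, N, vocab_size) — an assumption; check the repository's actual return type — a standard next-token cross-entropy loss can be sketched with plain PyTorch:

```python
import torch
import torch.nn.functional as F

# Stand-in logits; in practice: logits = model(input_ids, return_states=False)
B, N, V = 2, 16, 100  # small sizes for illustration
logits = torch.randn(B, N, V)
input_ids = torch.randint(0, V, (B, N))

# Shift by one: the logits at position t predict the token at position t+1
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, V),   # predictions for positions 0..N-2
    input_ids[:, 1:].reshape(-1),    # targets are the next tokens
)
print(loss.item())
```

This is the usual causal language-modeling objective; with random stand-in logits the loss is roughly log(vocab_size).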

Default Configuration

The default configuration follows a current SOTA Transformer recipe: pre-norm blocks with RMSNorm, SwiGLU feed-forward layers, QK normalization, and RoPE positional encoding.

from transformer import TransformerConfig

TransformerConfig(
    n_layers = 12,
    d_model = 1536,
    n_heads = 32,
    n_kv_heads = None, # None disables grouped-query attention (GQA)
    vocab_size = 50000,
    d_ff = None, # Chosen automatically using an 8/3 ≈ 2.67 expansion ratio
    norm_design = "pre_norm",
    norm_class = "rms_norm",
    ffn_class = "SwiGLU",
    attn_class = "MHA",
    block_class = None, # transformer.TransformerBlock
    attn_bias = False,
    ffn_bias = True,
    lm_head_bias = False,
    attn_qk_norm = True,
    attn_dropout = 0.0,
    tied_weights = False,
    seq_len = 1024,
    pos_encoding = "RoPE",
    rope_base = 10000.0,
    max_seq_len = 4096
)
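The automatic d_ff choice can be illustrated with a common SwiGLU sizing convention (a sketch; the `swiglu_d_ff` helper and the rounding multiple are assumptions, not the repository's actual code). The 8/3 ratio keeps the parameter count of the three-matrix SwiGLU FFN close to that of a classic two-matrix FFN with a 4×d_model hidden size:

```python
def swiglu_d_ff(d_model: int, multiple_of: int = 256) -> int:
    """Hidden size for a SwiGLU FFN using the 8/3 expansion ratio."""
    hidden = int(8 * d_model / 3)
    # Round up to a hardware-friendly multiple (a common convention;
    # the repository may round differently)
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)

print(swiglu_d_ff(1536))  # -> 4096 for the default d_model
```

With the default d_model = 1536, the 8/3 ratio lands exactly on 4096, so no rounding is needed in this case.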

Documentation

Full documentation is available at This Page

Contributing

Contributions are welcome!

License

Distributed under the Apache License 2.0. See LICENSE for more information.

Citation

If you use transformer in your research, please cite:

@software{transformer2026,
  author = {Leinier Orama},
  title = {transformer: PyTorch implementation of the current State-Of-The-Art (SOTA) Transformer},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/lof310/transformer}
}
