
feat: route load_tokenizer through fastokens by default #10

Open
hallerite wants to merge 1 commit into main from feat/fastokens-default

Conversation

@hallerite (Member)

Summary

Add fastokens (Crusoe's Rust BPE tokenizer, ~10x faster encode) as a required dependency and patch it in by default for every supported model except a small denylist. The patch is bracketed around from_pretrained, so the loaded tokenizer keeps the fastokens shim while the user's process-global AutoTokenizer.from_pretrained stays vanilla.

Audit results — every entry of MODEL_RENDERER_MAP

Probe: a 5-case encoding comparison (plain text, Lorem ipsum, emoji + CJK, literal <|im_start|>, 200-word long input), vanilla vs. fastokens-patched; a sketch of the probe follows the table.

| Result | Count | Models |
| --- | --- | --- |
| Byte-identical | 31/35 | All Qwen3.x (10), Qwen3.5 (3), Qwen3.6 (1), Qwen3-VL (3), GLM-5 / 4.7 / 5.1 (3), GLM-4.5 family (2), Kimi-K2.x (3), Nemotron 3 (2), Llama-3.2 Instruct (2), gpt-oss (2) |
| Load error | 2/35 | deepseek-ai/DeepSeek-V3{,-Base} — fastokens 0.1.1 doesn't support the Metaspace pretokenizer |
| Silent encode divergence | 2/35 | MiniMaxAI/MiniMax-M2{,.5} — diverge from vanilla on content containing literal `<\|im_start\|>`-like text |
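
The audit script itself isn't part of this diff; a minimal sketch of the per-model check, assuming the load_tokenizer signature shown under "Implementation notes" (probe strings are illustrative):

```python
PROBE_CASES = [
    "The quick brown fox jumps over the lazy dog.",    # plain text
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    "🚀🎉 你好，世界 こんにちは",                      # emoji + CJK
    "text with a literal <|im_start|> marker inside",  # special-token literal
    " ".join(["word"] * 200),                          # 200-word long case
]

def byte_parity(model_name_or_path: str) -> bool:
    """True iff vanilla and fastokens-patched encodes agree on every probe."""
    vanilla = load_tokenizer(model_name_or_path, use_fastokens=False)
    patched = load_tokenizer(model_name_or_path)  # default: fastokens path
    return all(
        vanilla.encode(case) == patched.encode(case) for case in PROBE_CASES
    )
```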

The 4 incompatible models live in FASTOKENS_INCOMPATIBLE (a frozenset in renderers/base.py) and skip the patch unconditionally. Unknown / fine-tuned models hit the patched path first and fall back to vanilla on any fastokens load error (logged at INFO).
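
For reference, the denylist has this shape (repo IDs taken from the audit table above; renderers/base.py is authoritative):

```python
# renderers/base.py; the exact shape is locked by a test below
FASTOKENS_INCOMPATIBLE = frozenset({
    "deepseek-ai/DeepSeek-V3",       # Metaspace pretokenizer unsupported
    "deepseek-ai/DeepSeek-V3-Base",  # Metaspace pretokenizer unsupported
    "MiniMaxAI/MiniMax-M2",          # silent divergence on <|im_start|>-like text
    "MiniMaxAI/MiniMax-M2.5",        # silent divergence on <|im_start|>-like text
})
```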

Implementation notes

```python
def load_tokenizer(model_name_or_path, *, use_fastokens=True):
    ...
    # Denylisted models and explicit opt-outs take the vanilla path.
    if not use_fastokens or model_name_or_path in FASTOKENS_INCOMPATIBLE:
        return AutoTokenizer.from_pretrained(...)
    try:
        return _patched_load(...)  # patches fastokens, loads, unpatches
    except Exception:
        logger.info("fastokens couldn't load %r; falling back to vanilla",
                    model_name_or_path)
        return AutoTokenizer.from_pretrained(...)
```
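
Typical call sites, as a sketch (the Hub org prefix on the Qwen checkpoint is an assumption):

```python
# Default: fastokens-backed unless the model is denylisted or the load fails.
tok = load_tokenizer("Qwen/Qwen3.5-9B")

# Explicit opt-out: always the vanilla Hugging Face backend.
vanilla = load_tokenizer("Qwen/Qwen3.5-9B", use_fastokens=False)
```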

Per-call patch/unpatch keeps the side effect minimal: the returned tokenizer keeps the fastokens shim (because fastokens captures the backend at load time), but subsequent AutoTokenizer.from_pretrained calls outside load_tokenizer stay vanilla. Verified by test_patch_is_unloaded_after_call.
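
_patched_load isn't spelled out above; a minimal sketch of the bracket, assuming fastokens exposes module-level patch()/unpatch() helpers (names hypothetical):

```python
def _patched_load(model_name_or_path, **kwargs):
    """patch → from_pretrained → unpatch, i.e. the bracket described above."""
    import fastokens  # assumed API surface

    fastokens.patch()  # hypothetical: reroute tokenizer loading to the Rust BPE
    try:
        # The tokenizer captures the fastokens backend at load time, so the
        # returned object keeps the shim even after the finally-block runs.
        return AutoTokenizer.from_pretrained(model_name_or_path, **kwargs)
    finally:
        fastokens.unpatch()  # hypothetical: restore the vanilla backend globally
```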

Tests

tests/test_load_tokenizer_fastokens.py — 10 cases pinning the policy:

- FASTOKENS_INCOMPATIBLE exact-shape lock
- Default path produces a fastokens-shim backend
- use_fastokens=False produces a vanilla backend
- Encode parity vanilla vs fastokens on Qwen3.5-9B (4 sample strings)
- Each of the 4 incompatible models loads via vanilla (skips the patch)
- Patch leak test (sketched below): a direct AutoTokenizer.from_pretrained outside load_tokenizer stays vanilla
- Simulated fastokens load failure falls back to vanilla cleanly
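
A sketch of the patch-leak case (the is_fastokens_backend helper is hypothetical; the real test may assert differently):

```python
def test_patch_is_unloaded_after_call():
    # The tokenizer returned by load_tokenizer carries the fastokens shim...
    tok = load_tokenizer("Qwen/Qwen3.5-9B")
    assert is_fastokens_backend(tok)  # hypothetical backend check

    # ...but a direct from_pretrained afterwards must stay vanilla.
    direct = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
    assert not is_fastokens_backend(direct)
```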

The existing ~900-test suite passes unchanged with fastokens patched globally — verified before this PR was designed (full run: 947 passed on the feat/llama-3-renderer branch with fastokens; this branch off origin/main: 910 passed, 48 skipped, 1 xfailed in 129.15s).

No version bump — batched with the open Qwen3.5 / Llama-3 PRs.

Test plan

- pytest tests/test_load_tokenizer_fastokens.py — 10 cases pass
- Full suite (pytest tests/ --ignore=tests/test_client.py) — 910 passed, 48 skipped, 1 xfailed (no regressions)
- Per-model byte parity probe over all 35 MODEL_RENDERER_MAP entries — 31 report parity, 4 denylisted after deliberate review
- Pre-commit hooks (ruff check + format) clean

🤖 Generated with Claude Code

