feat: route load_tokenizer through fastokens by default #10
Add ``fastokens`` (Crusoe's Rust BPE tokenizer, ~10x faster encode) as a
required dependency and patch it in by default for every supported
model except a small denylist. The patch is bracketed: ``patch`` →
``from_pretrained`` → ``unpatch``, so the loaded tokenizer keeps the
fastokens shim while the user's process-global
``AutoTokenizer.from_pretrained`` stays vanilla.
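The reason the loaded tokenizer keeps the shim while the global factory goes back to vanilla is that the backend is captured at construction time. A minimal standalone sketch of that behavior (all names here are stand-ins, not the real transformers or fastokens API):

```python
# Hypothetical stand-in sketch: the backend is captured when the tokenizer is
# constructed, so restoring the global factory afterwards does not affect
# instances loaded while the patch was active.
class Tokenizer:
    def __init__(self, backend):
        self.backend = backend          # captured once, at load time

_active_backend = {"name": "vanilla"}   # stands in for the patchable factory

def from_pretrained():
    return Tokenizer(_active_backend["name"])

_active_backend["name"] = "fastokens"   # patch
tok = from_pretrained()                 # loaded while patched: keeps the shim
_active_backend["name"] = "vanilla"     # unpatch
```

After this runs, `tok.backend` is still `"fastokens"`, while any later `from_pretrained()` call gets `"vanilla"` again.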
Empirically verified across all 35 entries in MODEL_RENDERER_MAP:
31/35 byte-identical with vanilla on a 5-case encoding probe
(plain, Lorem, emoji+CJK, special-token-literal text, long).
2/35 fail to load: deepseek-ai/DeepSeek-V3{,-Base} — fastokens 0.1.1
doesn't support the Metaspace pretokenizer.
2/35 silently diverge on content containing literal ``<|im_start|>``-
like text: MiniMaxAI/MiniMax-M2{,.5}.
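The parity probe can be sketched as follows. The real audit compared the vanilla transformers backend against the fastokens-patched one; here two toy encode callables stand in so the loop runs standalone, with the "fast" encoder deliberately mishandling literal special-token text the way the divergence above behaves:

```python
# 5 probe strings mirroring the categories above; the encoders are toy
# stand-ins, not real tokenizer backends.
PROBES = [
    "hello world",                              # plain
    "Lorem ipsum dolor sit amet",               # Lorem
    "🚀 你好,世界",                              # emoji + CJK
    "text with a literal <|im_start|> inside",  # special-token-literal
    " ".join(["word"] * 200),                   # long
]

def diverging_probes(encode_a, encode_b):
    """Return the probe strings on which the two backends disagree."""
    return [p for p in PROBES if encode_a(p) != encode_b(p)]

# Toy encoders: "fast" rewrites literal special-token text before encoding,
# which silently changes the output on exactly one probe.
vanilla = lambda s: list(s.encode("utf-8"))
fast = lambda s: list(s.replace("<|im_start|>", "\x00").encode("utf-8"))
```

`diverging_probes(vanilla, fast)` flags only the special-token-literal probe; identical backends produce an empty list.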
The 4 incompat models live in FASTOKENS_INCOMPATIBLE and skip the patch
unconditionally. Unknown / fine-tuned models hit the patched fast path
first and fall back to vanilla on any fastokens load error (logged at
INFO).
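The denylist-skip, bracketed-patch, and vanilla-fallback policy can be sketched standalone like this (hypothetical names mirroring the description above; dummy classes replace the real transformers and fastokens objects):

```python
import contextlib
import logging

log = logging.getLogger("load_tokenizer")

# Denylist as described above (brace patterns expanded).
FASTOKENS_INCOMPATIBLE = frozenset({
    "deepseek-ai/DeepSeek-V3", "deepseek-ai/DeepSeek-V3-Base",
    "MiniMaxAI/MiniMax-M2", "MiniMaxAI/MiniMax-M2.5",
})

class AutoTokenizer:                       # stand-in for transformers.AutoTokenizer
    @classmethod
    def from_pretrained(cls, name):
        return f"vanilla:{name}"

def _fastokens_from_pretrained(name):      # stand-in for the fastokens shim
    if "unsupported" in name:              # simulate a fastokens load failure
        raise RuntimeError("pretokenizer not supported")
    return f"fastokens:{name}"

@contextlib.contextmanager
def _fastokens_patched():
    """Bracketed patch: swap in the shim, always restore the vanilla factory."""
    vanilla = AutoTokenizer.__dict__["from_pretrained"]
    AutoTokenizer.from_pretrained = classmethod(
        lambda cls, name: _fastokens_from_pretrained(name))
    try:
        yield
    finally:
        AutoTokenizer.from_pretrained = vanilla   # unpatch: no global leak

def load_tokenizer(name, use_fastokens=True):
    if not use_fastokens or name in FASTOKENS_INCOMPATIBLE:
        return AutoTokenizer.from_pretrained(name)       # skip the patch
    with _fastokens_patched():
        try:
            return AutoTokenizer.from_pretrained(name)   # fast path
        except Exception as exc:
            log.info("fastokens load failed for %s: %s; using vanilla", name, exc)
    return AutoTokenizer.from_pretrained(name)           # vanilla fallback
```

Denylisted names never enter the patched context; unknown names try the shim first and fall back to vanilla on any load error, and a `from_pretrained` call made outside `load_tokenizer` is always vanilla.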
Existing 900-test suite passes unchanged with fastokens patched
globally; new test_load_tokenizer_fastokens.py adds 10 cases pinning
the policy:
* FASTOKENS_INCOMPATIBLE exact-shape lock (changes force deliberate review)
* Default path produces a fastokens-shim backend
* ``use_fastokens=False`` produces a vanilla backend
* Encode parity vanilla vs fastokens on a representative model
* Each incompat model loads via vanilla (skips the patch)
* The patch doesn't leak: a direct AutoTokenizer.from_pretrained
outside load_tokenizer stays vanilla
* Simulated fastokens load failure falls back to vanilla cleanly
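The exact-shape lock in the first case can be sketched like this (hypothetical test body; denylist entries are the brace patterns from the audit, expanded): asserting equality against a spelled-out set means any edit to the denylist fails the test and forces a deliberate review.

```python
# Hypothetical sketch of the exact-shape lock test: the assertion repeats the
# denylist literally, so adding or removing an entry breaks the test on purpose.
FASTOKENS_INCOMPATIBLE = frozenset({
    "deepseek-ai/DeepSeek-V3",
    "deepseek-ai/DeepSeek-V3-Base",
    "MiniMaxAI/MiniMax-M2",
    "MiniMaxAI/MiniMax-M2.5",
})

def test_fastokens_incompatible_shape():
    # Exact equality, not subset/superset: any drift requires editing this test.
    assert FASTOKENS_INCOMPATIBLE == {
        "deepseek-ai/DeepSeek-V3",
        "deepseek-ai/DeepSeek-V3-Base",
        "MiniMaxAI/MiniMax-M2",
        "MiniMaxAI/MiniMax-M2.5",
    }
```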
No version bump (batched with the open Qwen3.5 / Llama-3 PRs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Add fastokens (Crusoe's Rust BPE tokenizer, ~10x faster encode) as a required dependency and patch it in by default for every supported model except a small denylist. The patch is bracketed around `from_pretrained`, so the loaded tokenizer keeps the fastokens shim while the user's process-global `AutoTokenizer.from_pretrained` stays vanilla.

Audit results — every entry of MODEL_RENDERER_MAP

Probe: 5-case encoding (plain text, Lorem, emoji + CJK, literal `<|im_start|>`, 200-word long), vanilla vs fastokens-patched.

* deepseek-ai/DeepSeek-V3{,-Base} — fail to load: fastokens 0.1.1 doesn't support the Metaspace pretokenizer
* MiniMaxAI/MiniMax-M2{,.5} — silently diverge on content containing literal `<|im_start|>`-like text

The 4 incompatible models live in `FASTOKENS_INCOMPATIBLE` (a frozenset in `renderers/base.py`) and skip the patch unconditionally. Unknown / fine-tuned models hit the patched path first and fall back to vanilla on any fastokens load error (logged at INFO).

Implementation notes

Per-call patch/unpatch keeps the side effect minimal: the returned tokenizer keeps the fastokens shim (because fastokens captures the backend at load time), but subsequent `AutoTokenizer.from_pretrained` calls outside `load_tokenizer` stay vanilla. Verified by `test_patch_is_unloaded_after_call`.

Tests

`tests/test_load_tokenizer_fastokens.py` — 10 cases pinning the policy:

* `FASTOKENS_INCOMPATIBLE` exact-shape lock
* `use_fastokens=False` produces a vanilla backend
* `AutoTokenizer.from_pretrained` outside `load_tokenizer` stays vanilla

Existing 900-test suite passes unchanged with fastokens patched globally — verified before designing this PR (full run: 947 passed on the feat/llama-3-renderer branch with fastokens; this branch off `origin/main`: 910 passed, 48 skipped, 1 xfailed in 129.15s).

No version bump — batched with the open Qwen3.5 / Llama-3 PRs.

Test plan

* `pytest tests/test_load_tokenizer_fastokens.py` — 10 cases pass
* `pytest tests/ --ignore=tests/test_client.py` — 910 pass, 48 skipped, 1 xfailed (no regressions)

🤖 Generated with Claude Code