fix(tokenizer): fall back to direct fast-tokenizer load when model config build fails #72
…nfig build fails

`AutoTokenizer.from_pretrained` eagerly constructs the *model* config to resolve the tokenizer class, even for a plain `PreTrainedTokenizerFast`. That construction runs HF's RoPE validator, which rejects configs carrying nested `rope_parameters` (e.g. poolside/Laguna-XS.2: `full_attention` / `sliding_attention` blocks with no top-level `rope_theta`) when the config is built outside vLLM's `patch_rope_parameters`. The resulting `KeyError` escapes (`AutoTokenizer` only catches `ValueError`/`OSError`) and kills the tokenizer load: a modeling-only concern breaking something the tokenizer never needed.

renderers needs the tokenizer, not the model. When `AutoTokenizer` fails while building the config, fall back to loading the repo's self-contained `tokenizer.json` directly via `PreTrainedTokenizerFast`, which never touches the model config. The fallback runs under the fastokens patch, so models like Laguna keep the Rust fast-path speedup. Custom `auto_map` tokenizers and repos without a fast tokenizer are left to surface the original error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 30655d6. Configure here.
    surface its original error instead.
    """
    from transformers import PreTrainedTokenizerFast
    from transformers.models.auto.tokenization_auto import get_tokenizer_config
Private import outside try/except masks original errors
Low Severity
The imports in `_load_fast_tokenizer_directly`, especially the private `from transformers.models.auto.tokenization_auto import get_tokenizer_config`, sit outside the `try`/`except` block at line 1132. Since this function is called from within `_load_tokenizer_via_auto`'s `except` handler, an `ImportError` from the private API path would propagate upward and effectively replace the original meaningful exception (e.g. the `KeyError` from RoPE validation). Moving both imports inside the existing `try` block would let any import failure return `None` gracefully, preserving the original error for re-raise.
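The masking effect described in this finding can be shown with the standard library alone. The sketch below is not the PR's code: `direct_load_unguarded`, `direct_load_guarded`, `load_with`, and the module name are hypothetical stand-ins, and the `KeyError` is raised by hand to imitate the RoPE validation failure.

```python
import importlib

# Hypothetical module name, chosen so that the import is guaranteed to fail.
MISSING_MODULE = "no_such_private_module_for_this_sketch"


def direct_load_unguarded(module_name: str):
    """Import sits outside any try block, so an ImportError propagates."""
    return importlib.import_module(module_name)


def direct_load_guarded(module_name: str):
    """Import sits inside the try block, so any failure becomes None."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        return None


def load_with(direct_load):
    """Imitates a wrapper whose primary load fails, then tries a fallback."""
    try:
        raise KeyError("rope_theta")  # stand-in for the original failure
    except KeyError:
        result = direct_load(MISSING_MODULE)
        if result is None:
            raise  # re-raises the KeyError currently being handled
        return result
```

With the unguarded loader the caller sees an `ImportError` (the `KeyError` survives only as `__context__`); with the guarded loader the caller sees the original `KeyError`.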
Approvability
Verdict: Needs human review
This PR introduces a new fallback code path for tokenizer loading (~60 lines of new logic) that changes runtime behavior when `AutoTokenizer` fails. The use of a private transformers API and the non-trivial error-handling changes warrant human verification of the approach.
Bumps the deps/renderers submodule 2ec28a8 (v0.1.8.dev28) -> 89ab3f0 (v0.1.8.dev35), pulling in PrimeIntellect-ai/renderers#72: when AutoTokenizer.from_pretrained fails while building the model config (e.g. HF RoPE validation rejecting nested rope_parameters for poolside/Laguna-XS.2), fall back to loading the repo's self-contained tokenizer.json directly. Fixes the tokenizer load crash for the Laguna model series; loads under the fastokens fast path. Re-locks uv.lock: renderers now floors openai-harmony at >=0.0.4 (renderers#69). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…na) (#2657) Bumps the deps/renderers submodule 2ec28a8 (v0.1.8.dev28) -> 89ab3f0 (v0.1.8.dev35), pulling in PrimeIntellect-ai/renderers#72: when AutoTokenizer.from_pretrained fails while building the model config (e.g. HF RoPE validation rejecting nested rope_parameters for poolside/Laguna-XS.2), fall back to loading the repo's self-contained tokenizer.json directly. Fixes the tokenizer load crash for the Laguna model series; loads under the fastokens fast path. Re-locks uv.lock: renderers now floors openai-harmony at >=0.0.4 (renderers#69). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Problem
Loading a tokenizer for poolside/Laguna-XS.2 (and any model with nested per-layer `rope_parameters`) crashes renderers.

The cause is entirely a modeling-layer concern leaking into tokenizer loading:
- `AutoTokenizer.from_pretrained` always constructs the model config first (to resolve the tokenizer class), even for a plain `PreTrainedTokenizerFast`.
- The model's `rope_parameters` are nested (`full_attention` / `sliding_attention`) with no top-level `rope_theta`. vLLM's `patch_rope_parameters` injects `rope_theta` via `standardize_rope_params()` before validating, so vLLM loads fine, but plain transformers validates during `__init__` without that step and raises.
- `AutoTokenizer` only catches `ValueError`/`OSError`, so the `KeyError` escapes and kills the load.

renderers needs the tokenizer, not the model; it should never have been dragged through RoPE validation.
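The failure mode under Problem can be reproduced in miniature without transformers. This is a sketch under stated assumptions: `validate_rope`, `build_model_config`, and `resolve_tokenizer_class` are hypothetical stand-ins for the real validator, config constructor, and loader, and the `rope_theta` values are illustrative only.

```python
def validate_rope(rope_parameters: dict) -> None:
    """Stand-in for a strict validator that expects a top-level rope_theta."""
    rope_parameters["rope_theta"]  # KeyError when only nested blocks exist


def build_model_config(raw: dict) -> dict:
    """Stand-in for model config construction that validates on build."""
    validate_rope(raw["rope_parameters"])
    return raw


def resolve_tokenizer_class(raw: dict) -> str:
    """Stand-in for a loader that tolerates only ValueError and OSError."""
    try:
        config = build_model_config(raw)
    except (ValueError, OSError):
        # Graceful path; a KeyError never lands here.
        return "PreTrainedTokenizerFast"
    return config.get("tokenizer_class", "PreTrainedTokenizerFast")


# Illustrative configs; the numeric values are made up for this sketch.
FLAT = {"rope_parameters": {"rope_theta": 10000.0}}
NESTED = {
    "rope_parameters": {
        "full_attention": {"rope_theta": 10000.0},
        "sliding_attention": {"rope_theta": 10000.0},
    }
}
```

Calling `resolve_tokenizer_class(FLAT)` returns normally, while `resolve_tokenizer_class(NESTED)` raises `KeyError`, because the handler's exception tuple does not include it.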
Fix
When `AutoTokenizer.from_pretrained` fails while building the model config, fall back to loading the repo's self-contained `tokenizer.json` directly via `PreTrainedTokenizerFast`, which never touches the model config. The fallback skips custom `auto_map` tokenizers (e.g. Kimi-K2), which must keep going through `AutoTokenizer` + `trust_remote_code`.

Laguna now loads via the fastokens fast path and routes to `LagunaXS2Renderer`, with a single clear INFO line and no misleading "fastokens could not load" warning.

Verification
- poolside/Laguna-XS.2: `load_tokenizer` → fastokens-backed `TokenizersBackend`, `create_renderer` → `LagunaXS2Renderer`, encode matches vanilla.
- `Qwen/Qwen3-0.6B` and the existing `tests/test_load_tokenizer_fastokens.py` suite (9 tests): no regression.

🤖 Generated with Claude Code
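The control flow described under Fix might look roughly like the sketch below. It is not the code from renderers: the function names drop the leading underscore to mark them as illustrative, the loaders are passed in as callables so the flow can be exercised without network access, and the direct loader assumes `tokenizer.json` can be fetched with `hf_hub_download` and handed to `PreTrainedTokenizerFast(tokenizer_file=...)`. The `auto_map` gate is omitted here.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def load_fast_tokenizer_directly(name: str) -> Optional[Any]:
    """Hypothetical direct loader: build a tokenizer from tokenizer.json only.

    Returns None on any failure so the caller can re-raise its own error.
    """
    try:
        # Imports live inside the try so an ImportError also yields None.
        from huggingface_hub import hf_hub_download
        from transformers import PreTrainedTokenizerFast

        path = hf_hub_download(repo_id=name, filename="tokenizer.json")
        return PreTrainedTokenizerFast(tokenizer_file=path)
    except Exception:
        return None


def load_tokenizer_via_auto(
    name: str,
    auto_load: Callable[[str], Any],
    direct_load: Callable[[str], Optional[Any]] = load_fast_tokenizer_directly,
) -> Any:
    """Try the primary loader; on failure try the direct loader."""
    try:
        return auto_load(name)
    except Exception:
        tokenizer = direct_load(name)
        if tokenizer is None:
            raise  # surface the original error, not a fallback error
        logger.info("Loaded %s directly from tokenizer.json", name)
        return tokenizer
```

In real use `auto_load` would be something like `AutoTokenizer.from_pretrained`. A production loader would probably also need to carry over settings from `tokenizer_config.json`, which this sketch skips.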
Note
Medium Risk
Changes central tokenizer loading for every model; behavior is gated on `auto_map` and re-raises when no fast tokenizer exists, but the broad `except` on the fallback path could mask unrelated load failures in edge cases.

Overview
Tokenizer loading no longer dies when Hugging Face fails to build the model config during `AutoTokenizer.from_pretrained` (e.g. RoPE validation on poolside/Laguna-XS.2). A shared `_load_tokenizer_via_auto` wrapper tries `AutoTokenizer` first; on failure it loads `PreTrainedTokenizerFast` from `tokenizer.json` via `_load_fast_tokenizer_directly`, skipping config construction when the repo has no custom `auto_map` tokenizer.

`_patched_load` and all `load_tokenizer` paths (vanilla, fastokens, fastokens fallback) now go through that wrapper so the fallback applies under the fastokens patch too. Custom remote tokenizers still require `AutoTokenizer`; if direct load isn't safe, the original exception is re-raised.

Reviewed by Cursor Bugbot for commit 30655d6. Bugbot is set up for automated code reviews on this repo.
Note
Fix tokenizer loading to fall back to direct `tokenizer.json` load when `AutoTokenizer` fails
- Adds `_load_fast_tokenizer_directly` in renderers/base.py to load a `PreTrainedTokenizerFast` straight from `tokenizer.json`, bypassing model config construction.
- Adds `_load_tokenizer_via_auto` as a wrapper around `AutoTokenizer.from_pretrained` that catches failures and retries via the new direct loader, re-raising the original exception only if the fallback also fails.
- Updates `load_tokenizer` and `_patched_load` to use `_load_tokenizer_via_auto` so the fallback applies in all tokenizer load paths.
- Skips the fallback for tokenizers with an `auto_map` entry, since those require the full model config.

Macroscope summarized 30655d6.
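The `auto_map` gate mentioned in the summary could be expressed as a small predicate over the parsed `tokenizer_config.json`. This is a hedged sketch: `has_custom_tokenizer` is a hypothetical name, and it assumes the custom class is registered under an `"AutoTokenizer"` key inside `auto_map`, with any non-mapping `auto_map` value treated as a custom entry as well.

```python
from typing import Any, Mapping


def has_custom_tokenizer(tokenizer_config: Mapping[str, Any]) -> bool:
    """Return True when the config points at remote tokenizer code.

    Assumption: such repos must keep loading through AutoTokenizer with
    trust_remote_code, so a direct tokenizer.json load should be skipped.
    """
    auto_map = tokenizer_config.get("auto_map")
    if isinstance(auto_map, Mapping):
        return auto_map.get("AutoTokenizer") is not None
    # Any other non-empty value is treated as a custom tokenizer entry.
    return bool(auto_map)
```

A fallback would call this before attempting the direct load and return early when it is true, leaving the original exception to surface.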