fix(tokenizer): fall back to direct fast-tokenizer load when model config build fails #72

Merged
hallerite merged 1 commit into main from fix/tokenizer-config-rope-fallback
May 27, 2026

@hallerite hallerite commented May 27, 2026

Problem

Loading a tokenizer for poolside/Laguna-XS.2 (and any model with nested per-layer rope_parameters) crashes renderers:

KeyError: "Missing required keys in `rope_parameters` for 'rope_type'='default': {'rope_theta'}"

The cause is entirely a modeling-layer concern leaking into tokenizer loading:

  • AutoTokenizer.from_pretrained always constructs the model config first (to resolve the tokenizer class) — even for a plain PreTrainedTokenizerFast.
  • Building Laguna's config runs HF's RoPE validator. Laguna's rope_parameters are nested (full_attention / sliding_attention) with no top-level rope_theta. vLLM's patch_rope_parameters injects rope_theta via standardize_rope_params() before validating, so vLLM loads fine — but plain transformers validates during __init__ without that step and raises.
  • AutoTokenizer only catches ValueError/OSError, so the KeyError escapes and kills the load.

renderers needs the tokenizer, not the model — it should never have been dragged through RoPE validation.
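The failure mode described above can be reproduced in miniature. The sketch below uses hypothetical stand-ins (`validate_rope`, `load_with_narrow_except`) rather than the actual transformers code, and the `rope_theta` values are illustrative only. It shows how a `KeyError` raised during validation passes straight through a handler that only catches `ValueError`/`OSError`.

```python
# Hypothetical stand-ins for illustration; this is not the transformers implementation.
def validate_rope(rope_parameters: dict) -> None:
    """Toy validator: requires a top-level rope_theta key."""
    missing = {"rope_theta"} - rope_parameters.keys()
    if missing:
        raise KeyError(f"Missing required keys in `rope_parameters`: {missing}")


def load_with_narrow_except(rope_parameters: dict) -> str:
    """Mimics a loader that only tolerates ValueError/OSError from config building."""
    try:
        validate_rope(rope_parameters)
    except (ValueError, OSError):
        return "config skipped"
    return "config built"


# Nested per-layer layout: rope_theta exists only inside each block (values are made up).
nested = {
    "full_attention": {"rope_type": "default", "rope_theta": 10000.0},
    "sliding_attention": {"rope_type": "default", "rope_theta": 10000.0},
}
# Flat layout with a top-level rope_theta, which the toy validator accepts.
flat = {"rope_type": "default", "rope_theta": 10000.0}
```

Calling `load_with_narrow_except(flat)` returns normally, while `load_with_narrow_except(nested)` raises `KeyError`, because the handler never sees that exception type.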

Fix

When AutoTokenizer.from_pretrained fails while building the model config, fall back to loading the repo's self-contained tokenizer.json directly via PreTrainedTokenizerFast, which never touches the model config. The fallback:

  • is modeling-agnostic — no Laguna/RoPE-specific knowledge, just "if the model config blew up but the tokenizer is self-describing, load it directly";
  • runs under the fastokens patch, so Laguna keeps the Rust fast-path speedup (verified: backend is the fastokens shim, encode output byte-identical to vanilla);
  • excludes custom auto_map tokenizers (e.g. Kimi-K2), which must keep going through AutoTokenizer + trust_remote_code;
  • re-raises the original error if there's no usable fast tokenizer, so genuine failures still surface.

Laguna now loads via the fastokens fast path and routes to LagunaXS2Renderer, with a single clear INFO line and no misleading "fastokens could not load" warning.
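The control flow of the fallback can be sketched roughly as follows. This is a simplified illustration, not the code in renderers/base.py: `load_via_auto` and its two callable parameters are hypothetical names, with `auto_loader` standing in for `AutoTokenizer.from_pretrained` and `direct_loader` standing in for a direct `tokenizer.json` load that returns `None` when it is not usable.

```python
from typing import Any, Callable, Optional


def load_via_auto(
    name: str,
    auto_loader: Callable[[str], Any],
    direct_loader: Callable[[str], Optional[Any]],
) -> Any:
    """Try the normal loader first; on failure, try the direct loader.

    If the direct loader reports that it cannot help (returns None),
    the original exception is re-raised unchanged.
    """
    try:
        return auto_loader(name)
    except Exception:
        tokenizer = direct_loader(name)
        if tokenizer is None:
            # Bare raise keeps the original exception and traceback.
            raise
        return tokenizer
```

Passing the loaders in as callables keeps the sketch independent of any model-specific knowledge, which mirrors the "modeling-agnostic" property listed above.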

Verification

  • poolside/Laguna-XS.2: load_tokenizer → fastokens-backed TokenizersBackend, create_renderer → LagunaXS2Renderer, encode matches vanilla.
  • Qwen/Qwen3-0.6B and the existing tests/test_load_tokenizer_fastokens.py suite (9 tests): no regression.

🤖 Generated with Claude Code


Note

Medium Risk
Changes central tokenizer loading for every model. Behavior is gated on auto_map and re-raises when no fast tokenizer exists, but the broad except on the fallback path could mask unrelated load failures in edge cases.

Overview
Tokenizer loading no longer dies when Hugging Face builds the model config during AutoTokenizer.from_pretrained (e.g. RoPE validation on poolside/Laguna-XS.2). A shared _load_tokenizer_via_auto wrapper tries AutoTokenizer first; on failure it loads PreTrainedTokenizerFast from tokenizer.json via _load_fast_tokenizer_directly, skipping config construction when the repo has no custom auto_map tokenizer.

_patched_load and all load_tokenizer paths (vanilla, fastokens, fastokens fallback) now go through that wrapper so the fallback applies under the fastokens patch too. Custom remote tokenizers still require AutoTokenizer; if direct load isn’t safe, the original exception is re-raised.

Reviewed by Cursor Bugbot for commit 30655d6.

Note

Fix tokenizer loading to fall back to direct tokenizer.json load when AutoTokenizer fails

  • Adds _load_fast_tokenizer_directly in renderers/base.py to load a PreTrainedTokenizerFast straight from tokenizer.json, bypassing model config construction.
  • Adds _load_tokenizer_via_auto as a wrapper around AutoTokenizer.from_pretrained that catches failures and retries via the new direct loader, re-raising the original exception only if the fallback also fails.
  • Updates load_tokenizer and _patched_load to use _load_tokenizer_via_auto so the fallback applies in all tokenizer load paths.
  • The direct fallback is skipped if the tokenizer config declares an auto_map entry, since those require the full model config.
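The gating condition in the last bullet could look something like the sketch below. It assumes the tokenizer config is available as a plain dict (as parsed from tokenizer_config.json) and that the repo's file listing is known. The helper names and the custom class name in the usage note are hypothetical.

```python
from typing import Iterable


def declares_custom_tokenizer(tokenizer_config: dict) -> bool:
    """True if the config carries a non-empty auto_map entry."""
    return bool(tokenizer_config.get("auto_map"))


def can_load_directly(tokenizer_config: dict, repo_files: Iterable[str]) -> bool:
    """Direct loading is only considered when tokenizer.json exists
    and no custom auto_map tokenizer is declared."""
    has_fast_file = "tokenizer.json" in repo_files
    return has_fast_file and not declares_custom_tokenizer(tokenizer_config)
```

For example, `can_load_directly({}, ["tokenizer.json", "config.json"])` is true, whereas a config containing `{"auto_map": {"AutoTokenizer": ["tokenization_custom.CustomTokenizer", None]}}` is rejected and left to the normal AutoTokenizer path.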

Macroscope summarized 30655d6.

@hallerite hallerite marked this pull request as ready for review May 27, 2026 22:01
…nfig build fails

`AutoTokenizer.from_pretrained` eagerly constructs the *model* config to
resolve the tokenizer class — even for a plain `PreTrainedTokenizerFast`.
That construction runs HF's RoPE validator, which rejects configs carrying
nested `rope_parameters` (e.g. poolside/Laguna-XS.2: `full_attention` /
`sliding_attention` blocks with no top-level `rope_theta`) when the config
is built outside vLLM's `patch_rope_parameters`. The resulting `KeyError`
escapes (AutoTokenizer only catches `ValueError`/`OSError`) and kills the
tokenizer load — a modeling-only concern breaking something the tokenizer
never needed.

renderers needs the tokenizer, not the model. When `AutoTokenizer` fails
while building the config, fall back to loading the repo's self-contained
`tokenizer.json` directly via `PreTrainedTokenizerFast`, which never touches
the model config. The fallback runs under the fastokens patch, so models
like Laguna keep the Rust fast-path speedup. Custom `auto_map` tokenizers
and repos without a fast tokenizer are left to surface the original error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hallerite hallerite force-pushed the fix/tokenizer-config-rope-fallback branch from aec2346 to 30655d6 May 27, 2026 22:02
@rasdani rasdani self-requested a review May 27, 2026 22:06

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread renderers/base.py
surface its original error instead.
"""
from transformers import PreTrainedTokenizerFast
from transformers.models.auto.tokenization_auto import get_tokenizer_config

Private import outside try/except masks original errors

Low Severity

The imports in _load_fast_tokenizer_directly — especially the private from transformers.models.auto.tokenization_auto import get_tokenizer_config — sit outside the try/except block at line 1132. Since this function is called from within _load_tokenizer_via_auto's except handler, an ImportError from the private API path would propagate upward and effectively replace the original meaningful exception (e.g. the KeyError from RoPE validation). Moving both imports inside the existing try block would let any import failure return None gracefully, preserving the original error for re-raise.
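A minimal sketch of the suggested change, using hypothetical stand-ins rather than the actual function: the helper module is imported inside the `try`, so a missing or relocated private module results in `None` instead of an `ImportError` that would replace the original failure. Here `json` plays the role of an importable helper module.

```python
import importlib
from typing import Any, Optional


def load_directly(name: str, helper_module: str) -> Optional[Any]:
    """Return a tokenizer-like object, or None if anything on the direct path fails."""
    try:
        # Import inside the try: an ImportError now means "no fallback available".
        helper = importlib.import_module(helper_module)
        return helper.dumps({"name": name})  # stand-in for the real direct load
    except Exception:
        return None


def load_with_fallback(name: str, helper_module: str) -> Any:
    """Stand-in wrapper whose primary path always fails with KeyError."""
    try:
        raise KeyError("rope_theta")  # simulates the config-build failure
    except Exception:
        tokenizer = load_directly(name, helper_module)
        if tokenizer is None:
            raise  # the original KeyError surfaces, not an ImportError
        return tokenizer
```

With a module name that does not exist, `load_with_fallback` raises the original `KeyError`. With `"json"`, the fallback succeeds.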


macroscopeapp Bot commented May 27, 2026

Approvability

Verdict: Needs human review

This PR introduces a new fallback code path for tokenizer loading (~60 lines of new logic) that changes runtime behavior when AutoTokenizer fails. The use of a private transformers API and the non-trivial error-handling changes warrant human verification of the approach.

@hallerite hallerite merged commit 89ab3f0 into main May 27, 2026
11 checks passed
@hallerite hallerite deleted the fix/tokenizer-config-rope-fallback branch May 27, 2026 22:19
hallerite added a commit to PrimeIntellect-ai/prime-rl that referenced this pull request May 27, 2026
Bumps the deps/renderers submodule 2ec28a8 (v0.1.8.dev28) -> 89ab3f0
(v0.1.8.dev35), pulling in PrimeIntellect-ai/renderers#72: when
AutoTokenizer.from_pretrained fails while building the model config
(e.g. HF RoPE validation rejecting nested rope_parameters for
poolside/Laguna-XS.2), fall back to loading the repo's self-contained
tokenizer.json directly. Fixes the tokenizer load crash for the Laguna
model series; loads under the fastokens fast path.

Re-locks uv.lock: renderers now floors openai-harmony at >=0.0.4 (renderers#69).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rasdani pushed a commit to PrimeIntellect-ai/prime-rl that referenced this pull request May 27, 2026
…na) (#2657)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>