
[None][chore] Remove glm_moe_dsa tokenizer WAR after Transformers 5.x upgrade#13901

Open
longlee0622 wants to merge 1 commit into NVIDIA:main from longlee0622:feat/remove-glm-moe-dsa-tokenizer-war

Conversation

@longlee0622
Collaborator

@longlee0622 longlee0622 commented May 8, 2026

The custom GlmMoeDsaTokenizer was added in #12586 to manually load GLM-5 checkpoints whose tokenizer_config.json uses transformers 5.x features (TokenizersBackend class, list-form extra_special_tokens) that were not supported by the pinned transformers 4.x.
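A `tokenizer_config.json` exercising those transformers 5.x features looks roughly like this (the key names `tokenizer_class` and `extra_special_tokens` come from the description above; the token strings and template are illustrative, not the actual GLM-5 config):

```json
{
  "tokenizer_class": "TokenizersBackend",
  "extra_special_tokens": ["<think>", "</think>", "<|observation|>"],
  "chat_template": "{% for message in messages %}...{% endfor %}"
}
```

Under transformers 4.x, the unknown `TokenizersBackend` class and the list-form `extra_special_tokens` made `AutoTokenizer.from_pretrained()` fail, which is what the manual loader worked around.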

With the dependency now bumped to transformers==5.3.0 (#12829), AutoTokenizer.from_pretrained() handles these checkpoints natively. This removes the workaround:

  • Delete tensorrt_llm/tokenizer/glm_moe_dsa/.
  • Drop the 'glm_moe_dsa' alias from TOKENIZER_ALIASES in both tokenizer/tokenizer.py and llmapi/llm_args.py. The 'deepseek_v32' alias and the --custom_tokenizer plumbing remain (used by the DeepSeek-V3.2 workaround for an unrelated issue).
  • Drop custom_tokenizer="glm_moe_dsa" from the GLM-5 accuracy and guided-decoding tests, and from the deployment guide YAML.
  • Update --custom_tokenizer help text to drop glm_moe_dsa from the examples (deepseek_v32 kept).
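The alias plumbing being trimmed can be sketched as follows. This is a hedged stand-in, not the real code: `TOKENIZER_ALIASES` and the alias strings come from the PR description, while `resolve_tokenizer_cls` and `DeepSeekV32Tokenizer` are illustrative names.

```python
class DeepSeekV32Tokenizer:
    """Stand-in for the remaining custom tokenizer class."""


TOKENIZER_ALIASES = {
    # "glm_moe_dsa": GlmMoeDsaTokenizer,  # entry deleted by this PR
    "deepseek_v32": DeepSeekV32Tokenizer,
}


def resolve_tokenizer_cls(custom_tokenizer):
    """Map a --custom_tokenizer alias to a class; None means AutoTokenizer."""
    if custom_tokenizer is None:
        # No alias given: callers fall through to
        # AutoTokenizer.from_pretrained(), which now handles GLM-5 natively.
        return None
    try:
        return TOKENIZER_ALIASES[custom_tokenizer]
    except KeyError:
        raise ValueError(
            f"Unknown custom_tokenizer {custom_tokenizer!r}; "
            f"valid aliases: {sorted(TOKENIZER_ALIASES)}")
```

After this PR, passing `glm_moe_dsa` becomes an error rather than a silent fallback, which is why the tests and deployment-guide YAML also drop the alias.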

Verified end-to-end on 8xB200 with transformers==5.3.0:

  • AutoTokenizer.from_pretrained() on /models/GLM-5-NVFP4 produces output equivalent to GlmMoeDsaTokenizer (vocab, special tokens, chat_template, and encoded ids all match).
  • trtllm-serve loads GLM-5-NVFP4 with custom_tokenizer=None and answers chat completions correctly.
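The equivalence check above can be expressed generically. A minimal sketch, assuming nothing about the real classes: `ToyTokenizer` stands in for the AutoTokenizer/GlmMoeDsaTokenizer pair (which require the actual checkpoint), and `tokenizers_equivalent` compares the same four properties the verification lists.

```python
class ToyTokenizer:
    """Tiny stand-in tokenizer so the sketch runs without a checkpoint."""

    def __init__(self, vocab, special_tokens, chat_template):
        self.vocab = vocab                  # token -> id mapping
        self.special_tokens = special_tokens
        self.chat_template = chat_template

    def encode(self, text):
        # Whitespace lookup against the vocab; real tokenizers use BPE.
        return [self.vocab[t] for t in text.split()]


def tokenizers_equivalent(a, b, probes):
    """True if vocab, special tokens, chat template, and encoded ids match."""
    return (a.vocab == b.vocab
            and a.special_tokens == b.special_tokens
            and a.chat_template == b.chat_template
            and all(a.encode(p) == b.encode(p) for p in probes))
```

With the real objects, `a` would be the AutoTokenizer load and `b` the legacy GlmMoeDsaTokenizer, probed with a set of representative chat prompts.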

Summary by CodeRabbit

  • Removed Features

    • The GLM-Moe-Dsa custom tokenizer is no longer supported. Configurations using the glm_moe_dsa alias require updates.
  • Documentation

    • Updated deployment guides and CLI help text to reflect removed tokenizer support.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

… upgrade

Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026


📝 Walkthrough


This PR removes the GLM-Moe-Dsa tokenizer implementation, its public exports, registry entries, CLI help text references, deployment guide examples, and test configurations. The change is a complete removal of a specific tokenizer type.

Changes

GLM-Moe-Dsa Tokenizer Removal

| Layer | File(s) | Summary |
|---|---|---|
| Core Implementation Removal | `tensorrt_llm/tokenizer/glm_moe_dsa/tokenizer.py`, `tensorrt_llm/tokenizer/glm_moe_dsa/__init__.py` | Deletes the GlmMoeDsaTokenizer class (98 lines), the `_load_tokenizer_config()` helper, and the package-level export in `__init__.py`. |
| Tokenizer Alias Registry Updates | `tensorrt_llm/tokenizer/tokenizer.py`, `tensorrt_llm/llmapi/llm_args.py` | Removes the glm_moe_dsa alias entry from the TOKENIZER_ALIASES mappings in both locations; the deepseek_v32 alias is retained. |
| CLI Help Text Updates | `tensorrt_llm/bench/benchmark/low_latency.py`, `tensorrt_llm/bench/benchmark/throughput.py`, `tensorrt_llm/serve/scripts/benchmark_serving.py` | Updates the `--custom_tokenizer` help strings to remove glm_moe_dsa from the displayed alias examples. |
| Docstring Formatting | `tensorrt_llm/bench/utils/data.py` | Adjusts the `initialize_tokenizer` docstring line wrapping for the `custom_tokenizer` argument. |
| Deployment Guide & Tests | `docs/source/deployment-guide/deployment-guide-for-glm-5-on-trtllm.md`, `tests/integration/defs/accuracy/test_llm_api_pytorch.py`, `tests/unittest/llmapi/apps/_test_openai_chat_guided_decoding.py` | Removes `custom_tokenizer: glm_moe_dsa` from the B200 FP8 config examples and YAML documentation; updates the TestGLM5FP8 config and fixture to remove the GLM-5-FP8 custom tokenizer branch. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | PR description is provided but does not follow the required template structure, with dedicated Description, Test Coverage, and PR Checklist sections properly filled. | Restructure the description to clearly separate: (1) a Description section explaining the issue and solution, (2) a Test Coverage section listing relevant tests, (3) PR Checklist confirmation. Currently these are mixed into a single narrative block. |
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: removing the glm_moe_dsa tokenizer workaround after the Transformers 5.x upgrade. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@longlee0622
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47388 [ run ] triggered by Bot. Commit: 15c6a77 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47388 [ run ] completed with state SUCCESS. Commit: 15c6a77
/LLM/main/L0_MergeRequest_PR pipeline #37317 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@longlee0622
Collaborator Author

/bot run --disable-fail-fast

@longlee0622 longlee0622 closed this May 8, 2026
@longlee0622 longlee0622 reopened this May 8, 2026
@longlee0622
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47448 [ run ] triggered by Bot. Commit: 15c6a77 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47447 [ run ] triggered by Bot. Commit: 15c6a77 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47448 [ run ] completed with state ABORTED. Commit: 15c6a77

Link to invocation

@longlee0622 longlee0622 enabled auto-merge (squash) May 9, 2026 03:47
@tensorrt-cicd
Collaborator

PR_Github #47447 [ run ] completed with state SUCCESS. Commit: 15c6a77
/LLM/main/L0_MergeRequest_PR pipeline #37369 completed with status: 'SUCCESS'

CI Report

Link to invocation
