
[None][chore] Remove glm_moe_dsa tokenizer WAR after Transformers 5.x upgrade#13901

Open
longlee0622 wants to merge 1 commit into NVIDIA:main from longlee0622:feat/remove-glm-moe-dsa-tokenizer-war

Conversation

@longlee0622
Collaborator

@longlee0622 longlee0622 commented May 8, 2026

The custom GlmMoeDsaTokenizer was added in #12586 to manually load GLM-5 checkpoints whose tokenizer_config.json uses transformers 5.x features (TokenizersBackend class, list-form extra_special_tokens) that were not supported by the pinned transformers 4.x.
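A `tokenizer_config.json` exercising those transformers 5.x features looks roughly like this (the key names `tokenizer_class` and `extra_special_tokens` come from the description above; the token strings and template are illustrative, not the actual GLM-5 config):

```json
{
  "tokenizer_class": "TokenizersBackend",
  "extra_special_tokens": ["<think>", "</think>", "<|observation|>"],
  "chat_template": "{% for message in messages %}...{% endfor %}"
}
```

Under transformers 4.x, the unknown `TokenizersBackend` class and the list-form `extra_special_tokens` made `AutoTokenizer.from_pretrained()` fail, which is what the manual loader worked around.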

With the dependency now bumped to transformers==5.3.0 (#12829), AutoTokenizer.from_pretrained() handles these checkpoints natively. This removes the workaround:

  • Delete tensorrt_llm/tokenizer/glm_moe_dsa/.
  • Drop the 'glm_moe_dsa' alias from TOKENIZER_ALIASES in both tokenizer/tokenizer.py and llmapi/llm_args.py. The 'deepseek_v32' alias and the --custom_tokenizer plumbing remain (used by the DeepSeek-V3.2 workaround for an unrelated issue).
  • Drop custom_tokenizer="glm_moe_dsa" from the GLM-5 accuracy and guided-decoding tests, and from the deployment guide YAML.
  • Update --custom_tokenizer help text to drop glm_moe_dsa from the examples (deepseek_v32 kept).
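The alias plumbing being trimmed can be sketched as follows. This is a hedged stand-in, not the real code: `TOKENIZER_ALIASES` and the alias strings come from the PR description, while `resolve_tokenizer_cls` and `DeepSeekV32Tokenizer` are illustrative names.

```python
class DeepSeekV32Tokenizer:
    """Stand-in for the remaining custom tokenizer class."""


TOKENIZER_ALIASES = {
    # "glm_moe_dsa": GlmMoeDsaTokenizer,  # entry deleted by this PR
    "deepseek_v32": DeepSeekV32Tokenizer,
}


def resolve_tokenizer_cls(custom_tokenizer):
    """Map a --custom_tokenizer alias to a class; None means AutoTokenizer."""
    if custom_tokenizer is None:
        # No alias given: callers fall through to
        # AutoTokenizer.from_pretrained(), which now handles GLM-5 natively.
        return None
    try:
        return TOKENIZER_ALIASES[custom_tokenizer]
    except KeyError:
        raise ValueError(
            f"Unknown custom_tokenizer {custom_tokenizer!r}; "
            f"valid aliases: {sorted(TOKENIZER_ALIASES)}")
```

After this PR, passing `glm_moe_dsa` becomes an error rather than a silent fallback, which is why the tests and deployment-guide YAML also drop the alias.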

Verified end-to-end on 8xB200 with transformers==5.3.0:

  • AutoTokenizer.from_pretrained() on /models/GLM-5-NVFP4 produces output equivalent to GlmMoeDsaTokenizer (vocab, special tokens, chat_template, and encoded ids all match).
  • trtllm-serve loads GLM-5-NVFP4 with custom_tokenizer=None and answers chat completions correctly.
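The equivalence check above can be expressed generically. A minimal sketch, assuming nothing about the real classes: `ToyTokenizer` stands in for the AutoTokenizer/GlmMoeDsaTokenizer pair (which require the actual checkpoint), and `tokenizers_equivalent` compares the same four properties the verification lists.

```python
class ToyTokenizer:
    """Tiny stand-in tokenizer so the sketch runs without a checkpoint."""

    def __init__(self, vocab, special_tokens, chat_template):
        self.vocab = vocab                  # token -> id mapping
        self.special_tokens = special_tokens
        self.chat_template = chat_template

    def encode(self, text):
        # Whitespace lookup against the vocab; real tokenizers use BPE.
        return [self.vocab[t] for t in text.split()]


def tokenizers_equivalent(a, b, probes):
    """True if vocab, special tokens, chat template, and encoded ids match."""
    return (a.vocab == b.vocab
            and a.special_tokens == b.special_tokens
            and a.chat_template == b.chat_template
            and all(a.encode(p) == b.encode(p) for p in probes))
```

With the real objects, `a` would be the AutoTokenizer load and `b` the legacy GlmMoeDsaTokenizer, probed with a set of representative chat prompts.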

Summary by CodeRabbit

  • Removed Features

    • The GLM-Moe-Dsa custom tokenizer is no longer supported. Configurations using the glm_moe_dsa alias require updates.
  • Documentation

    • Updated deployment guides and CLI help text to reflect removed tokenizer support.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

… upgrade

Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026


📝 Walkthrough


This PR removes the GLM-Moe-Dsa tokenizer implementation, its public exports, registry entries, CLI help text references, deployment guide examples, and test configurations. The change is a complete removal of a specific tokenizer type.

Changes

GLM-Moe-Dsa Tokenizer Removal

| Layer | File(s) | Summary |
|---|---|---|
| Core Implementation Removal | `tensorrt_llm/tokenizer/glm_moe_dsa/tokenizer.py`, `tensorrt_llm/tokenizer/glm_moe_dsa/__init__.py` | Deletes the GlmMoeDsaTokenizer class (98 lines), the `_load_tokenizer_config()` helper, and the package-level export in `__init__.py`. |
| Tokenizer Alias Registry Updates | `tensorrt_llm/tokenizer/tokenizer.py`, `tensorrt_llm/llmapi/llm_args.py` | Removes the glm_moe_dsa alias entry from the TOKENIZER_ALIASES mappings in both locations; the deepseek_v32 alias is retained. |
| CLI Help Text Updates | `tensorrt_llm/bench/benchmark/low_latency.py`, `tensorrt_llm/bench/benchmark/throughput.py`, `tensorrt_llm/serve/scripts/benchmark_serving.py` | Updates the `--custom_tokenizer` help strings to remove glm_moe_dsa from the displayed alias examples. |
| Docstring Formatting | `tensorrt_llm/bench/utils/data.py` | Adjusts the `initialize_tokenizer` docstring line wrapping for the `custom_tokenizer` argument. |
| Deployment Guide & Tests | `docs/source/deployment-guide/deployment-guide-for-glm-5-on-trtllm.md`, `tests/integration/defs/accuracy/test_llm_api_pytorch.py`, `tests/unittest/llmapi/apps/_test_openai_chat_guided_decoding.py` | Removes `custom_tokenizer: glm_moe_dsa` from the B200 FP8 config examples and YAML documentation; updates the TestGLM5FP8 config and fixture to remove the GLM-5-FP8 custom tokenizer branch. |

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ❓ Inconclusive | PR description is provided but does not follow the required template structure, with dedicated Description, Test Coverage, and PR Checklist sections properly filled. | Restructure the description to clearly separate: (1) a Description section explaining the issue and solution, (2) a Test Coverage section listing relevant tests, (3) PR Checklist confirmation. Currently these are mixed into a single narrative block. |
✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: removing the glm_moe_dsa tokenizer workaround after the Transformers 5.x upgrade. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@longlee0622
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #47388 [ run ] triggered by Bot. Commit: 15c6a77 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47388 [ run ] completed with state SUCCESS. Commit: 15c6a77
/LLM/main/L0_MergeRequest_PR pipeline #37317 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@longlee0622
Collaborator Author

/bot run --disable-fail-fast

@longlee0622 longlee0622 closed this May 8, 2026
@longlee0622 longlee0622 reopened this May 8, 2026
@longlee0622
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #47448 [ run ] triggered by Bot. Commit: 15c6a77 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47447 [ run ] triggered by Bot. Commit: 15c6a77 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47448 [ run ] completed with state ABORTED. Commit: 15c6a77

Link to invocation

@longlee0622 longlee0622 enabled auto-merge (squash) May 9, 2026 03:47
@tensorrt-cicd
Collaborator

PR_Github #47447 [ run ] completed with state SUCCESS. Commit: 15c6a77
/LLM/main/L0_MergeRequest_PR pipeline #37369 completed with status: 'SUCCESS'

CI Report

Link to invocation
