
Compose split routed experts from vLLM responses #1349

Open
S1ro1 wants to merge 2 commits into main from feat/split-routed-experts

Conversation

@S1ro1 (Contributor) commented on May 11, 2026

Summary

  • add a split routed-experts composer for prompt_routed_experts plus completion routed_experts
  • decode the new base64 routed-experts payload emitted by patched vLLM when vllm_xargs.routed_experts_encoding = "base64" (see the sketch after this list)
  • update chat, completions, and renderer clients to consume vLLM's new split routed-experts response shape
  • remove the old base85 routed_experts decode path
  • preserve prompt+completion routing when truncating response tokens
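
A minimal sketch of what the new decode path could look like, based only on the base64 encoding named above; the payload field names ("data", "num_tokens") and the int32 dtype are assumptions for illustration, not the PR's actual schema:

```python
import base64

import numpy as np


def _decode_routed_experts(payload: dict) -> np.ndarray:
    """Decode a base64 routed-experts payload into a (num_tokens, top_k) array.

    Field names and dtype here are illustrative assumptions; the PR only
    establishes that patched vLLM emits base64 when
    vllm_xargs.routed_experts_encoding = "base64".
    """
    raw = base64.b64decode(payload["data"])
    experts = np.frombuffer(raw, dtype=np.int32)
    # One row per token, top-k expert ids per row (shape assumed).
    return experts.reshape(payload["num_tokens"], -1)
```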

Validation

  • uvx ruff@0.15.12 format --isolated --check .
  • uvx ruff@0.15.12 check --isolated .
  • uv run --no-sync ty check verifiers/clients/openai_chat_completions_client.py verifiers/clients/openai_completions_client.py verifiers/clients/renderer_client.py verifiers/clients/routed_experts.py verifiers/utils/response_utils.py
  • uv run ruff check verifiers/clients/routed_experts.py
  • uv run python -m py_compile verifiers/clients/routed_experts.py

Note

Medium Risk
Updates token parsing across multiple clients to the new vLLM prompt_routed_experts/routed_experts shapes, which can affect downstream telemetry and analysis if alignment or decoding assumptions are wrong.

Overview
Adds compose_split_routed_experts() to decode and merge vLLM’s split routing outputs (prompt_routed_experts + per-choice routed_experts) into a single ResponseTokens.routed_experts array aligned to prompt+completion tokens (including padding the final completion token).
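
A rough sketch of the happy path described above, where both halves are present, reusing the hypothetical _decode_routed_experts from the sketch under Summary; the zero pad for the final completion token (and its dtype) is an assumption, since the PR only says that token is padded:

```python
import numpy as np


def compose_split_routed_experts(prompt_routed_experts, completion_routed_experts, prompt_len):
    # Sketch: both halves present; guard mirrors the snippet quoted in
    # the review below.
    if prompt_routed_experts is None and completion_routed_experts is None:
        return None
    prompt = _decode_routed_experts(prompt_routed_experts)
    assert prompt.shape[0] == prompt_len
    completion = _decode_routed_experts(completion_routed_experts)
    # Pad the final completion token, which has no routing row of its
    # own (zero-padding is an assumption, not confirmed by the PR).
    pad = np.zeros((1, completion.shape[1]), dtype=completion.dtype)
    return np.concatenate([prompt, completion, pad], axis=0)
```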

Updates the chat, completions, and renderer clients to consume the new split fields and removes the prior inline base85/np.frombuffer decode path. Fixes truncation handling so routed_experts is truncated consistently with the combined prompt+completion token window.

Reviewed by Cursor Bugbot for commit b76a6bb.

@S1ro1 force-pushed the feat/split-routed-experts branch from 277ab6e to 8dbd674 on May 11, 2026 at 22:28
@S1ro1 marked this pull request as ready for review on May 11, 2026 at 22:42
@cursor (Bot) left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


Reviewed by Cursor Bugbot for commit 8dbd674.

```diff
  completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
  if routed_experts is not None:
-     routed_experts = routed_experts[: max_seq_len - prompt_len]
+     routed_experts = routed_experts[:max_seq_len]
```

Overlong prompt truncation discards prompt routing data

High Severity

The semantics of routed_experts changed from completion-only to prompt+completion combined, but the overlong-prompt truncation path at line 50 still sets routed_experts = []. This discards all prompt routing data when the prompt exceeds max_seq_len. The author correctly updated line 57 from routed_experts[: max_seq_len - prompt_len] to routed_experts[:max_seq_len] for the normal truncation case, but missed the analogous fix here — it needs to be routed_experts[:max_seq_len] instead of [].
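
A sketch of the fix being suggested, using the variable names visible in the quoted diff; the shape of the surrounding overlong-prompt branch is a reconstruction, not quoted code:

```python
# Overlong-prompt path (reconstructed): the prompt alone fills or
# exceeds max_seq_len, so no completion tokens survive truncation.
if prompt_len >= max_seq_len:
    completion_tokens = []
    completion_logprobs = []
    if routed_experts is not None:
        # routed_experts now spans prompt + completion, so keep the
        # first max_seq_len rows instead of discarding everything:
        routed_experts = routed_experts[:max_seq_len]  # was: routed_experts = []
```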

Additional Locations (1)

```python
        return None

    prompt = _decode_routed_experts(prompt_routed_experts)
    assert prompt.shape[0] == prompt_len
```

Compose function crashes when only completion routing is present

Medium Severity

compose_split_routed_experts unconditionally calls _decode_routed_experts(prompt_routed_experts) after the early-return guard. If prompt_routed_experts is None while completion_routed_experts is not, the early return is skipped (it only fires when both are None) and _decode_routed_experts(None) crashes with a TypeError on len(None). The function handles the prompt-present/completion-absent case but not the reverse.
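
One way to make the guard symmetric, sketched under the assumption that each half can be decoded independently (np and the hypothetical _decode_routed_experts as in the earlier sketches):

```python
def compose_split_routed_experts(prompt_routed_experts, completion_routed_experts, prompt_len):
    # Decode each half only when present, so a lone completion payload
    # never reaches _decode_routed_experts(None).
    parts = []
    if prompt_routed_experts is not None:
        prompt = _decode_routed_experts(prompt_routed_experts)
        assert prompt.shape[0] == prompt_len
        parts.append(prompt)
    if completion_routed_experts is not None:
        parts.append(_decode_routed_experts(completion_routed_experts))
    if not parts:
        return None
    return np.concatenate(parts, axis=0)
```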


```python
    choice_any, "routed_experts"
):
    prompt_routed_experts = response_any.prompt_routed_experts
    completion_routed_experts = choice_any.routed_experts
```

or guard accesses both fields when one is missing

Medium Severity

All three callers use or to check whether either routing field exists, then unconditionally access both. If only prompt_routed_experts exists on the response but not routed_experts on the choice (or vice versa), the missing attribute access raises AttributeError in the OpenAI clients or KeyError in the renderer's dict client. Using and, or individual getattr()/.get() fallbacks, would be safer.
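
A sketch of the safer caller-side check, using the names from the quoted snippet; whether prompt_len is in scope at the call sites is an assumption:

```python
# Attribute-style clients (chat / completions):
prompt_routed_experts = getattr(response_any, "prompt_routed_experts", None)
completion_routed_experts = getattr(choice_any, "routed_experts", None)
if prompt_routed_experts is not None or completion_routed_experts is not None:
    routed_experts = compose_split_routed_experts(
        prompt_routed_experts, completion_routed_experts, prompt_len
    )

# The renderer's dict client would use .get() instead:
# prompt_routed_experts = response_any.get("prompt_routed_experts")
# completion_routed_experts = choice_any.get("routed_experts")
```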

Additional Locations (2)
