Compose split routed experts from vLLM responses #1349
Cursor Bugbot has reviewed your changes and found 3 potential issues.
```diff
 completion_logprobs = tokens.completion_logprobs[: max_seq_len - prompt_len]
 if routed_experts is not None:
-    routed_experts = routed_experts[: max_seq_len - prompt_len]
+    routed_experts = routed_experts[:max_seq_len]
```
Overlong prompt truncation discards prompt routing data
High Severity
The semantics of routed_experts changed from completion-only to prompt+completion combined, but the overlong-prompt truncation path at line 50 still sets routed_experts = []. This discards all prompt routing data when the prompt exceeds max_seq_len. The author correctly updated line 57 from routed_experts[: max_seq_len - prompt_len] to routed_experts[:max_seq_len] for the normal truncation case, but missed the analogous fix here — it needs to be routed_experts[:max_seq_len] instead of [].
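A minimal sketch of what the symmetric fix could look like, wrapped in a hypothetical truncation helper (tokens, max_seq_len, and prompt_len mirror the snippet above; the surrounding function shape is an assumption, not the PR's actual code):

```python
def truncate_response(prompt_tokens, completion_tokens, routed_experts, max_seq_len):
    # Hypothetical helper: routed_experts is now a single array spanning
    # prompt + completion tokens, so both truncation paths slice with
    # [:max_seq_len] instead of [: max_seq_len - prompt_len] or [].
    prompt_len = len(prompt_tokens)
    if prompt_len >= max_seq_len:
        # Overlong prompt: the buggy path set routed_experts = [] here,
        # discarding all prompt routing data.
        prompt_tokens = prompt_tokens[:max_seq_len]
        completion_tokens = []
        if routed_experts is not None:
            routed_experts = routed_experts[:max_seq_len]
    else:
        completion_tokens = completion_tokens[: max_seq_len - prompt_len]
        if routed_experts is not None:
            routed_experts = routed_experts[:max_seq_len]
    return prompt_tokens, completion_tokens, routed_experts
```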
```python
        return None

    prompt = _decode_routed_experts(prompt_routed_experts)
    assert prompt.shape[0] == prompt_len
```
Compose function crashes when only completion routing present
Medium Severity
compose_split_routed_experts unconditionally calls _decode_routed_experts(prompt_routed_experts) after the early-return guard. If prompt_routed_experts is None while completion_routed_experts is not None, the early return is skipped (since only both-None triggers it), and _decode_routed_experts(None) crashes with TypeError when calling len(None). The function handles prompt-present/completion-absent but not the reverse.
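One way to make the guard symmetric, sketched against a stand-in decoder (the real _decode_routed_experts, the array shapes, and how a prompt-absent half should be represented are all assumptions):

```python
import numpy as np

def _decode_routed_experts(encoded):
    # Stand-in for the real decoder, which reads an encoded payload.
    return np.asarray(encoded)

def compose_split_routed_experts(prompt_routed_experts, completion_routed_experts, prompt_len):
    if prompt_routed_experts is None and completion_routed_experts is None:
        return None
    parts = []
    if prompt_routed_experts is not None:
        prompt = _decode_routed_experts(prompt_routed_experts)
        assert prompt.shape[0] == prompt_len
        parts.append(prompt)
    if completion_routed_experts is not None:
        # Decoding only when present keeps _decode_routed_experts(None)
        # from raising on a completion-only response.
        parts.append(_decode_routed_experts(completion_routed_experts))
    return np.concatenate(parts, axis=0)
```

Whether a completion-only array should additionally be padded out to prompt_len to stay aligned with the combined prompt+completion token window is a design decision the sketch leaves open.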
```python
    choice_any, "routed_experts"
):
    prompt_routed_experts = response_any.prompt_routed_experts
    completion_routed_experts = choice_any.routed_experts
```
or guard accesses both fields when one is missing
Medium Severity
All three callers use or to check whether either routing field exists, then unconditionally access both. If only prompt_routed_experts exists on the response but not routed_experts on the choice (or vice versa), the missing attribute access raises AttributeError for the OpenAI clients or KeyError for the renderer dict client. Checking with and instead, or falling back to individual getattr()/.get() lookups with None defaults, would be safer.
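A sketch of the safer extraction for the attribute-based clients (names mirror the snippet above; the dict-based renderer client would use .get() the same way):

```python
def extract_split_routing(response_any, choice_any):
    # getattr with a None default tolerates either field being absent,
    # instead of assuming both exist once one hasattr() check passes.
    prompt_routed_experts = getattr(response_any, "prompt_routed_experts", None)
    completion_routed_experts = getattr(choice_any, "routed_experts", None)
    if prompt_routed_experts is None and completion_routed_experts is None:
        return None
    return prompt_routed_experts, completion_routed_experts
```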


Summary

- vllm_xargs.routed_experts_encoding = "base64"
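For illustration, a request that sets this flag might look like the following; passing vllm_xargs through the OpenAI client's extra_body, the server URL, and the model name are all assumptions, not taken from the PR:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical call: ask the vLLM server to return routing data
# base64-encoded alongside the completion.
response = client.chat.completions.create(
    model="my-moe-model",  # placeholder
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"vllm_xargs": {"routed_experts_encoding": "base64"}},
)
```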
Validation

Note: Medium Risk. Updates token parsing across multiple clients to the new vLLM prompt_routed_experts/routed_experts shapes, which can affect downstream telemetry/analysis if alignment or decoding assumptions are wrong.

Overview

Adds compose_split_routed_experts() to decode and merge vLLM's split routing outputs (prompt_routed_experts + per-choice routed_experts) into a single ResponseTokens.routed_experts array aligned to prompt+completion tokens (including padding the final completion token). Updates the chat, completions, and renderer clients to consume the new split fields and removes the prior inline base85/np.frombuffer decode path. Fixes truncation handling so routed_experts is truncated consistently with the combined prompt+completion token window.
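As a rough illustration of the behavior described above, here is a self-contained sketch; the base64 transport, flat row layout, pad value, and the premise that the final sampled completion token carries no routing row are assumptions layered on this summary, not the PR's actual code:

```python
import base64
import numpy as np

def compose_split_routed_experts_sketch(prompt_b64, completion_b64,
                                        prompt_len, completion_len,
                                        dtype=np.int32, pad_value=-1):
    """Merge split routing payloads into one array aligned to
    prompt + completion tokens (assumed (token, expert-slot) layout)."""
    if prompt_b64 is None and completion_b64 is None:
        return None

    def decode(payload, n_tokens):
        flat = np.frombuffer(base64.b64decode(payload), dtype=dtype)
        return flat.reshape(n_tokens, -1)

    parts = []
    if prompt_b64 is not None:
        parts.append(decode(prompt_b64, prompt_len))
    if completion_b64 is not None:
        # Assume routing exists for all but the last completion token,
        # which was sampled but never fed back through the router; pad
        # one row so the array covers every completion token.
        completion = decode(completion_b64, completion_len - 1)
        pad = np.full((1, completion.shape[1]), pad_value, dtype=dtype)
        parts.append(np.vstack([completion, pad]))
    return np.vstack(parts)
```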