Feat/tito dynamo transport #1289
Conversation
…okenClient
The verifiers TITO client previously only spoke vLLM's TITO surface
(/v1/chat/completions/tokens for the final POST, /tokenize for bridge
tokenization). Dynamo bis/dynamo-rl serves neither route, so multi-turn
TITO against Dynamo silently degraded to MITO on every turn from turn 2
onward via the existing fallback path.
This commit teaches the TITO client to read ClientConfig.renderer_transport
(same field RendererClient consults) and route accordingly:
- prime_vllm_generate (default): unchanged; posts to
  /v1/chat/completions/tokens and uses /tokenize over HTTP.
- dynamo_chat_nvext: bridge tokenization runs locally via the renderers
  package (zero round trips); the final POST goes to /v1/chat/completions
  with placeholder messages plus nvext.token_data carrying the stitched
  prompt_ids and explicit stop_token_ids from renderer.get_stop_token_ids().
Wire shape matches what RendererClient already produces for the
same transport, so a Dynamo deployment validated against renderer
mode automatically accepts TITO traffic too.
Adds two unit tests that assert the dynamo-transport wire shape end-to-
end via a recording client + stub renderer (no real tokenizer download).
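The dynamo_chat_nvext request shape described above can be sketched as follows. This is a minimal illustration, not the client's actual code: the function name `build_dynamo_tito_body` is hypothetical, and the exact layout of the fields inside `nvext.token_data` (and which sampling keys get promoted) is assumed from the commit message.

```python
def build_dynamo_tito_body(
    model: str,
    prompt_ids: list[int],
    stop_token_ids: list[int],
    sampling_args: dict,
) -> dict:
    """Sketch of a /v1/chat/completions body carrying stitched prompt
    ids via nvext.token_data, per the dynamo_chat_nvext transport.

    The messages field is a placeholder: the engine is expected to read
    prompt_ids from nvext.token_data instead of re-rendering the chat
    template.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": ""}],  # placeholder
        "nvext": {
            "token_data": {
                "prompt_ids": prompt_ids,
                "stop_token_ids": stop_token_ids,
            }
        },
    }
    # Promote known sampling knobs to the top level, skipping unset ones.
    for key in ("temperature", "top_p", "max_tokens"):
        value = sampling_args.get(key)
        if value is not None:
            body[key] = value
    return body


body = build_dynamo_tito_body(
    model="demo-model",
    prompt_ids=[1, 2, 3],
    stop_token_ids=[0],
    sampling_args={"temperature": 0.7, "max_tokens": None},
)
print(body["nvext"]["token_data"]["prompt_ids"])  # [1, 2, 3]
```

Because this wire shape matches what RendererClient already emits for the same transport, a backend that validates one path accepts the other.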
```python
value = sampling_args.get(key, extra_body.get(key))
if value is not None:
    body[key] = value
path = "/chat/completions"
```
Dynamo transport silently drops cache_salt in renderer path
High Severity
The dynamo_chat_nvext transport path rebuilds the request body, silently dropping cache_salt (and other sampling arguments like prompt_logprobs) from sampling_args or extra_body. This prevents proper prefix-cache KV invalidation after policy updates, causing the engine to reuse stale KV and leading to training regression.
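A minimal fix sketch: extend the set of promoted keys so cache_salt (and prompt_logprobs) survive the body rebuild. The tuple contents and the helper name `promote_sampling_args` are assumptions mirroring the snippet above, not the project's exact code.

```python
# Keys forwarded from sampling_args/extra_body into the rebuilt body.
# cache_salt and prompt_logprobs are the ones the review flags as dropped.
PROMOTED_KEYS = (
    "temperature",
    "top_p",
    "max_tokens",
    "cache_salt",       # required so prefix-cache KV is invalidated after policy updates
    "prompt_logprobs",  # preserved for training-side logprob consumers
)


def promote_sampling_args(body: dict, sampling_args: dict, extra_body: dict) -> dict:
    # Same lookup order as the snippet: sampling_args wins over extra_body.
    for key in PROMOTED_KEYS:
        value = sampling_args.get(key, extra_body.get(key))
        if value is not None:
            body[key] = value
    return body


body = promote_sampling_args({}, {"cache_salt": "step-42"}, {"temperature": 0.2})
```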
Reviewed by Cursor Bugbot for commit c1ecc7a.
```python
renderer: str = "auto"
renderer_model_name: str | None = None
renderer_pool_size: int | None = None
renderer_transport: RendererTransport = "prime_vllm_generate"
```
New renderer_transport config field missing from docs
Low Severity
The PR adds renderer_transport to ClientConfig, which is a user-facing configuration class documented in docs/reference.md. The ClientConfig section in that file does not reflect the new field. Per project rules, any PR that modifies core user-facing functionality described in docs/ must update the relevant documentation.
Triggered by project rule: BugBot Instructions
Reviewed by Cursor Bugbot for commit c1ecc7a.
```python
passthrough = {
    k: v
    for k, v in extra_body.items()
    if k not in promotable and v is not None and k not in body
}
body.update(passthrough)
```
priority in extra_body leaks through passthrough
Low Severity
When priority is supplied via extra_body, it is placed in nvext["agent_hints"]["priority"] (lines 248-250) and also leaks through the passthrough dict as a top-level body["priority"] (lines 286-291), because "priority" is neither in promotable nor already a top-level key in body. The renderer client's completions_request has no passthrough logic, so there priority ends up only in nvext, leaving the two transports with inconsistent wire shapes.
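One way to make the two transports consistent is to exclude nvext-routed keys from the passthrough. A sketch, assuming the `promotable`/`passthrough` names from the snippet; `NVEXT_ONLY_KEYS` and `build_passthrough` are hypothetical helpers:

```python
# Keys that are already routed into nvext (e.g. agent_hints) and must
# never also appear at the top level of the request body.
NVEXT_ONLY_KEYS = {"priority"}


def build_passthrough(extra_body: dict, promotable: set, body: dict) -> dict:
    return {
        k: v
        for k, v in extra_body.items()
        if k not in promotable
        and k not in NVEXT_ONLY_KEYS  # the fix: keep nvext-only keys out
        and v is not None
        and k not in body
    }


pt = build_passthrough(
    extra_body={"priority": 5, "seed": 7, "top_p": 0.9},
    promotable={"top_p"},
    body={"model": "m"},
)
```

With this change, priority ends up only in nvext on both transports instead of doubling as a top-level field on one of them.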
Reviewed by Cursor Bugbot for commit c1ecc7a.
OpenAIChatCompletionsTokenClient.get_prompt_ids' prefix match between the caller-supplied prompt_messages and the trajectory step messages was asymmetric:

- prompt_messages went straight through normalize_for_comparison, which picks up vf.AssistantMessage.model_dump's exhaustive view, including thinking_blocks=None and other defaulted fields.
- step_messages went through to_native_prompt FIRST, which produces the slimmer OpenAI-format dict that omits thinking_blocks entirely.

The two normalized forms therefore never compared equal whenever the caller handed the client Pydantic vf.Message types -- the form MultiTurnEnv produces after maybe_normalize_messages -- so the prefix match always returned None and TITO silently fell back to MITO on every turn from turn 2 onward. Probe-3 and the upstream test suite both used raw dict input, so the asymmetry only showed up under real orchestrator rollouts.

Fix: drop None-valued keys in normalize_for_comparison. Both sides land on the same shape regardless of whether the input arrived as Pydantic models or as plain OpenAI dicts.

Validated end-to-end against bis-dev/5/always-continue-tito (multi-turn TITO + Dynamo bis/dynamo-rl smoke): 348 /v1/chat/completions requests, 21 SIDECAR-SKIP-TOKENIZE markers, 0 fallback warnings. The same SIDECAR token prefix appears across turns, confirming the engine reuses prior-turn ids verbatim.

The existing 6 unit tests (4 vanilla TITO + 2 dynamo_chat_nvext) all still pass; their dict-shaped input always normalized to the same shape on both sides, so the symmetric drop-None does not change them.
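The symmetric drop-None fix can be sketched as below. This is a simplification: the real normalize_for_comparison presumably also canonicalizes roles/content and handles nesting, which is omitted here.

```python
def normalize_for_comparison(message: dict) -> dict:
    # Drop None-valued keys so both input shapes normalize identically:
    # a Pydantic model_dump carries defaulted fields (thinking_blocks=None),
    # while a plain OpenAI-format dict omits them entirely.
    return {k: v for k, v in message.items() if v is not None}


# The two shapes that previously never compared equal:
pydantic_style = {"role": "assistant", "content": "hi", "thinking_blocks": None}
openai_style = {"role": "assistant", "content": "hi"}
```

After the drop-None pass, both forms collapse to the same dict, so the prefix match succeeds regardless of which shape the caller supplied.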
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit e8dc4d2.
```python
completion_logprobs = [
    float(item.get("logprob") or 0.0)
    for item in (choice.get("logprobs") or {}).get("content") or []
    if isinstance(item, dict)
]
```
Dynamo logprobs filter breaks token-logprob length alignment
Medium Severity
The dynamo_chat_nvext logprobs parsing uses if isinstance(item, dict) as a filter in the list comprehension, which silently drops non-dict entries (e.g., None). This can make completion_logprobs shorter than completion_ids, breaking the expected 1:1 alignment. The prime_vllm_generate path correctly preserves alignment by substituting 0.0 for None entries instead of filtering them out. Misaligned logprobs can corrupt training signal computation downstream.
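An alignment-preserving variant, mirroring the prime_vllm_generate behavior the review describes, substitutes 0.0 for malformed entries instead of filtering them out. The helper name `parse_completion_logprobs` is hypothetical:

```python
def parse_completion_logprobs(choice: dict) -> list[float]:
    content = (choice.get("logprobs") or {}).get("content") or []
    # Substitute 0.0 for non-dict (e.g. None) entries rather than
    # dropping them, so len(completion_logprobs) == len(completion_ids)
    # is preserved for downstream training-signal computation.
    return [
        float(item.get("logprob") or 0.0) if isinstance(item, dict) else 0.0
        for item in content
    ]


# A None entry keeps its slot instead of shrinking the list.
choice = {"logprobs": {"content": [{"logprob": -0.5}, None, {"logprob": -1.0}]}}
```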
Reviewed by Cursor Bugbot for commit e8dc4d2.


Description

Type of Change

Testing

- Ran uv run pytest locally.

Checklist

Additional Notes
Note

Medium Risk

Adds an alternate request/response wire format and local tokenization path that changes how prompts/completions are sent and parsed when renderer_transport is enabled, which could affect inference correctness and tool-call handling. Default behavior remains unchanged, but misconfiguration or backend mismatches could cause failures.

Overview

Adds a new configurable RendererTransport (prime_vllm_generate vs dynamo_chat_nvext) and threads it through ClientConfig, RendererClient, and OpenAIChatCompletionsTokenClient so deployments can target Dynamo's /chat/completions nvext.token_data token-in/token-out interface.

Updates renderers.client.completions_request to build and POST the Dynamo nvext request shape, parse completion token ids/logprobs from Dynamo-style responses, and normalize finish_reason to tool_calls when tool calls are detected client-side.

Extends TITO (OpenAIChatCompletionsTokenClient) to support the Dynamo transport by posting stitched prompt_ids to /chat/completions, performing bridge tokenization locally via a cached renderer/tokenizer (no /tokenize round-trip), and making prefix matching more robust by dropping None-valued keys during message normalization. Adds targeted tests covering the new transport and local tokenization behavior.

Reviewed by Cursor Bugbot for commit e8dc4d2.