
Feat/tito dynamo transport#1289

Open
biswapanda wants to merge 4 commits into PrimeIntellect-ai:main from biswapanda:feat/tito-dynamo-transport

Conversation


@biswapanda biswapanda commented May 5, 2026

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Adds an alternate request/response wire format and local tokenization path that changes how prompts/completions are sent and parsed when renderer_transport is enabled, which could affect inference correctness and tool-call handling. Default behavior remains unchanged but misconfiguration or backend mismatches could cause failures.

Overview
Adds a new configurable RendererTransport (prime_vllm_generate vs dynamo_chat_nvext) and threads it through ClientConfig, RendererClient, and OpenAIChatCompletionsTokenClient so deployments can target Dynamo’s /chat/completions nvext.token_data token-in/token-out interface.

Updates renderers.client.completions_request to build/POST the Dynamo nvext request shape, parse completion token ids/logprobs from Dynamo-style responses, and normalize finish_reason to tool_calls when tool calls are detected client-side.

Extends TITO (OpenAIChatCompletionsTokenClient) to support the Dynamo transport by posting stitched prompt_ids to /chat/completions, performing bridge tokenization locally via a cached renderer/tokenizer (no /tokenize round-trip), and making prefix matching more robust by dropping None-valued keys during message normalization. Adds targeted tests covering the new transport and local tokenization behavior.
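Based on the overview above, the dynamo_chat_nvext request shape can be sketched roughly as follows. Only nvext.token_data and the placeholder-messages pattern come from this PR's description; the concrete field values and the model name are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical sketch of the token-in/token-out wire shape described above.
# prompt_ids would be the stitched prompt token ids produced by the renderer;
# the values here are placeholders.
prompt_ids = [1, 15043, 29892]

body = {
    "model": "example-model",  # assumption: any served model name
    # Placeholder messages: the backend reads the real prompt from nvext.token_data.
    "messages": [{"role": "user", "content": ""}],
    "nvext": {
        "token_data": {"prompt_ids": prompt_ids},
    },
    # Explicit stop ids, per the description sourced from renderer.get_stop_token_ids().
    "stop_token_ids": [2],
}
```

A deployment that accepts this shape from RendererClient should, per the PR, accept the same shape from the TITO client.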

Reviewed by Cursor Bugbot for commit e8dc4d2.

AmeenP and others added 3 commits May 5, 2026 09:33
…okenClient

    The verifiers TITO client previously only spoke vLLM's TITO surface
    (/v1/chat/completions/tokens for the final POST, /tokenize for bridge
    tokenization). Dynamo bis/dynamo-rl serves neither route, so multi-turn
    TITO against Dynamo silently degraded to MITO every turn-2+ via the
    existing fallback path.

    This commit teaches the TITO client to read ClientConfig.renderer_transport
    (same field RendererClient consults) and route accordingly:

    - prime_vllm_generate (default): unchanged - posts to
      /v1/chat/completions/tokens and uses /tokenize over HTTP.

    - dynamo_chat_nvext: bridge tokenize runs locally via the renderers
      package (zero RTTs); final POST goes to /v1/chat/completions with
      placeholder messages + nvext.token_data carrying the stitched
      prompt_ids + explicit stop_token_ids from renderer.get_stop_token_ids().
      Wire shape matches what RendererClient already produces for the
      same transport, so a Dynamo deployment validated against renderer
      mode automatically accepts TITO traffic too.

    Adds two unit tests that assert the dynamo-transport wire shape end-to-
    end via a recording client + stub renderer (no real tokenizer download).
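The routing the commit message describes could be sketched as below. ClientConfig and RendererTransport mirror names from the PR; the function and the exact path strings are an illustrative reconstruction from the commit message, not the repository's code.

```python
# Minimal sketch of transport routing, assuming the two-path design described
# in the commit message. Names beyond renderer_transport are hypothetical.
from dataclasses import dataclass
from typing import Literal

RendererTransport = Literal["prime_vllm_generate", "dynamo_chat_nvext"]


@dataclass
class ClientConfig:
    renderer_transport: RendererTransport = "prime_vllm_generate"


def completion_path(config: ClientConfig) -> str:
    if config.renderer_transport == "dynamo_chat_nvext":
        # Placeholder messages + nvext.token_data; bridge tokenization runs locally.
        return "/v1/chat/completions"
    # Default vLLM TITO surface; /tokenize is used over HTTP for bridging.
    return "/v1/chat/completions/tokens"
```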
```python
value = sampling_args.get(key, extra_body.get(key))
if value is not None:
    body[key] = value
path = "/chat/completions"
```
Dynamo transport silently drops cache_salt in renderer path

High Severity

The dynamo_chat_nvext transport path rebuilds the request body, silently dropping cache_salt (and other sampling arguments like prompt_logprobs) from sampling_args or extra_body. This prevents proper prefix-cache KV invalidation after policy updates, causing the engine to reuse stale KV and leading to training regression.
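One way to address this is to explicitly forward the affected keys when rebuilding the body. The sketch below is a suggestion consistent with the snippet above, not the PR's fix; the key list and merge order are assumptions.

```python
# Hypothetical fix sketch: forward cache_salt (and similar sampling keys)
# from sampling_args or extra_body instead of silently dropping them.
def merge_sampling_keys(
    body: dict,
    sampling_args: dict,
    extra_body: dict,
    keys: tuple[str, ...] = ("cache_salt", "prompt_logprobs"),
) -> dict:
    for key in keys:
        # sampling_args takes precedence; extra_body is the fallback (assumption).
        value = sampling_args.get(key, extra_body.get(key))
        if value is not None:
            body[key] = value
    return body
```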


Reviewed by Cursor Bugbot for commit c1ecc7a.

Comment thread verifiers/types.py
```python
renderer: str = "auto"
renderer_model_name: str | None = None
renderer_pool_size: int | None = None
renderer_transport: RendererTransport = "prime_vllm_generate"
```

New renderer_transport config field missing from docs

Low Severity

The PR adds renderer_transport to ClientConfig, which is a user-facing configuration class documented in docs/reference.md. The ClientConfig section in that file does not reflect the new field. Per project rules, any PR that modifies core user-facing functionality described in docs/ must update the relevant documentation.


Triggered by project rule: BugBot Instructions

Reviewed by Cursor Bugbot for commit c1ecc7a.

```python
passthrough = {
    k: v
    for k, v in extra_body.items()
    if k not in promotable and v is not None and k not in body
}
body.update(passthrough)
```

priority in extra_body leaks through passthrough

Low Severity

When priority is supplied via extra_body, it gets placed in nvext["agent_hints"]["priority"] (line 248-250) and leaks through the passthrough dict as a top-level body["priority"] (line 286-291), because "priority" is not in promotable and not a top-level key in body. The renderer client's completions_request doesn't have passthrough logic, so priority only ends up in nvext there — making the two transports inconsistent in wire shape.
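A fix consistent with this comment would exclude keys already promoted into nvext from the top-level passthrough. This is a sketch under that assumption; the nvext key set and signature are illustrative, not the repository's code.

```python
# Hypothetical sketch: keep nvext-promoted keys (e.g. "priority") out of the
# top-level passthrough so the two transports produce a consistent wire shape.
def build_passthrough(
    extra_body: dict,
    body: dict,
    promotable: set,
    nvext_keys: frozenset = frozenset({"priority"}),  # assumed promoted-key set
) -> dict:
    return {
        k: v
        for k, v in extra_body.items()
        if k not in promotable
        and k not in nvext_keys
        and v is not None
        and k not in body
    }
```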


Reviewed by Cursor Bugbot for commit c1ecc7a.

OpenAIChatCompletionsTokenClient.get_prompt_ids' prefix-match between
the prompt_messages caller-input and the trajectory step messages was
asymmetric:

- prompt_messages went straight through normalize_for_comparison (which
  picks up vf.AssistantMessage.model_dump's exhaustive view, including
  thinking_blocks=None and other defaulted fields).
- step_messages went through to_native_prompt FIRST, which produces the
  slimmer OpenAI-format dict that omits thinking_blocks entirely.

The two normalized forms then never compared equal whenever the caller
handed the client Pydantic vf.Message types -- the form MultiTurnEnv
produces after maybe_normalize_messages -- so the prefix match always
returned None and TITO silently fell back to MITO every turn-2+.
Probe-3 and the upstream test suite both used raw dict input, so the
asymmetry only showed up under real orchestrator rollouts.

Fix: drop None-valued keys in normalize_for_comparison. Both sides
land on the same shape regardless of whether they came in as Pydantic
or as plain OpenAI dicts.

Validated end-to-end against bis-dev/5/always-continue-tito (multi-turn
TITO + Dynamo bis/dynamo-rl smoke):
  348 /v1/chat/completions, 21 SIDECAR-SKIP-TOKENIZE markers, 0 fallback
  warnings. Same SIDECAR token-prefix appears across turns,
  confirming the engine reuses prior-turn ids verbatim.

The existing 6 unit tests (4 vanilla TITO + 2 dynamo_chat_nvext) all
still pass; their dict-shaped input always normalized to the same
shape on both sides, so the symmetric drop-None doesn't change them.
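The symmetric drop-None fix described in this commit message can be sketched as below. The function name matches the commit message; the body is an illustrative reconstruction, not the repository's implementation.

```python
# Sketch of the fix: drop None-valued keys during normalization so Pydantic
# model_dump output (with defaulted fields like thinking_blocks=None) compares
# equal to the slimmer OpenAI-format dicts produced by to_native_prompt.
def normalize_for_comparison(message: dict) -> dict:
    return {k: v for k, v in message.items() if v is not None}
```

With this in place, both the caller's prompt_messages and the trajectory step messages land on the same shape, so the prefix match succeeds regardless of input form.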

@cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).


Reviewed by Cursor Bugbot for commit e8dc4d2.

```python
completion_logprobs = [
    float(item.get("logprob") or 0.0)
    for item in (choice.get("logprobs") or {}).get("content") or []
    if isinstance(item, dict)
]
```

Dynamo logprobs filter breaks token-logprob length alignment

Medium Severity

The dynamo_chat_nvext logprobs parsing uses if isinstance(item, dict) as a filter in the list comprehension, which silently drops non-dict entries (e.g., None). This can make completion_logprobs shorter than completion_ids, breaking the expected 1:1 alignment. The prime_vllm_generate path correctly preserves alignment by substituting 0.0 for None entries instead of filtering them out. Misaligned logprobs can corrupt training signal computation downstream.
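An alignment-preserving variant, in line with what this comment suggests, substitutes 0.0 for malformed entries instead of filtering them out. The function name and response shape here are assumptions for illustration.

```python
# Sketch of alignment-preserving parsing: non-dict entries (e.g. None) become
# 0.0 rather than being dropped, so the result stays 1:1 with completion_ids.
def parse_completion_logprobs(choice: dict) -> list[float]:
    items = (choice.get("logprobs") or {}).get("content") or []
    return [
        float(item.get("logprob") or 0.0) if isinstance(item, dict) else 0.0
        for item in items
    ]
```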


Reviewed by Cursor Bugbot for commit e8dc4d2.
