feat(renderer-client): thread multimodal sidecar through rollout + transport #1346
Open
hallerite wants to merge 4 commits into
Conversation
…ansport
Surfaces the renderer's MultiModalData sidecar (pixel_values, placeholder
ranges, mm_hashes) end-to-end so multimodal renderers can drive vLLM's
/inference/v1/generate `multi_modal_data` request field and the
downstream trainer's `mm_kwargs` without going through the legacy
chat-completions / MITO multimodal path.
renderer_client.py
- `_step_multi_modal_data(step)`: recover the prior turn's mm_data from
the trajectory step (parsed-tokens or raw-message side).
- `_get_incremental_prompt_ids` now returns `RenderedTokens | None` and
forwards `previous_multi_modal_data` to `bridge_to_next_turn` so the
new turn's placeholder runs cover every earlier-turn image. Without
this carry-forward, vLLM sees mismatched placeholder counts and falls
back to hash-cache lookup or errors. Text-only renderers' raw
`list[int]` returns are normalized via `as_rendered_tokens`.
- `RendererClient.create_completion` unpacks the bridged result into
`(prompt_ids, multi_modal_data)` and forwards both to `generate`.
- `parse_response_tokens`: copies `response.multi_modal_data` onto the
emitted `ResponseTokens` so downstream consumers can read it.
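The carry-forward can be sketched roughly as below. The dataclass shapes and the offset-shift merge are illustrative assumptions, not the actual renderers API; the point the sketch makes is that prior-turn placeholder ranges and pixel values are re-emitted alongside the new turn's, so vLLM sees a placeholder run for every image in the full prompt:

```python
from dataclasses import dataclass, field

@dataclass
class PlaceholderRange:
    offset: int   # start index of the image's placeholder run in the prompt
    length: int

@dataclass
class MultiModalData:
    placeholder_ranges: list = field(default_factory=list)
    pixel_values: list = field(default_factory=list)

def bridge_to_next_turn(prev_ids, new_ids, previous_multi_modal_data, new_mm):
    # Carry prior-turn images forward so the merged sidecar covers every
    # image in the concatenated prompt, not just the newest turn's.
    merged = MultiModalData()
    if previous_multi_modal_data is not None:
        merged.placeholder_ranges += previous_multi_modal_data.placeholder_ranges
        merged.pixel_values += previous_multi_modal_data.pixel_values
    shift = len(prev_ids)  # new-turn offsets are relative to the appended suffix
    merged.placeholder_ranges += [
        PlaceholderRange(r.offset + shift, r.length)
        for r in new_mm.placeholder_ranges
    ]
    merged.pixel_values += new_mm.pixel_values
    return prev_ids + new_ids, merged
```

Without this merge step, a second-turn prompt containing a first-turn image would carry fewer placeholder runs than images, which is the mismatch described above.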
types.py
- `ResponseTokens.multi_modal_data: Any | None`
- `TrajectoryStepTokens.multi_modal_data: NotRequired[Any]`
Both typed as `Any` to avoid a hard import dependency on `renderers`.
utils/response_utils.py
- `parse_response_tokens` propagates `multi_modal_data` onto the
`TrajectoryStepTokens` output when present.
utils/save_utils.py
- `is_json_serializable` accepts torch tensors / numpy arrays / renderer
sidecar dataclasses — these aren't JSON-native but survive the
prime-rl msgpack encoder, and trajectories carrying them are excluded
from the JSONL save at the orchestrator boundary (orchestrator passes
`exclude_keys={"trajectory"}` to `save_rollouts`).
- `_strip_intermediate_mm_data(trajectory)`: drop `tokens.multi_modal_data`
from all but the last step before transport. `bridge_to_next_turn`
merges prior turns' mm_data into the new turn, so naively shipping
mm_data on every step duplicates every image O(N²) bytes for an N-turn
rollout; only the last step's sidecar is read by the trainer.
utils/serve_utils.py
- Custom msgpack encoder gains torch tensor / numpy ndarray /
dataclass support. Tensors are encoded as
`{__torch_tensor__: True, dtype, shape, data}` with raw bytes payload.
Torch is imported lazily so text-only consumers don't pay for it.
- `decode_tensor_payload` / `walk_decode_tensors` rehydrate tensor
payloads on the receiving side.
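The wire format can be illustrated with a stdlib-only sketch (using `struct` in place of torch's raw buffer; the helper names and the float32-only handling are simplifications of what a real encoder would do):

```python
import struct

def encode_tensor_payload(values: list, shape: tuple) -> dict:
    # Mirrors the {__torch_tensor__: True, dtype, shape, data} scheme:
    # a flat little-endian byte buffer plus enough metadata to rebuild it.
    data = struct.pack(f"<{len(values)}f", *values)
    return {"__torch_tensor__": True, "dtype": "float32",
            "shape": list(shape), "data": data}

def decode_tensor_payload(payload: dict) -> tuple:
    n = len(payload["data"]) // 4  # float32 = 4 bytes per element
    values = list(struct.unpack(f"<{n}f", payload["data"]))
    return values, tuple(payload["shape"])
```

In the real encoder this dict would be produced inside the msgpack hook for unknown types and rehydrated on the receiving side.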
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adopts the renderers split-Protocol surface (MultimodalRenderer +
uniform ``RenderedTokens | None`` bridge return) and resolves CI:
- ``renderer_client.py``: drop ``as_rendered_tokens`` (no longer needed
since bridge returns ``RenderedTokens | None`` uniformly).
``_get_incremental_prompt_ids`` dispatches on
``isinstance(renderer, MultimodalRenderer)`` — multimodal path passes
``previous_multi_modal_data`` so prior-turn images carry forward into
``mm_placeholders``; text-only path uses the base ``Renderer.bridge``
signature unchanged. The previous always-pass design relied on every
text-only renderer accepting and ignoring the kwarg, which spread the
multimodal contract across the whole renderer registry.
- ``save_utils.py``: cast iteration variable to ``Mapping[str, Any]``
in ``_strip_intermediate_mm_data`` so ty narrows ``step.get("tokens")``
correctly (previously ty inferred ``_KT`` as ``Never`` after the
non-Mapping branch was excluded).
- ``serve_utils.py``: replace ``import torch`` with
``importlib.import_module("torch")`` inside ``decode_tensor_payload``
— torch is a soft runtime dep here (callers that pass
``to_torch=True`` are expected to have it installed). Static type
checkers in downstream consumers without torch installed don't fail
on unresolved-import anymore.
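The soft-dependency pattern looks roughly like this (the payload keys follow the tensor scheme described earlier; the torch calls are illustrative):

```python
import importlib
from typing import Any

def decode_tensor_payload(payload: dict, to_torch: bool = False) -> Any:
    # torch is resolved via importlib only when the caller asks for a
    # tensor, so there is no module-level import for static checkers
    # (or text-only consumers without torch installed) to trip over.
    if not to_torch:
        return payload
    torch = importlib.import_module("torch")
    dtype = getattr(torch, payload["dtype"])
    flat = torch.frombuffer(bytearray(payload["data"]), dtype=dtype)
    return flat.reshape(payload["shape"])
```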
- ``pyproject.toml`` + ``uv.lock``: pin renderers to feat/multimodal-vlm
branch until that PR lands and a new PyPI release is published. The
branch provides ``MultimodalRenderer``, ``MultiModalData``,
``PlaceholderRange``, ``RenderedTokens``, and the
``previous_multi_modal_data`` kwarg this branch consumes.
- ``tests/test_renderer_client.py``: ``_BridgeRenderer`` stub now
returns ``RenderedTokens`` (matches the new Protocol). Update list
equality / slice assertions to use ``result.token_ids`` since the
uniform bridge return shape is ``RenderedTokens | None``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…irection
- renderer_client (bugbot #3, high): the previous `isinstance(renderer, MultimodalRenderer)` check was performed against the outer `renderer` parameter, which in production is a `RendererPool`. `RendererPool` was not a `Renderer` subclass, so the multimodal branch never fired and the PR's mm carry-forward was silently broken under pooled use. The renderers PR now has `RendererPool` implement the `Renderer` protocol structurally, and this side dispatches via the cached `is_multimodal(r)` helper, which works on either a bare renderer or a pool.
- save_utils (bugbot #2, medium): `is_json_serializable` previously whitelisted torch tensors and renderer dataclasses, but `make_serializable` has no handler for them; it would stringify to "tensor(...)" garbage if anything actually hit JSON. The whitelist worked only because the orchestrator excludes "trajectory" at the JSONL boundary. Restore the honest JSON-only contract and bypass the gate explicitly in `state_to_output` for `col == "trajectory"` (where msgpack handles tensors via its custom encoder).
- save_utils (bugbot #4, low): `_strip_intermediate_mm_data` was stripping `step["tokens"]["multi_modal_data"]` but not the duplicate at `step["response"].message.tokens.multi_modal_data`. The Pydantic `Response` serialization preserves it through msgpack via `model_dump()`, so the O(N²) bloat the function targets was only halved. Now strips both.
- Also drops the pool-vs-bare-renderer branching ladder via `_maybe_offload` (`asyncio.to_thread` iff pool); the pool's checkout is now an implementation detail of the pool itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
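The cached dispatch can be sketched as below. The Protocol body and helper names are illustrative stand-ins for the renderers API; the point is that the runtime-checkable structural check is paid once per class rather than on every call, and works on anything that implements the protocol structurally, pool or bare renderer:

```python
from functools import lru_cache
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class MultimodalRenderer(Protocol):
    """Illustrative stand-in for the renderers Protocol."""
    def bridge_to_next_turn(self, step: Any, previous_multi_modal_data: Any) -> Any: ...

@lru_cache(maxsize=None)
def _is_multimodal_cls(cls: type) -> bool:
    # The structural issubclass walk runs once per class, then is cached.
    return issubclass(cls, MultimodalRenderer)

def is_multimodal(renderer: Any) -> bool:
    # Hot-path check: a dict lookup after the first call for each class.
    return _is_multimodal_cls(type(renderer))
```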
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 397b8aa.
willccbb
requested changes
May 12, 2026
# MultiModalData, PlaceholderRange, as_rendered_tokens, and the
# `previous_multi_modal_data` kwarg on Renderer.bridge_to_next_turn
# that this branch consumes. Drop after merge + release.
renderers = { git = "https://github.com/PrimeIntellect-ai/renderers.git", branch = "feat/multimodal-vlm" }
Member
Merged the renderers PR, can we release renderers + replace this? Should never have a git source pinned in verifiers, PyPI won't respect it + we always want vf to be releasable
eligotts
reviewed
May 12, 2026
# earlier steps before transport. Bridge accesses mm_data only
# within the env-worker rollout loop, which has already
# finished by the time state_to_output runs.
value = _strip_intermediate_mm_data(value)
Contributor
this assumes pure extension right? how do we handle branching/compaction rollouts?

Summary
Threads the renderer's `MultiModalData` sidecar (pixel_values, placeholder ranges, mm_hashes) from the renderer through `/inference/v1/generate` to vLLM and onto trajectory tokens, so the new renderer-only multimodal path in prime-rl can drop the orchestrator-side `AutoProcessor` and image cache entirely.
Companion PRs:
What changes
Rollout (`renderer_client.py`)
- `_get_incremental_prompt_ids` now returns `RenderedTokens | None` (`token_ids` + `multi_modal_data`); text-only renderers' raw `list[int]` is normalized via `as_rendered_tokens` so callers unpack uniformly
- `_step_multi_modal_data(step)` recovers the prior turn's `mm_data` from the trajectory step and forwards it to `bridge_to_next_turn` so the new turn's placeholder runs cover every earlier-turn image
- `RendererClient.create_completion` unpacks the bridged result into `(prompt_ids, multi_modal_data)` and forwards both to `generate`
- `parse_response_tokens` copies `response.multi_modal_data` onto `ResponseTokens`
- `is_multimodal(renderer)` (cached `bool`) replaces the runtime-checkable `isinstance` walk on the hot path; pool methods are called directly (the pool implements the protocol structurally) and offloaded via `asyncio.to_thread` when the renderer is a pool
Types (`types.py`)
- `ResponseTokens.multi_modal_data: Any | None`
- `TrajectoryStepTokens.multi_modal_data: NotRequired[Any]`
Typed as `Any` to avoid a hard import dep on `renderers`.
Transport (`utils/serve_utils.py`)
- Tensors are encoded as `{__torch_tensor__: True, dtype, shape, data}` with a raw bytes payload (torch imported lazily, so text-only consumers don't pay for it)
- `decode_tensor_payload` / `walk_decode_tensors` rehydrate on the receiving side
Save (`utils/save_utils.py`)
- `is_json_serializable` keeps its honest JSON-only contract; `state_to_output` bypasses the gate for the `trajectory` column (msgpack-transported)
- `_strip_intermediate_mm_data` drops `tokens.multi_modal_data` from all but the last trajectory step before transport; bridge merges prior turns' `mm_data` into each new turn, so naively shipping it all is O(N²) bytes per N-turn rollout
Test plan
- no `mm_data` → fast path unchanged
- `mm_data` reaches vLLM via `multi_modal_data` and the trainer via `TrajectoryStepTokens["multi_modal_data"]`
- `multi_modal_data` accepted, no spurious "not JSON-serializable"; no O(N²) duplication