
Forward-merge release/1.7 into develop #1983

Merged: GPUtester merged 1 commit into develop from release/1.7 on May 20, 2026

@rapids-bot rapids-bot Bot commented May 20, 2026

Forward-merge triggered by push to release/1.7 that creates a PR to keep develop up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.

…ll (#1980)

Restores the wire format that `/generate/full` emits for the workflow's output, which silently regressed between 1.6 and 1.7.

**The contract** (documented by the eval client `nat.plugins.eval.runtime.remote_workflow.py:79` and pinned by `test_remote_evaluate.py`'s server fixture):

```
data: {"value": "<final answer>"}
```

**What broke.** PR #1851 (`feat: token streaming support for ReAct Agent`) added a `_stream_fn` to `react_agent` that yields `ChatResponseChunk` (OpenAI shape). Once a stream_fn is registered, `generate_streaming_response_full` takes the streaming branch and wraps each chunk in `ResponsePayloadOutput`, whose `get_stream_data()` dumps the chunk's full OpenAI envelope. There is no top-level `value` field, so the eval client's `chunk_data.get("value")` returns `None` and every eval scores 0. The producer (`react_agent`) and consumer (eval client) ship in the same NAT release and disagree on the wire shape.

`tool_calling_agent` is exposed to the same regression for the same reason — both yield `ChatResponseChunk` from their `_stream_fn`. The fix is in the shared `ResponsePayloadOutput.get_stream_data()`, so both code paths get covered.
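
The mismatch described above can be mimicked with plain dictionaries and no NAT imports. This is only a sketch: the chunk layout is an assumption modeled on the public OpenAI streaming format rather than NAT's exact `ChatResponseChunk` serialization, and `extract_value` is a hypothetical stand-in for the eval client's parsing step.

```python
import json

# Assumed OpenAI-style streaming chunk; field names follow the public
# chat.completion.chunk format, not NAT's exact serialization.
openai_style_chunk = {
    "id": "chatcmpl-example",
    "object": "chat.completion.chunk",
    "choices": [{"index": 0, "delta": {"content": "21"}, "finish_reason": None}],
}

# The envelope the eval client expects on /generate/full.
value_envelope = {"value": "21"}


def extract_value(sse_line: str):
    """Mimic the consumer: strip the SSE prefix, parse JSON, read "value"."""
    chunk_data = json.loads(sse_line[len("data: "):])
    return chunk_data.get("value")


regressed_line = f"data: {json.dumps(openai_style_chunk)}\n\n"
expected_line = f"data: {json.dumps(value_envelope)}\n\n"

print(repr(extract_value(regressed_line)))  # None: no top-level "value" key
print(repr(extract_value(expected_line)))   # '21'
```

Because the regressed line is still valid JSON, nothing raises; the consumer just reads a missing key, which is why the failure showed up only as zero scores.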

**The fix.** `ResponsePayloadOutput.get_stream_data()` now normalizes any payload — string, primitive, `ChatResponseChunk`, `ChatResponse`, other `BaseModel` — into the canonical `data: {"value": "<str>"}\n\n` envelope. Scoped to `/generate/full`; `/v1/chat/completions` is unaffected because that path yields `ChatResponseChunk` directly through `ResponseBaseModelOutput.get_stream_data()`, never wrapped in `ResponsePayloadOutput`. WebSocket consumers do their own payload coercion in `MessageValidator.convert_data_to_message_content()` and don't call `get_stream_data()` either.
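
The normalization idea can be sketched roughly as follows. This is not the code from the PR: `FakeChunk`, `OtherModel`, and `to_value_envelope` are hypothetical names, dataclasses stand in for the pydantic models, and the choice to serialize other structured payloads as JSON text is an assumption.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streaming chat chunk carrying text."""
    content: str


@dataclass
class OtherModel:
    """Hypothetical stand-in for any other structured payload."""
    score: int


def to_value_envelope(payload) -> str:
    """Coerce a payload to text and wrap it as one SSE "value" event."""
    if isinstance(payload, str):
        text = payload
    elif isinstance(payload, FakeChunk):
        text = payload.content              # chat-shaped payload: keep only the text
    elif is_dataclass(payload) and not isinstance(payload, type):
        text = json.dumps(asdict(payload))  # other structured payloads: JSON text
    else:
        text = str(payload)                 # primitives such as int or bool
    body = json.dumps({"value": text})
    return f"data: {body}\n\n"


for sample in ("21", 21, FakeChunk("21"), OtherModel(score=3)):
    print(repr(to_value_envelope(sample)))
```

Whatever the input type, the consumer can rely on a single top-level `value` key holding a string.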

**Tests.** Parametrized unit tests pin the wire format per payload type. A new integration test in `test_remote_evaluate.py` round-trips real `ResponsePayloadOutput` lines through the real `EvaluationRemoteWorkflowHandler`, so a future change that desynchronizes producer and consumer fails CI rather than silently scoring zero on every eval.

## How to verify

The bug surface is two pure functions on NAT data models — the producer (`ResponsePayloadOutput.get_stream_data`) and the consumer (`chunk_data.get("value")` in the eval client). You can reproduce both the bug and the fix with no FastAPI server, no LLM, and no external services:

```python
import json
from nat.data_models.api_server import ResponsePayloadOutput, ChatResponseChunk

# What react_agent's _stream_fn yields after PR #1851:
chunk = ChatResponseChunk.create_streaming_chunk("21")

# What /generate/full puts on the wire:
sse_line = ResponsePayloadOutput(payload=chunk).get_stream_data()

# What nat.plugins.eval.runtime.remote_workflow extracts:
data = json.loads(sse_line[len("data: "):-2])
print("eval client extracts:", repr(data.get("value")))
```

Expected output:

| Branch state | Output |
|---|---|
| On `release/1.7` (without this PR) | `eval client extracts: None` |
| With this PR applied | `eval client extracts: '21'` |

The full `nvidia_nat_core` and `nvidia_nat_eval` test suites pass on this branch with no regressions, including the new parametrized unit tests and the producer/consumer integration test added here.

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.



## Summary by CodeRabbit

* **New Features**
  * Standardized the /generate/full SSE output to always emit responses as a consistent JSON "value" envelope for all payload types.

* **Bug Fixes**
  * Remote evaluation now correctly accumulates streamed token/value segments into the final output instead of only capturing a single chunk.

* **Tests**
  * Added unit and integration tests verifying the SSE envelope format and correct reconstruction of streamed responses.
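
The accumulation behavior mentioned under Bug Fixes could look roughly like this on the consumer side. It is a minimal sketch that assumes every event is a JSON object in the `value` envelope; `accumulate_values` is a hypothetical helper, not the actual `EvaluationRemoteWorkflowHandler` code.

```python
import json


def accumulate_values(sse_lines) -> str:
    """Join the "value" field of every `data:` event into one string."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data lines
        chunk_data = json.loads(line[len("data: "):])
        value = chunk_data.get("value")
        if value is not None:
            parts.append(str(value))
    return "".join(parts)


stream = [
    'data: {"value": "The answer "}\n\n',
    "\n",
    'data: {"value": "is 21"}\n\n',
]
print(accumulate_values(stream))  # The answer is 21
```

Joining segments instead of keeping only one chunk matters once the producer streams token by token, since each event then carries only a fragment of the final answer.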




Authors:
  - Matthew Grossman (https://github.com/matthewgrossman)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: #1980
@rapids-bot rapids-bot Bot requested a review from a team as a code owner May 20, 2026 23:13
@GPUtester GPUtester merged commit 561276b into develop May 20, 2026
1 check passed
rapids-bot Bot commented May 20, 2026

SUCCESS - forward-merge complete.
