[RELEASE] NeMo-Agent-Toolkit v1.7.0#1984
Conversation
Signed-off-by: Anuradha Karuppiah <26330987+AnuradhaKaruppiah@users.noreply.github.com>
Forward-merge triggered by push to release/1.6 that creates a PR to keep develop up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See [forward-merger docs](https://docs.rapids.ai/maintainers/forward-merger/) for more info.
…1856) ## Summary - Fixes a race condition in \`MemMachineEditor.add_items\` that caused non-deterministic ordering of conversation messages and flaky CI failures on Python 3.13 ## Root Cause \`add_items\` wrapped each conversation message in \`asyncio.to_thread(add_memory)\` and dispatched all tasks concurrently via \`asyncio.gather(*tasks)\`. Thread pool tasks complete in nondeterministic order, so the API spy recorded calls by completion order rather than insertion order — causing \`test_conversation_messages_preserved_in_order\` to flip assertions under CI load. ## Fix Refactored \`add_items\` to introduce an inner \`add_item\` coroutine per \`MemoryItem\`. Within that coroutine, conversation messages are \`await\`ed sequentially via \`asyncio.to_thread\`, preserving chronological order. Multiple \`MemoryItem\`s are still dispatched concurrently via \`asyncio.gather\` — so there is no performance regression on batch inserts. \`\`\`python async def add_item(memory_item: MemoryItem) -> None: ... for msg in conversation: await asyncio.to_thread(add_memory) # sequential within one item if items: await asyncio.gather(*(add_item(item) for item in items)) # concurrent across items \`\`\` The index-based assertions in the test are preserved as-is — they are now correct because the implementation guarantees order. ## Test plan - [x] \`test_conversation_messages_preserved_in_order\` passes 50 consecutive runs on Python 3.11, 3.12, and 3.13 locally - [x] Full test suite: 37 passed, 6 skipped (integration) on all three Python versions Closes #1855 ## Summary by CodeRabbit * **Refactor** * Updated memory item upload processing to preserve the sequential order of conversation messages while maintaining concurrent processing of multiple memory items. Authors: - Federico Kamelhar (https://github.com/fede-kamel) Approvers: - Will Killian (https://github.com/willkill07) URL: #1856
Forward-merge triggered by push to release/1.6 that creates a PR to keep develop up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See [forward-merger docs](https://docs.rapids.ai/maintainers/forward-merger/) for more info.
Forward-merge triggered by push to release/1.6 that creates a PR to keep develop up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See [forward-merger docs](https://docs.rapids.ai/maintainers/forward-merger/) for more info.
- Relax openai version constraint from ~=1.106 (>=1.106, <2.0) to ~=2.0 (>=2.0, <3.0) across all packages that directly pin it. - Remove explicit openai dep in favor of transitive resolution for agno and related packages - Bump openpipe-art from ==0.5.4 to ~=0.5.17 for openai 2.x compatibility, and remove the torchtune_args field which was dropped in openpipe-art 0.5.9. - Regenerate affected uv.lock files (root, agno, openpipe-art, agno_personal_finance, multi_frameworks, rl_with_openpipe_art, agents). Output of `./ci/scripts/license_diff.py` ``` Added packages: - polars 1.39.3 — MIT - polars-runtime-32 1.39.3 — MIT - setproctitle 1.3.7 — BSD-3-Clause Removed packages: - litellm 1.74.1 (duplicate version removed, 1.74.9 remains) - mem0ai 0.1.118 (duplicate version removed, 0.1.115 remains) Changed packages: - instructor 1.12.0 → 1.15.1 - openai 1.109.1 → 2.30.0 - openpipe-art 0.5.4 → 0.5.17 - strands-agents 1.27.0 → 1.34.1 ``` Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Simplified dependency management by using package extras instead of direct pins and removed redundant explicit pins. * Updated openpipe-art to a more flexible compatible release range. * **Refactor** * Removed an unused configuration field from the model training configuration. * **Tests** * Adjusted tests to reflect the removed configuration field. Authors: - Will Killian (https://github.com/willkill07) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) - https://github.com/Salonijain27 URL: #1849
* Use commit sha for versioning GitHub Actions instead of version tags * Remove unused environment variables and functions * Remove dependency on Rapids GHA Tools, we were only using `rapids-logger` ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit ## Release Notes * **Chores** * Pinned external GitHub Actions to specific commit SHAs across workflows for improved stability and reproducibility * Removed unused CI environment variables (`GH_TOKEN`, `RAPIDS_CONDA_RETRY_MAX`) * Simplified CI script logging mechanisms throughout the pipeline * Removed deprecated helper functions from CI infrastructure Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Will Killian (https://github.com/willkill07) URL: #1866
* Don't install jfrog on the fly, instead use a container that already has it. * Install a specific version of `slack-sdk` validated with a sha. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated CI/CD infrastructure with containerized dependency management for improved reproducibility and reliability * Pinned Slack SDK version for consistent builds Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Will Killian (https://github.com/willkill07) URL: #1867
* Remote MCP servers may expire registered auth clients, when this happens the NAT MCP client will need to re-register client credentials. * This adds a new `oauth_client_ttl` configuration attribute to `MCPOAuth2ProviderConfig` * This incorporates changes from PR #1871 ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Configurable oauth_client_ttl for OAuth2 credential caching (default 270s; 0 disables caching). * **Bug Fixes** * Improved authentication robustness: serialized discovery/registration to avoid races, automatic re-registration on failures, safer handling when credentials or endpoints are missing/expired, and retry logic for registration rejections. * **Documentation** * Documented oauth_client_ttl behavior, defaults, and TTL guidance. * **Tests** * Added tests for credential TTL, cache expiry, and related authentication flows. Authors: - David Gardner (https://github.com/dagardner-nv) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1872
* This notebook was installing the `nvidia-nat-profiling` package which was dropped in v1.5, causing the notebook to install nat v1.3 * Replace the model with a nano model to avoid being rate limited during the eval steps. * Update `migration-guide.md` to fix profiler installation instructions. * Replace broken documentation links (unrelated link check errors found in CI) ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Documentation** * Updated NeMo Customizer links in the finetuning guide; revised packaging/install guidance to replace the profiling extra with profiler and document the eval/profiler split. * **Examples** * Updated Microservices setup link and cleaned README whitespace/formatting. * **Notebooks** * Switched profiling extra name to profiler and added eval where applicable; updated generated model/config defaults (model selection and token limits) and bumped notebook kernel/Python metadata. Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - https://github.com/mnajafian-nv - Bryan Bednarski (https://github.com/bbednarski9) URL: #1874
…1869) Implement `ATIFTrajectoryExporter` to visualize the complete ATIF trajectory. This exporter/viewer is for debugging purpose only. - Add `ATIFTrajectorySpanExporter` that converts complete ATIF trajectory JSON dicts into NAT Span objects, supporting agent steps (LLM), system/pipeline steps (FUNCTION), nested tool chains, and subagent trajectories - Add `ATIFTrajectoryPhoenixExporter` that wraps the converter to batch-export spans to a Phoenix server via OpenTelemetry HTTP - Add CLI script (`export_atif_trajectory_to_phoenix.py`) for exporting one or more ATIF JSON files to Phoenix with configurable options ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Added ATIF trajectory export to Phoenix, converting trajectories into OpenTelemetry-compatible spans. * New CLI to export one or multiple ATIF JSON files with endpoint, project, and verbose options. * **Documentation** * Added a comprehensive README documenting setup, running a Phoenix server, export CLI usage, span hierarchy expectations, and behavior notes for trajectory processing. Authors: - Yuchen Zhang (https://github.com/yczhang-nv) - David Gardner (https://github.com/dagardner-nv) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1869
## Summary - Add `exa_internet_search` tool using `langchain_exa.ExaSearchResults`, mirroring the existing `tavily_internet_search` tool - Includes `ExaInternetSearchToolConfig` with configurable `max_results`, `search_type` (`Literal["auto", "neural", "keyword"]`), `livecrawl` (`Literal["always", "fallback", "never"]`), `max_query_length`, and `api_key` (via config or `EXA_API_KEY` env var) - Client instantiated lazily inside the invocation path, only when a valid API key is present - Adds `langchain-exa>=1.1.0,<2.0.0` dependency to `nvidia-nat-langchain` - Updates tutorial documentation with an "Using Exa Search" section alongside the existing Tavily section Closes #1848 ## Test plan - [x] Unit tests pass (12 tests in `test_exa_internet_search.py` — config validation, retries, truncation, empty results, empty key) - [x] Existing Tavily tests still pass (no regressions) - [x] Tool registers correctly in `GlobalTypeRegistry` and appears in `nat info components -t function` - [x] `ruff check` passes on all new/modified files - [x] Integration test with a valid `EXA_API_KEY` against live Exa API 🤖 Generated with [Claude Code](https://claude.com/claude-code) Authors: - Max Buckley (https://github.com/maxwbuckley) - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - Bryan Bednarski (https://github.com/bbednarski9) - https://github.com/Salonijain27 URL: #1846
According to the docs `ProcessPoolExecutor.shutdown` : > Any futures that are completed or running won’t be cancelled, regardless of the value of cancel_futures ref: https://docs.python.org/3.13/library/concurrent.futures.html#concurrent.futures.Executor.shutdown Which causes long-running pytest invocation to continue running. This PR switches to using `multiprocessing.pool.Pool` instead. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Refactor** * Improved internal test runner to process tasks in submission order for more predictable results. * Simplified shutdown behavior to terminate and join worker processes, improving reliability on interrupt. * Ensured early-exit stops remaining work when configured to exit first. * Enhanced shutdown messaging to provide clearer exit information and restore prior interrupt handling. Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Will Killian (https://github.com/willkill07) URL: #1886
…fig (#1885) This PR improves MCP OAuth2 manual authentication and adds as Outlook auth example configuration using static client registration. - Optional client secret in manual OAuth2 - mcp_oauth2 now allows manual mode with client_id and optional client_secret (public-client compatible). - Runtime behavior remains consistent: when client_id is provided, manual registration is used instead of dynamic registration. - Added explicit config documentation that client_id takes precedence over dynamic registration. - Makes streamable-http the default transport - Adds an Outlook MCP auth example config file as a quick reference. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Added an Outlook OAuth2 example configuration for per-user MCP authentication and a new default LLM/workflow example. * MCP server transport now defaults to "streamable-http", reducing required configuration. * **Behavior Changes** * OAuth2 validation updated so a provided client_id takes precedence and allows manual registration without requiring a client_secret. * **Tests** * Added unit tests validating the OAuth2 validation behavior and the new transport default. Authors: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - David Gardner (https://github.com/dagardner-nv) - Yuchen Zhang (https://github.com/yczhang-nv) URL: #1885
* Change the default model in the `nat workflow create` template from `meta/llama-3.1-70b-instruct` to `nvidia/nemotron-3-nano-30b-a3b`. * The `getting_started_with_nat` notebook uses the default config from `nat workflow create` as-is and was having problems with being rate-limited. fixes * This is a bit of a heavy-handed way to fix a notebook, but I think we have been moving away from this model, and it is time to update our default config. * Update `simple_web_query_eval` test to only perform evaluation over the first prompt in the eval dataset. * Add healthchecks for each service in `test_data/docker-compose.services.yml` * Test with the same version of the `arizephoenix/phoenix` container that we instruct users to use in documentation. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated Phoenix service image to version 13.22 (CI and compose) * Added health checks for key services (web, worker, DB, search, proxy) * Updated default workflow model in templates * **Tests** * Modified an integration test to use test fixtures and run on a temporary single-item dataset Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - https://github.com/mnajafian-nv URL: #1884
* Remove API usage that is deprecated in our current version of `starlette`, and removed in v1.0. Closes #1882 ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated Starlette dependency constraint to `>=0.51,<2.0` as a direct dependency. * **Refactor** * Modified application cleanup sequence during shutdown lifecycle. * Updated WebSocket route registration mechanism. Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Will Killian (https://github.com/willkill07) - https://github.com/Salonijain27 URL: #1887
LangGraph's stream_mode="messages" emits both AIMessageChunk (incremental tokens) and AIMessage (final state update) from the agent node. The _stream_fn was accepting both via isinstance(msg, (AIMessage, AIMessageChunk)), causing the full accumulated response to be emitted as a final chunk after all the individual tokens had already been streamed. Clients saw the complete response duplicated at the end of the SSE stream. Filter to only AIMessageChunk so the state update is excluded. Adds a regression test that confirms AIMessage objects are emitted by the graph stream (the duplicate source) and that filtering to AIMessageChunk excludes them. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Bug Fixes** * Fixed streaming behavior so only chunked assistant message pieces are emitted during chunked streaming, preventing duplicate full assistant content from appearing in streamed agent responses. * **Tests** * Added a regression test that validates chunk-only streaming excludes completed assistant messages, ensuring no duplicated prior assistant text in streamed output. Authors: - Myles Shannon (https://github.com/MylesShannon) - David Gardner (https://github.com/dagardner-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1863
Closes #1865 ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Bug Fixes** * Improved streaming message reconstruction to consistently preserve tool-call data. * Restored correct behavior when no stream chunks are produced by returning an empty message. * Standardized streamed-message aggregation to avoid lost or malformed metadata. * **Tests** * Added streaming tests covering tool-call preservation, empty-stream handling, and new chunk-based stream shapes. Authors: - https://github.com/getglad - David Gardner (https://github.com/dagardner-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1864
…ort (#1889) Fixes MCP `streamable-http` tool calls hanging when a tool takes > 5 s to produce its first SSE byte (e.g. text2sql). Regression from #1500. ## Root cause #1500 switched to `streamable_http_client(..., http_client=...)` and started passing a pre-built `httpx.AsyncClient(...)`. When a client is supplied, the MCP SDK skips its internal `create_mcp_http_client()` fallback (which sets `httpx.Timeout(30.0, read=300.0)` and `follow_redirects=True`). Our client was constructed with no `timeout=`, so httpx fell back to its 5-second default for every phase including `read`, causing slow tools to fail with `ReadTimeout` inside the transport task group. ## Fix Build the `httpx.AsyncClient` via the SDK's own `create_mcp_http_client(...)` factory so we inherit its defaults, and pass an explicit `httpx.Timeout` whose read value is `max(MCP_DEFAULT_SSE_READ_TIMEOUT, tool_call_timeout, auth_flow_timeout)`. This both restores SDK parity and ensures users who raise `tool_call_timeout` for slow tools don't get cut off at the httpx layer. Only `MCPStreamableHTTPClient` is affected. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Bug Fixes** * Fixed timeout behavior for long-running operations. Extended tool calls and authentication flows will no longer be prematurely interrupted by default read timeouts, ensuring these operations complete successfully. Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1889
…ient` sub-commands (#1891) * Add missing auth related flags to the `nat mcp client ping` sub-command * Add the following flags to the `mcp client tool` sub-command * --client-id * --client-secret * Update `auth_provider.py` to use the returned resource attribute in the returned `ProtectedResourceMetadata` from the server * Replace usage of `click.echo(<err_msg>, err=True)` with `raise click.ClickException(<err_msg>)`, this ensures that the CLI exits with a non-zero exit code. * Remove performing imports of nat inside a try block, this appears to be a hold over from before the latest refactor. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Added OAuth client credential flags (--client-id, --client-secret) to CLI commands. * Added bearer-token options for tool calls (--bearer-token, --bearer-token-env). * Added per-user execution flags (--per-user, --user-id). * OAuth2 discovery now respects protected-resource identifiers. * **Documentation** * Updated CLI help text to document new authentication and per-user options. * **Tests** * Added tests for credential forwarding and protected-resource handling. * **Bug Fixes** * Improved CLI validation and error reporting for auth/transport combinations. Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1891
Removes Flask from `nvidia-nat-core`'s hard dependencies. Flask is not imported by any library code that runs in a consumer's environment — it is only used by `local_sandbox_server.py`, which runs **inside** the sandbox Docker image (where Flask is provided by `sandbox.requirements.txt`), and by `tests/nat/tools/test_code_execution.py`, which imports it to mock the sandbox HTTP handler. After this change: - `pip install nvidia-nat-core` no longer pulls in `flask`, `blinker`, `itsdangerous`, or `werkzeug`. - The sandbox Docker workflow is unchanged — the image still installs Flask via `sandbox.requirements.txt`. - `nvidia-nat-test` now declares `flask>=3.0.0` directly, so any environment that runs the test suite (including `pip install "nvidia-nat-core[test]"`, which transitively pulls `nvidia-nat-test`) still has Flask available. Closes #1870 ## Files changed - `packages/nvidia_nat_core/pyproject.toml` — drop `flask>=3.0.0` from hard `dependencies`. - `packages/nvidia_nat_test/pyproject.toml` — add `flask>=3.0.0` (with comment) so tests that import `local_sandbox_server.do_execute` keep working. ## Test plan - [x] `uv lock` regenerates cleanly (run with pinned `uv==0.9.28` to match CI). - [x] `pip install nvidia-nat-core` in a fresh venv shows no `flask` / `blinker` / `itsdangerous` / `werkzeug`. - [x] `pytest packages/nvidia_nat_core/tests/nat/tools/test_code_execution.py` passes (verifies the test-time Flask import still resolves via `nvidia-nat-test`). - [x] Sandbox Docker image builds and serves requests as before (Flask continues to come from `sandbox.requirements.txt`). ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Removed Flask dependency from core package to reduce installation footprint. * Added Flask dependency to test package for sandbox HTTP handler support. Authors: - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1895
… subpackages (#1896) Removes four declared-but-unused hard dependencies from `nvidia-nat-core` and pushes them down to the sibling packages that actually import them. Each of `aioboto3`, `huggingface_hub`, `plotly`, and `wikipedia` was listed in `nvidia-nat-core`'s `pyproject.toml` but had **zero import sites** anywhere in `packages/nvidia_nat_core/src/`. They were free-ridden on by sibling packages that imported them without declaring them. This PR fixes the declaration-vs-usage mismatch. Net effect on a fresh `pip install nvidia-nat-core` (no extras): | | Before | After | Δ | |---|---:|---:|---:| | venv size | 288 MB | **200 MB** | **−88 MB (−31%)** | | package count | 207 | **151** | **−56** | Companion PR to #1894 (which removed `flask` from core's hard deps under #1870). Together, the two PRs reduce `nvidia-nat-core`'s no-extras install from ~289 MB to ~200 MB. ## Changes **`packages/nvidia_nat_core/pyproject.toml`** — remove from hard `dependencies`: ```diff - "aioboto3>=11.0.0", - "huggingface_hub>=0.33.4,<1.0.0", - "plotly~=6.0", - "wikipedia~=1.4", ``` **Sibling packages** — declare the dep at the package that actually uses it: | Package | Added | Used by | |---|---|---| | `nvidia-nat-eval` | `aioboto3>=11.0.0` | `dataset_handler/dataset_downloader.py`, `utils/output_uploader.py` | | `nvidia-nat-security` | `plotly~=6.0` | `eval/runners/red_teaming_runner/report_utils.py` | | `nvidia-nat-langchain` | `wikipedia~=1.4` | `tools/wikipedia_search.py` (via `langchain_community.WikipediaLoader`) | `huggingface_hub` is already declared by `nvidia-nat-nemo-customizer` (`packages/nvidia_nat_nemo_customizer/pyproject.toml:59`), and `aioboto3` is also already declared by `nvidia-nat-s3` — no edits needed there. All five `uv.lock` files regenerated with the pinned `uv==0.9.28` to match CI. ## Verification | Check | Result | |---|---| | `uv lock --check` (root + nvidia_nat_core + nvidia_nat_eval + nvidia_nat_security + nvidia_nat_langchain) under `uv==0.9.28` | ✅ | | Fresh venv `pip install nvidia-nat-core` shows none of `aioboto3` / `botocore` / `boto3` / `huggingface-hub` / `hf_xet` / `plotly` / `narwhals` / `wikipedia` | ✅ | | Fresh venv `pip install ".[most]"` (497 packages) — all 4 deps still present transitively via siblings | ✅ | | Smoke imports for every affected sibling module (`nat.plugins.eval.dataset_handler.dataset_downloader`, `nat.plugins.eval.utils.output_uploader`, `nat.plugins.security.eval.runners.red_teaming_runner.report_utils`, `nat.plugins.langchain.tools.wikipedia_search`, `nat.plugins.customizer.dpo.trainer_adapter`, `nat.plugins.s3.s3_object_store`) | 7/7 ✅ | ## User-facing impact - `pip install nvidia-nat-core` no longer pulls flask transitively (already gone via #1895), nor `aioboto3` / `huggingface_hub` / `plotly` / `wikipedia` and their transitives (`botocore`, `boto3`, `s3transfer`, `aiobotocore`, `aiofiles`, `hf-xet`, `narwhals`, etc.). - `pip install "nvidia-nat[most]"` is unchanged — every removed dep is still resolved via the sibling that owns it. - `pip install nvidia-nat-eval` / `nvidia-nat-security` / `nvidia-nat-langchain` now correctly declare their actual runtime needs; previously these packages worked only when installed alongside `nvidia-nat-core` (which was masking the missing declaration). - The sandbox Docker image and any other production paths are unaffected. Closes nothing (no specific issue), but follows directly from the work in #1870 / #1895. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Dependencies have been reorganized across the package distribution. The core package has been simplified by removing four previously required dependencies, while specialized evaluation and language chain packages now explicitly declare their specific runtime requirements. This restructuring ensures each package includes only the dependencies necessary for proper operations. Authors: - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - David Gardner (https://github.com/dagardner-nv) - Will Killian (https://github.com/willkill07) URL: #1896
…sponse (#1876) ## Summary The `_chat_completion` error handler in `packages/nvidia_nat_core/src/nat/tool/chat_completion.py` concatenated `str(e)` from the caught exception into the user-facing response string (`"...Error: {str(e)}"`). Since the response is returned over the network to API callers, that leaked internal details — stack frame class names, DB driver messages revealing schema, HTTP client errors revealing endpoint URLs / key prefixes, and file paths. Move the exception detail to server-side logging (`logger.exception`) and return only the user-safe apology + query echo. Operators triage via logs; callers no longer see internal state. ## What changed - `packages/nvidia_nat_core/src/nat/tool/chat_completion.py` - Added module-level `logger = logging.getLogger(__name__)` - Replaced `Error: {str(e)}` suffix with server-side `logger.exception("chat completion failed")` - Replaced silent `except Exception: pass` on the last-message extraction with `logger.exception(...)` for observability ## CWE CWE-209 — Information Exposure Through an Error Message ## Why this matters Caller-visible error responses in an LLM / agent pipeline often get echoed into chat UIs, logged downstream, or stored in conversation history. Leaking database schemas, endpoint URLs, or file paths in those surfaces is a small but real hardening gap. Happy-path output is unchanged. ## Test plan - [x] Logger pattern matches NAT core convention (module-level `logger = logging.getLogger(__name__)`) - [x] Happy-path response unchanged - [x] Regression tests added covering error-path sanitization and server-side logging Signed-off-by: Colin McDonough <cmcdonough@50words.com> Authors: - Colin McDonough (https://github.com/ColinM-sys) - David Gardner (https://github.com/dagardner-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1876
… core (#1902) This PR reduces the base dependency surface of `nvidia-nat-core` and moves dependency ownership closer to the packages that actually use those integrations. 27MB saved for nvidia-nat-core installs with an additional 35MB for nvidia-nat-eval without full - Removes `optuna` and `openinference-semantic-conventions` from `nvidia-nat-core`. - Adds `openinference-semantic-conventions` to `nvidia-nat-opentelemetry`, where it is used. - Makes `aioboto3` optional for `nvidia-nat-eval` under `full` and `test` extras. - Lazy-loads eval S3 upload dependencies so importing eval runtime does not require `aioboto3`. - Replaces the core Optuna `Trial` type dependency with a local protocol for Optuna-compatible trial objects. - Moves `test_evaluate_callbacks.py` into the eval package and updates it to use non-deprecated `nat.plugins.eval` imports. - Updates `uv.lock` files to reflect the dependency boundary changes. ## Testing ```bash git diff --cached --check uv run pytest packages/nvidia_nat_core/tests/nat/data_models/test_optimizable.py packages/nvidia_nat_config_optimizer/tests/test_parameter_optimizer.py packages/nvidia_nat_config_optimizer/tests/test_optimizable_utils.py packages nvidia_nat_eval/tests/eval/test_dependency_guidance.py packages/nvidia_nat_eval/tests/eval/test_evaluate_callbacks.py packages/nvidia_nat_eval/tests/eval/utils/test_output_uploader.py packages/nvidia_nat_eval/tests/eval/test_evaluate.py -q uv run pytest packages/nvidia_nat_opentelemetry/tests/observability -q ``` Focused eval/core/optimizer suite: 111 passed OpenTelemetry suite: 93 passed⚠️ Breaking change: aioboto3 moved from base to [full] extra nvidia-nat-eval previously installed aioboto3 (transitively boto3/botocore) as a base dependency. This branch moves it to the [full] extra, removing ~35 MB and 22 transitive packages (incl. the full aiohttp async stack and botocore AWS service models) from the default install. Who is affected Users running pip install nvidia-nat-eval (without [full]) whose eval configs use the dataset.s3.* download path. On develop this worked out of the box; after this change, boto3 is not present and the S3 download will raise ModuleNotFoundError at runtime. What stays the same - Local datasets are unaffected; signed-URL and S3 remote datasets require optional full eval dependencies if their transport dependency is not otherwise installed. - The [full] extra still pulls everything previously available. - nvidia-nat[eval] users (the recommended top-level install) are unaffected. Migration ```text pip install 'nvidia-nat-eval[full]' # or uv pip install 'nvidia-nat[eval]' ``` The runtime now raises a clear ModuleNotFoundError with this install hint at the point of failure, rather than failing at import time. Other dep moves in this PR (no behavioral break expected) - optuna removed from nvidia-nat-core base; nvidia-nat-config-optimizer keeps it pinned. Core was refactored to use a Protocol so no top-level optuna import remains. - openinference-semantic-conventions moved from nvidia-nat-core to nvidia-nat-opentelemetry base. Core no longer imports it. These moves only affect users who were importing those packages directly while depending solely on nvidia-nat-core — a pattern that effectively requires they already had the package installed by other means. Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Optional dependencies reorganized: S3 and related libraries moved to optional "full"/test installs to reduce default install size * Lazy-loading of optional libraries for S3 and signed-URL downloads to avoid import-time failures and provide clear pip-install hints when missing * Core search API typing decoupled from a specific optimization backend for broader compatibility * **Tests** * New and expanded tests covering missing-optional-dependency behavior and search type-hint resolution Authors: - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - David Gardner (https://github.com/dagardner-nv) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) - Will Killian (https://github.com/willkill07) URL: #1902
## Summary - add a first-class OCI LLM config to NAT core and register it alongside the existing providers - add LangChain wrapper support for OCI via `langchain-oci`, matching the workflow-layer integration shape used by AWS Bedrock in this repo - add OCI docs and tests, with live integration coverage centered on an OCI-hosted Nemotron inference endpoint - declare the necessary `uv` extra conflicts so the workspace remains solvable when `langchain-oci` introduces `openai>=2` alongside existing `strands` and `vanna` surfaces ## What Was Tested - `PYTHONPATH=$(pwd)/packages/nvidia_nat_core/src:$(pwd)/packages/nvidia_nat_langchain/src:$(pwd)/packages/nvidia_nat_test/src .venv/bin/pytest packages/nvidia_nat_core/tests/nat/llm/test_oci_llm.py packages/nvidia_nat_langchain/tests/test_llm_langchain.py -q -k 'OCI or oci'` - `OCI_NEMOTRON_BASE_URL=http://127.0.0.1:8080/v1 OCI_NEMOTRON_MODEL=nvidia/Llama-3.1-Nemotron-Nano-8B-v1 PYTHONPATH=$(pwd)/packages/nvidia_nat_core/src:$(pwd)/packages/nvidia_nat_langchain/src:$(pwd)/packages/nvidia_nat_test/src .venv/bin/pytest packages/nvidia_nat_langchain/tests/test_langchain_agents.py -q --run_integration -k oci_hosted_nemotron_openai_compatible_agent` - `uv lock` ## Notes - all live validation in this PR is centered on `nvidia/Llama-3.1-Nemotron-Nano-8B-v1` - the live Nemotron endpoint is served from an OKE + vLLM inference layer in Phoenix - this closes the main OCI workflow-layer gap relative to the existing AWS Bedrock path in `nvidia_nat_langchain` ## Summary by CodeRabbit * **New Features** * Added an OCI-hosted LLM provider with LangChain client support and Nemotron-compatible model option. * **Documentation** * New OCI Generative AI integration guide, config examples, TOC entry, and documentation redirects; OCI added to supported providers list. * **Tests** * Added unit and integration tests for the OCI provider and LangChain wrapper, plus a pytest fixture for OCI Nemotron endpoints. * **Chores** * Broadened OpenAI dependency ranges, added langchain-oci runtime dependency, and adjusted optional extras/conflicts in packaging. Authors: - Federico Kamelhar (https://github.com/fede-kamel) - Will Killian (https://github.com/willkill07) Approvers: - Will Killian (https://github.com/willkill07) - https://github.com/Salonijain27 URL: #1804
* Replace the two `phi-3-*` models are no longer being hosted on build.nvidia.com * Replace references to `standardized_results_all.csv` in documentation, this was renamed in the code to `standardized_data_all.csv` at some point. * Fix bug in `nat eval`'s `--skip_completed_entries` feature, the output values were `numpy.nan` which was passing the `if not` test, incorrectly causing all workflows to be skipped. * Update graph generation code snippets to prevent clipping of text, add code snippet for producing the Ragas metrics graph. * Fix an issue where the workflows were returning correct responses, but receiving low accuracy scores (~0.25 despite correctness of answers). Improve evaluation accuracy results by: * Add explicit `additional_instructions` to the workflow improving the accuracy by matching the expected labels * Rename the `phish` label to `phishing` * Replace the LLM judge model with `nvidia/nemotron-3-super-120b-a12b` and increase the allocated tokens for judging ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Added multiple email-phishing evaluation configurations and new evaluation profiles. * **Bug Fixes** * Improved handling of null/incomplete entries in evaluation workflows. * **Documentation** * Updated profiling guide with refreshed model lineup, runnable RAGAS metrics script, and modernized plotting examples. * **Chores** * Enforced explicit "phishing"/"benign" output constraint across workflows, updated a test expectation, added .codex to .gitignore, and relaxed path checks for specific model tokens. Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - https://github.com/lvojtku - https://github.com/Salonijain27 - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1904
) ## Summary Adds the **Agentic Trajectory Observability Format (ATOF)** to the NeMo Agent Toolkit, together with a converter that translates ATOF event streams into ATIF v1.7 trajectories. ATOF is a JSON-Lines wire format for *runtime observation* of agent execution; ATIF is the existing *static interchange format* for completed trajectories. The PR closes the gap between the two. It also introduces a pluggable, **schema-map-driven extractor system** that handles multiple LLM providers (OpenAI chat-completions, Anthropic Messages, Gemini generateContent) in heterogeneous streams — one trajectory can declare multiple `data_schema`s and the converter dispatches per-event without any producer-side coordination. **Scope:** consumer-side only. Producer-side schema delivery is intentionally deferred (see §"What's not in this PR"). --- ## Why ATOF Today, NAT can produce ATIF trajectories from already-decomposed `Step`-shaped records, but there's no standardized wire format for the *runtime events* that upstream those steps — scope opens/closes, marks, tool invocations — before they're collapsed by an analysis layer. Without a standard wire format: - Producer instrumentation (NeMo-Flow, custom agent runtimes, observability SDKs) each emit their own ad-hoc shape. - Consumers (replay tools, validators, eval harnesses) need bespoke parsers per producer. - Multi-provider trajectories (e.g. an orchestrator routing to OpenAI + Anthropic) can't be ingested without per-stream schema lock. ATOF defines the wire format; the converter in this PR is the reference consumer-side implementation. Together they let any producer that emits ATOF have its trajectories analyzed by any ATIF-compatible tool. --- ## High-level deliverables | Area | What's new | |------|-----------| | **Wire format spec** | `atof-event-format.md` — the v0.1 spec defining 2 event kinds (`ScopeEvent`, `MarkEvent`), category vocabulary, and the `data_schema` discrimination protocol | | **Reference Python package** | `nat.atof` — 9 new modules implementing the spec (events, IO, categories, flags, schemas, extractors, converter) | | **ATOF → ATIF converter** | `nat.atof.scripts.atof_to_atif_converter` — single entry point, validates `data_schema`, emits ATIF v1.7 trajectories | | **Schema-map extractor system** | `nat.atof.extractors` — declarative paths + 3 named hooks for irreducible per-provider transforms; built-in support for OpenAI, Anthropic, Gemini | | **Worked examples** | 6 example trajectories (EXMP-01 through EXMP-06) covering tier-1 opaque, OpenAI tool-call, mark events, Anthropic Messages, Gemini generateContent, and a heterogeneous router using all three providers in one stream | | **Test suite** | 6 test files, ~120 tests covering protocol conformance, payload extraction, schema validation, shape-mismatch error contracts, spec compliance, tier-1 fall-through, and per-provider × per-scenario matrix dispatch | | **Conversion guide** | `atof-to-atif-conversion-guide.md` — implementation-neutral spec for engineers/coding-assistants writing their own mappers | | **Minor ATIF additions (v1.7 updates)** | `function_ancestry.py`; new fields on `Step`/`ToolCall`/`Trajectory` for ATIF v1.7 alignment | --- ## Architecture ### The mapping problem ATOF carries low-level *events*; ATIF carries high-level *steps*. The conversion is N-to-M: a user query + LLM response + tool round-trip collapses into ~2-3 ATIF steps. The converter: 1. Sorts events by timestamp. 2. Validates each event's `data` against its declared `data_schema` (if registered) — fail-fast via `DataSchemaViolationError`. 3. Dispatches each event by `data_schema` to a registered extractor. 4. Emits ATIF steps with proper dedup, ancestry, and observation attachment. ### The schema-map architecture Provider payloads vary widely in shape but share a common skeleton: input messages, output text, output tool calls. The mapping is mostly *positional* ("the messages live at this path"), with a small irreducible set of transforms that can't be expressed as paths alone. The `SchemaMap` dataclass captures both: - **Declarative paths** — dotted paths (with array indices) for input messages, output text, output tool calls, and per-tool-call fields. - **Three optional escape-hatch hooks** — for the irreducible per-provider transforms: 1. `normalize_input_messages` — for polymorphic content (string OR list-of-blocks) 2. `normalize_output_message` — for splitting a single content array into `(text, tool_calls)` 3. `transform_tool_call` — for ID synthesis or non-standard nesting Pure-paths providers (OpenAI) need zero hooks. Richer providers (Anthropic content blocks, Gemini parts) use one or two. The single `SchemaMapLlmExtractor` engine serves all three providers; new providers add ~20-50 lines of declarative config rather than a new class. ### Per-event dispatch (no per-stream lock) The converter resolves the extractor per event via `event.data_schema`. A single trajectory MAY declare multiple schemas — see EXMP-06, which contains three LLM scope events declaring OpenAI, Anthropic, and Gemini in turn, all dispatched correctly in one stream. ### Three protocols, three registries | Protocol | Purpose | Registry | |----------|---------|----------| | `LlmPayloadExtractor` | Parses `llm` scope events | `LLM_EXTRACTOR_REGISTRY` | | `ToolPayloadExtractor` | Parses `tool` scope-end events | `TOOL_EXTRACTOR_REGISTRY` | | `MarkPayloadExtractor` | Classifies mark events as sourced/opaque | `MARK_EXTRACTOR_REGISTRY` | A `SCHEMA_REGISTRY` parallels these for JSON Schema validation. Default extractors handle OpenAI chat-completions and tier-1 opaque payloads out of the box. Anthropic and Gemini are opt-in via `register_anthropic_messages_v1()` and `register_gemini_generate_content_v1()`. --- ## File structure ``` packages/nvidia_nat_atif/ ├── atof-event-format.md # ATOF v0.1 wire-format spec ├── atof-to-atif-conversion-guide.md # Implementation-neutral mapper spec ├── atif-step-extra-guide.md # NAT's Step.extra / ToolCall.extra contract (rewritten for v1.7) ├── intermediate-step-to-atif-mapping.md # NAT IntermediateStep → ATIF (rewritten for v1.7) ├── pyproject.toml # +jsonschema>=4.0 ├── src/nat/atif/ │ ├── atif_step_extra.py # AtifAncestry, AtifInvocationInfo, AtifStepExtra, AtifToolCallExtra (v1.7) │ ├── step.py # function_ancestry field REMOVED (v1.7) │ ├── tool_call.py # tool_ancestry field REMOVED (v1.7); extra field documented │ ├── trajectory.py # +trajectory_id; session_id relaxed; subagent uniqueness validator │ ├── subagent_trajectory_ref.py # +trajectory_id (canonical); session_id optional (informational) │ ├── observation_result.py # +extra field (ATIF v1.7 addition for per-result metadata) │ └── __init__.py # +AtifToolCallExtra; FunctionAncestry removed ├── src/nat/atof/ # NEW package │ ├── __init__.py # Public API surface │ ├── category.py # Canonical category vocabulary │ ├── events.py # ScopeEvent, MarkEvent (Pydantic) │ ├── extractors.py # SchemaMap engine + 3 protocols │ ├── flags.py # Behavioral flag enum │ ├── io.py # JSONL read/write helpers │ ├── schemas.py # JSON Schema registry │ └── scripts/ │ └── atof_to_atif_converter.py # The converter — emits ATIF v1.7 ├── examples/atof_to_atif/ │ ├── README.md # Example walkthrough │ ├── generate_atof_examples.py # 6 deterministic example generators │ ├── convert_atof_examples_to_atif.py # Runner: generates 6 v1.7 ATIF outputs │ └── output/ # Pre-generated artifacts (12 files) └── tests/ ├── test_extractors.py # 31 tests — protocol + per-provider ├── test_data_schema_validation.py # 11 tests — JSON Schema dispatch ├── test_shape_mismatch.py # 6 tests — fail-fast error contract ├── test_spec_compliance.py # 63 tests — ATOF wire-format conformance ├── test_tier1_conversion.py # 4 tests — opaque fall-through ├── test_schema_validation.py # 12 tests — 3×3 provider × scenario └── test_atif_v17_validators.py # 12 tests — v1.7 model validators ``` ### Worked examples (deterministic, regeneratable) | Example | Demonstrates | |---------|-------------| | **EXMP-01** Tier-1 opaque | Zero-instrumentation producer; every scope `category: "unknown"`; falls through to system steps | | **EXMP-02** OpenAI tool call | Tier-2 semantic-tagged; OpenAI chat-completions; calculator tool round-trip | | **EXMP-03** Mark events | Session start/end marks bracketing a chat agent; demonstrates the mark event kind | | **EXMP-04** Anthropic Messages | Polymorphic `content` blocks (`text` + `tool_use` + `tool_result`); Anthropic-correct tool result transport | | **EXMP-05** Gemini generateContent | `parts[]` polymorphism (`text` + `functionCall` + `functionResponse`); role aliasing (`model` → `assistant`); synthesized `tool_call_id` | | **EXMP-06** Heterogeneous router | One trajectory; orchestrator delegates to OpenAI (router) + Anthropic (code) + Gemini (math); per-event dispatch in action | Each example ships with both the input ATOF JSONL and the expected ATIF JSON output, so the conversion is end-to-end reproducible. --- ## Testing | Test file | Tests | Coverage | |-----------|------:|---------| | `test_spec_compliance.py` | 63 | ATOF wire-format conformance — event validation, category enforcement, attribute canonicalization, timestamp formats | | `test_extractors.py` | 31 | Protocol conformance, OpenAI extractor unit tests, registry validation, end-to-end custom registration | | `test_schema_validation.py` | 14 | Parametrized 3×3 matrix: `{openai, anthropic, gemini} × {simple, nested, multi_turn}` + heterogeneous-stream + idempotency; canonical spec-example trajectory roundtrip; ObservationResult.extra round-trip | | `test_data_schema_validation.py` | 11 | JSON Schema validation gate, custom schema registration, validation error surface | | `test_shape_mismatch.py` | 6 | `ShapeMismatchError` fail-fast contract on non-empty `data` yielding empty extraction | | `test_tier1_conversion.py` | 4 | Tier-1 opaque fall-through | ```bash $ uv run --extra=test pytest packages/nvidia_nat_atif/tests/ ============================== 137 passed ============================== ``` --- ## Documentation Two specs ship with this PR: 1. **`atof-event-format.md`** — the **wire-format spec** (v0.1). Defines the envelope, the two event kinds, the category vocabulary, the `data_schema` discrimination protocol, and the canonical flags. Producer- and consumer-binding. 2. **`atof-to-atif-conversion-guide.md`** — the **mapping spec**. 768 lines, implementation-neutral, with 14 numbered conversion rules (M-01 through M-14) and 12 invariants (I-01 through I-12). The intent: an engineer or coding assistant reading this guide should be able to write a correct ATOF→ATIF mapper for any new provider in any language. The final section sketches how `nat.atof` realizes the rules and how to extend it for new consumer schemas. Plus an `examples/atof_to_atif/README.md` walking through the 6 worked examples. --- ## Backwards compatibility - **Additive (mostly)** No changes to existing ATIF behavior. The new `function_ancestry`, `tool_ancestry`, and `llm_call_count` fields on `Step`/`ToolCall`/`Trajectory` default to `None` and are tolerated by consumers that don't read them. trajectory_id and session_id updates are technically a breaking change on the existing ATIF code. But its minimal and required for compliance with ATIF 1.7 - **Default OpenAI extraction preserved.** Events without a `data_schema`, or with one that has no registered extractor, fall back to the OpenAI chat-completions extractor — the same behavior as before this PR. Existing test fixtures using OpenAI-shaped payloads are unchanged. - **One new runtime dependency** — `jsonschema>=4.0`, used for `data_schema` pre-pass validation. --- ## What's NOT in this PR (intentional) - **Producer-side schema declaration.** Today, registering a non-default schema/extractor is a *consumer-side* concern: the consumer pre-installs extractors before invoking the converter. A future ATOF revision will specify how producers can ship their schema *along with* the trajectory (stream-level manifest, scope-start metadata, or out-of-band sidecar). A `DESIGN NOTE` block at the top of `schemas.py` captures the three options and the deferred recommendation. - **Schema-map refactor of `ToolPayloadExtractor` / `MarkPayloadExtractor`.** Their contracts are too narrow to benefit from declarative paths; current Protocol implementations are kept. - **Producer integration with NeMo-Flow.** Out of scope here; tracked separately. The wire-format spec is producer-agnostic. --- ## How to extend (consumer side) Adding a new LLM provider's extractor takes 3 steps and ~30 lines: ```python from nat.atof.extractors import ( SchemaMap, SchemaMapLlmExtractor, register_llm_extractor, ) from nat.atof.schemas import register_schema # 1. Define the field-path map (+ hooks if shape is rich) MYCO_MAP = SchemaMap( name="myco/llm", version="1", input_messages_paths=("input.history",), output_text_paths=("output.answer",), output_tool_calls_paths=("output.actions",), tool_call_id_paths=("action_id",), tool_call_name_paths=("action_name",), tool_call_args_paths=("args",), ) # 2. Define a permissive JSON Schema for validation MYCO_SCHEMA = { "$id": "myco/llm@1", "type": "object", "anyOf": [{"required": ["input"]}, {"required": ["output"]}], } # 3. Register both before invoking the converter register_schema("myco/llm", "1", MYCO_SCHEMA) register_llm_extractor("myco/llm", "1", SchemaMapLlmExtractor(MYCO_MAP)) ``` Full instructions in `atof-to-atif-conversion-guide.md` §7. --- ## Test plan - [x] `cd packages/nvidia_nat_atif && uv run --extra=test pytest tests/` — all 124 tests pass - [x] `uv run python examples/atof_to_atif/generate_atof_examples.py` — regenerates all 6 example JSONLs deterministically - [x] `uv run python examples/atof_to_atif/convert_atof_examples_to_atif.py` — converts all 6 to ATIF, no `ShapeMismatchError` - [x] `git diff origin/develop -- examples/atof_to_atif/output/` — generated outputs match committed reference (deterministic regeneration) - [x] `ruff check packages/nvidia_nat_atif/` — clean - [x] `pyright packages/nvidia_nat_atif/` — clean --- Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Added ATOF v0.1 JSONL event format with typed events, read/write tooling, a converter to ATIF, example generators, and a CLI conversion script. Introduced canonical category and flag vocabularies and a public ATOF API. * **Documentation** * Added ATOF core specification and end-to-end conversion examples/README. * **Chores** * Updated acceptance vocabulary and added jsonschema>=4.0 dependency. * **Tests** * Added extensive spec-compliance, shape-mismatch, and tier-1 conversion tests. * **ATIF Compatibility** * Bumped ATIF to v1.7; added subagent, ancestry, and per-step/tool metadata fields. Authors: - Bryan Bednarski (https://github.com/bbednarski9) - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) - https://github.com/mnajafian-nv - https://github.com/Salonijain27 URL: #1890
…1897) Adds aggregate, anonymous CLI-usage telemetry to the nat Python CLI. Posture is consent-gated: - A first-run prompt asks the user whether to allow telemetry, defaulting to yes (Enter accepts, type n to opt out) - The decision is persisted across invocations. - Non-interactive sessions (CI / cron / piped / stderr-captured) never prompt and default OFF - `NAT_TELEMETRY_ENABLED` env var overrides both. ### Consent flow Order of precedence for whether telemetry is active: - `NAT_TELEMETRY_ENABLED` env var (any value — 1/true/yes enables, anything else disables). Sets the answer for this shell session and bypasses both the persisted file and the prompt. - Persisted decision at ~/.config/nat/telemetry.toml (TOML, written once when the user answers the first-run prompt; can be re-set anytime via nat configure telemetry --enable | --disable). The persisted record carries a prompt_version; if we materially change what the prompt discloses (new collected field, new endpoint), bumping PROMPT_VERSION forces a re-prompt for users who consented under the old language. - Interactive prompt (stdin, stdout, and stderr must all be TTYs) on first run when neither of the above is set. Default answer is yes: pressing Enter (or typing y/yes) → enabled. Typing n/no, garbage, hitting EOF, or Ctrl-C → disabled. The prompt explicitly lists what is collected and what is not. - Default OFF in non-interactive sessions (no TTY on any of the three standard streams → no opportunity to ask → no data sent). ### CLI subcommand: nat configure telemetry Three modes: ```bash nat configure telemetry --enable — persist consent as enabled nat configure telemetry --disable — persist consent as disabled nat configure telemetry --status (default) — show current effective state and its source ``` The status command flags when an env var is overriding the persisted decision, so users aren't surprised when their persisted choice doesn't take effect. The first-run prompt is suppressed when running any of these three subcommands — they exist to manage the consent decision, so prompting first would break the read-only contract of --status and conflict with explicit --disable / --enable. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Opt-in runtime telemetry for the NAT CLI (disabled by default). * New `nat configure telemetry` command: enable, disable, or show status. * First-run consent prompt for interactive sessions and env-var override via NAT_TELEMETRY_ENABLED. * Telemetry emission for CLI invocations, with local debugging (stdout endpoint) and dry-run mode. * **Documentation** * README updated with telemetry details, consent flow, available controls, and data collection clarity. * **Tests** * Extensive unit and integration tests covering consent flow, CLI integration, event payloads, handler, and configuration. Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1897
Adds [Synap](https://maximem.ai) as a third-party memory plugin for NeMo Agent Toolkit. ## Changes - **`docs/source/build-workflows/memory.md`** — adds a "Third-Party Memory Plugins" section listing Synap - **`examples/memory/synap/README.md`** — installation and usage guide (YAML and programmatic), with SPDX Apache-2.0 header ## About Synap Synap is a managed memory layer for AI agents. The `maximem-synap-nemo-agent-toolkit` package provides `SynapMemoryEditor`, a drop-in `MemoryEditor` implementation that registers as a `synap_memory` plugin via the NAT entry-point system. Works with `auto_memory_agent` exactly like the existing Mem0/Zep/Redis/MemMachine plugins. **Install:** `pip install maximem-synap-nemo-agent-toolkit` **PyPI:** https://pypi.org/project/maximem-synap-nemo-agent-toolkit/ **Docs:** https://docs.maximem.ai/integrations/nemo-agent-toolkit **Open source:** Integration package source at [`maximem-ai/maximem_synap_sdk`](https://github.com/maximem-ai/maximem_synap_sdk/tree/main/packages/integrations) — contributions welcome ## Summary by CodeRabbit * **Documentation** * Added documentation for the Synap third-party memory plugin, including installation instructions, configuration guidance for YAML workflows, and Python usage examples. * Updated vocabulary configuration to support the Synap plugin. Authors: - Anish Yadav (https://github.com/visy-ani) - Will Killian (https://github.com/willkill07) Approvers: - Will Killian (https://github.com/willkill07) URL: #1906
Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Arize AX added as a built-in observability exporter supporting tracing. * **Documentation** * New guide for setup, config, EU/US endpoint options, env var usage, and integration notes. * Installation and observability docs updated to list Arize AX in exporters and provider tabs. * **Examples** * Example config and README updated to demonstrate Arize AX usage. * **Tests** * Added unit tests validating Arize AX exporter behavior. Authors: - Rich Young (https://github.com/ryoung562) - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1898
…d-1b-v2` (#1937) * Pin `pymilvus` to v2.6.9 to work-around langchain-ai/langchain-milvus#130 ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Documentation** * Updated RAG example README with instructions for the new embedding model. * **Configuration** * Updated embedding model configuration in RAG library settings. * **Dependencies** * Pinned `pymilvus` to a specific version to resolve compatibility issues. * **Tests** * Updated test configurations to align with the new embedding model. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1937) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - https://github.com/Salonijain27 - Will Killian (https://github.com/willkill07) URL: #1937
## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated Azure Identity dependency version constraint. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1942) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Will Killian (https://github.com/willkill07) - https://github.com/Salonijain27 URL: #1942
…v2` (#1944) ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated the reranker model in RAG example configurations and documentation to the latest available version. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1944) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) URL: #1944
…1949) Fixes a `typing.Any cannot be used with isinstance()` crash when workflows fall back to `Any` annotations after a non-fatal sequential executor type compatibility warning. The parallel executor example can produce this path because `parallel_analysis` returns `str`, while the downstream chat completion function accepts `ChatRequestOrMessage`. With `raise_type_incompatibility: false`, the sequential executor warns and annotates the generated workflow with `typing.Any`; core function conversion then needs to treat `Any` as accept-all instead of passing it to `isinstance`. ## Changes - Use `DecomposedType.is_instance()` for function output conversion checks. - Treat `typing.Any` class helpers as `object` for runtime class usage. - Use `DecomposedType.is_instance()` for input conversion checks. - Add regression coverage for `typing.Any` input/output handling in `ainvoke` and `astream`. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Bug Fixes** * Improved type conversion logic for more accurate handling of `typing.Any` type annotations in inputs and outputs. * Enhanced input type validation to use semantic type-instance checking instead of basic instance checks. * **Tests** * Added comprehensive test coverage for `typing.Any` type support in both synchronous and streaming function invocations. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1949?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - Will Killian (https://github.com/willkill07) URL: #1949
## Summary - Cache generated MCP input models by normalized schema so repeated MCP tool discovery reuses the same Pydantic classes. - Add a regression test for revalidating nested request values generated from equivalent MCP schemas. ## Root Cause Repeated MCP schema conversion created structurally equivalent but distinct dynamic Pydantic model classes. When the workflow path revalidated a nested `request` model from an earlier generated schema against a later generated schema, Pydantic rejected it by class identity. Example failure shape: 1. Initial tool registration creates `SearchDatasetsInputSchema` with a nested `RequestInputSchema`. 2. The agent input is converted into a Pydantic object using that first generated nested class. 3. The session tool lookup regenerates the same MCP schema as a new Python class with the same display name. 4. Revalidation then sees `request=<RequestInputSchema from first generation>` but expects `<RequestInputSchema from second generation>`, so Pydantic rejects it even though both classes print as `RequestInputSchema`. The cache makes repeated generation of the same `(tool name, input schema)` return the same class object. closes NAT-224 ## Summary by CodeRabbit * **Configuration Updates** * Switched the default model used by the MCP client and disabled "thinking" mode. * **Performance** * Improved schema normalization and caching so equivalent schemas reliably reuse generated models and avoid redundant work. * **Testing** * Added tests that verify consistent model reuse and stable JSON output for nested requests and order-insensitive arrays. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1954?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) URL: #1954
Updating the auto-memory wrapper to resolve the runtime user ID from `Context.user_id` before falling back to legacy/custom `user_manager`, `X-User-ID`, and `default_user`. The previous implementation directly dereferenced `Context.user_manager`, which is not present in the current `nat run` context and caused the auto-memory wrapper examples to fail with: `AttributeError: 'Context' object has no attribute 'user_manager'` ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit ## Release Notes * **Improvements** * Enhanced user identification handling with more robust fallback mechanisms when retrieving user IDs from multiple sources, including context, user manager, and request headers. * Updated tests to verify user ID extraction behavior. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1948?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1948
Fixes MCP client reconnect after a local MCP server crashes by ensuring MCP transport contexts are closed and recreated from the same task that opened them. Previously, `MCPBaseClient._reconnect()` closed the existing `AsyncExitStack` directly from the request/tool-call task that observed the failure. For local MCP transports, this can violate AnyIO cancel-scope task ownership and trigger: Attempted to exit cancel scope in a different task than it was entered in After that failure, downstream recovery could leave the workflow in a bad state and cause later LLM calls to fail with closed-client / connection errors. ## Changes - Move MCP client connect/reconnect/close operations into a dedicated lifecycle task. - Route lifecycle operations through an async command queue so request tasks can ask for reconnect without owning transport cleanup. - Clear connection/session/tool cache state during lifecycle close/reconnect. - Add regression coverage for reconnect from a separate request task with a task-bound transport context. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Refactor** * Improved MCP client connection lifecycle management for better stability and reliability. * Enhanced reconnection handling to work seamlessly across different execution contexts. * **Tests** * Added comprehensive test coverage for reconnection scenarios in various contexts. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1935) Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1935
* `gpt-3.5-turbo` -> `gpt-5.4-mini` * `mistralai/mixtral-8x22b-instruct-v0.1` -> `nvidia/nemotron-3-super-120b-a12b` * Remove `mistral-medium-3-instruct` ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Documentation** * Updated example configurations in documentation and notebooks to reference current model versions. * **Chores** * Updated evaluation and framework configuration files with current model selections. * Updated test files to align with new default models. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1960?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Bryan Bednarski (https://github.com/bbednarski9) URL: #1960
## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated internal UI library to latest version. No user-facing changes. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1963?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Bryan Bednarski (https://github.com/bbednarski9) URL: #1963
* The previous model appeared to have problems parsing MCP output ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Chores** * Updated the LLM model in JIRA MCP example configurations from Llama 3.1 70B Instruct to Nemotron 3 Nano 30B across both per-user and standard authentication examples. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1966?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) URL: #1966
…hain tools (#1965) Fixes the `parallel_executor` example failure seen in NAT 1.7 RC testing. This PR addresses the following issue: LangChain `StructuredTool` parsed raw string input for `ChatRequestOrMessage` as the first schema field, `messages`, causing Pydantic validation to fail because `messages` expects a list. ## Changes - Add a NAT-specific LangChain `StructuredTool` wrapper that maps raw string tool input to `{"input_message": value}` when the schema supports `input_message`. - Preserve dict-shaped inputs, including `messages`, unchanged. - Add regression coverage for: - `typing.Any` input/output handling in `ainvoke` - `typing.Any` stream output handling in `astream` - raw string and `messages` dict input through the LangChain tool wrapper ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Enhanced type conversion logic to properly handle `typing.Any` annotations in function invocation and streaming operations. * Improved LangChain tool wrapper with custom input handling for structured tool arguments. * **Tests** * Added validation tests for `typing.Any` type handling in function invocation and streaming. * New test coverage for LangChain tool wrapper integration. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1965?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1965
Closes NAT-270 Fix automatic memory recall for memory-backed agent workflows run across CLI invocations. This updates the auto-memory wrapper to retrieve existing memory before storing the current user message, so recall questions do not pollute memory search before lookup. It also passes the console front end `user_id` and `conversation_id` into the runtime session, ensures direct session conversation metadata is scoped and reset correctly, and updates the Zep Cloud adapter to use query-based graph search before falling back to thread context. When no conversation ID is supplied, Zep now uses a deterministic per-user default thread instead of a single global fallback thread. The root cause was a combination of memory operation ordering and runtime scoping gaps: - recall prompts were stored before retrieval - `nat run` did not propagate the configured console user or conversation ID - Zep adapter collapsed CLI runs without conversation IDs into `default_zep_thread` while retrieval depended on thread context. Docs were updated to describe `nat run --user_id`, `nat run --conversation_id`, and the runtime scoping model for memory-backed workflows. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Validation Automated checks run locally: ```bash uv run pytest packages/nvidia_nat_langchain/tests/agent/test_auto_memory_wrapper.py packages/nvidia_nat_core/tests/nat/runtime/test_session_manager.py packages/nvidia_nat_zep_cloud/tests/test_zep_editor.py uv run ruff check packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/auto_memory_wrapper/agent.py packages/nvidia_nat_langchain/tests/agent/test_auto_memory_wrapper.py packages/nvidia_nat_core/src/nat/front_ends/console/console_front_end_config.py packages/nvidia_nat_core/src/nat/front_ends/console/console_front_end_plugin.py packages/nvidia_nat_core/src/nat/runtime/session.py packages/nvidia_nat_core/tests/nat/runtime/test_session_manager.py packages/nvidia_nat_zep_cloud/src/nat/plugins/zep_cloud/zep_editor.py packages/nvidia_nat_zep_cloud/tests/test_zep_editor.py git diff --check ``` Provided manual validation: *Seeding* ```bash $ nat run --config_file ./workflow.yml --input "My name is Will, and I live in Pennsylvania" --input "I wanted to let you know that I am a software engineer." ``` *Retrieval* ```bash $ nat run --config_file ./workflow.yml --input "what is my name?" --input "what is my job?" --input "where do i live?" ... 2026-05-19 16:09:25 - INFO - nat.runtime.session:329 - Shared workflow built (entry_function=None) 2026-05-19 16:09:52 - INFO - nat.front_ends.console.console_front_end_plugin:160 - -------------------------------------------------- Workflow Result: Will You are a software engineer. You live in Pennsylvania. -------------------------------------------------- ``` ## Summary by CodeRabbit * **New Features** * Added CLI options --user_id and --conversation_id and a console default user_id ("nat_run_user_id") to isolate sessions and memory-backed runs. * Zep Cloud now uses deterministic per-user thread IDs when conversation_id is absent. * **Bug Fixes** * Memory retrieval now runs before capturing the current user message so prior context is injected first. * **Documentation** * Updated multi-tenant memory isolation and CLI docs for runtime user ID extraction and new options. * **Tests** * Added tests for session context cleanup and memory retrieval ordering. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1968?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Will Killian (https://github.com/willkill07) - David Gardner (https://github.com/dagardner-nv) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1968
…1970) - Require agents to read `skills/skill-evolution/SKILL.md` before creating, editing, or deciding whether to update files under `skills/` - Strengthen the `skill-evolution` trigger wording so skill maintenance tasks route there before editing the target skill ## Motivation Tests showed that agents could jump directly to editing a named target skill, such as `nat-telemetry`, without first consulting `skill-evolution`. This change makes `skill-evolution` an explicit pre-edit gate for skill maintenance tasks. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Documentation** * Added a “Skill Evolution” prerequisite: contributors must read the Skill Evolution guidance before creating, editing, or deciding on skill updates. * Revised the Skill Evolution guidance to frame using the resource prior to any skill changes and to streamline the workflow with recommendations for minimal, targeted edits. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1970?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Yuchen Zhang (https://github.com/yczhang-nv) Approvers: - Will Killian (https://github.com/willkill07) URL: #1970
* model_health_check will now check for the deprecation header * Widen the glob pattern for yaml files to parse * When used with the `--dry-run` flag models are listed by usage count in descending order * Replace removed `nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1` with `nvidia/nemotron-3-nano-30b-a3b` model in local LLM example ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Improvements** * Health checks now detect and report deprecated models separately from down/unavailable models; deprecated models are listed and included in overall failure status but not treated as "down." * Discovery now scans both .yml and .yaml example configs. * CLI summary and JSON output include a deprecation section. * Dry-run verbose output can rank models by usage and optionally show config paths. * **Documentation** * Local LLM guides updated to use the Nemotron model in both NIM and vLLM examples. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1974?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah) URL: #1974
- switches to Nemotron 3 model due to flakiness and time outs - revert back to `default_zep_thread` - fall back to only raising in case of non-404 response code. 404 just means not found and should not be fatal Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Configuration Updates** * Updated AI model configuration with new model variant and fine-tuned temperature settings to improve response consistency and system performance across all conversations. * **Bug Fixes** * Improved error handling for conversation search operations to gracefully manage missing context information, preventing disruptions and enhancing overall system stability and reliability. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1976?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Will Killian (https://github.com/willkill07) Approvers: - David Gardner (https://github.com/dagardner-nv) URL: #1976
* Update the documentation to reflect that the `nemotron-3-nano-30b-a3b` LLM is being used locally. * Fix YAML used in documentation, along with the model name being slightly different for local NIM and vLLM usage ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **Documentation** * Clarified NIM setup guide for deployment discovery and self-hosted deployments. * Reworked vLLM section to separate LLM model selection from embedding model selection. * **Chores** * Updated example configuration files with revised LLM model references for local integrations. * Adjusted path-extraction allowlist to ignore a specific model-name token. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1979?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Yuchen Zhang (https://github.com/yczhang-nv) URL: #1979
…ll (#1980) Restores the wire format that `/generate/full` emits for the workflow's output, which silently regressed between 1.6 and 1.7. **The contract** (documented by the eval client `nat.plugins.eval.runtime.remote_workflow.py:79` and pinned by `test_remote_evaluate.py`'s server fixture): ``` data: {"value": "<final answer>"} ``` **What broke.** PR #1851 (`feat: token streaming support for ReAct Agent`) added a `_stream_fn` to `react_agent` that yields `ChatResponseChunk` (OpenAI shape). Once a stream_fn is registered, `generate_streaming_response_full` takes the streaming branch and wraps each chunk in `ResponsePayloadOutput`, whose `get_stream_data()` dumps the chunk's full OpenAI envelope. There is no top-level `value` field, so the eval client's `chunk_data.get("value")` returns `None` and every eval scores 0. The producer (`react_agent`) and consumer (eval client) ship in the same NAT release and disagree on the wire shape. `tool_calling_agent` is exposed to the same regression for the same reason — both yield `ChatResponseChunk` from their `_stream_fn`. The fix is in the shared `ResponsePayloadOutput.get_stream_data()`, so both code paths get covered. **The fix.** `ResponsePayloadOutput.get_stream_data()` now normalizes any payload — string, primitive, `ChatResponseChunk`, `ChatResponse`, other `BaseModel` — into the canonical `data: {"value": "<str>"}\n\n` envelope. Scoped to `/generate/full`; `/v1/chat/completions` is unaffected because that path yields `ChatResponseChunk` directly through `ResponseBaseModelOutput.get_stream_data()`, never wrapped in `ResponsePayloadOutput`. WebSocket consumers do their own payload coercion in `MessageValidator.convert_data_to_message_content()` and don't call `get_stream_data()` either. **Tests.** Parametrized unit tests pin the wire format per payload type. A new integration test in `test_remote_evaluate.py` round-trips real `ResponsePayloadOutput` lines through the real `EvaluationRemoteWorkflowHandler`, so a future change that desynchronizes producer and consumer fails CI rather than silently scoring zero on every eval. ## How to verify The bug surface is two pure functions on NAT data models — the producer (`ResponsePayloadOutput.get_stream_data`) and the consumer (`chunk_data.get("value")` in the eval client). You can reproduce both the bug and the fix with no FastAPI server, no LLM, and no external services: ```python import json from nat.data_models.api_server import ResponsePayloadOutput, ChatResponseChunk # What react_agent's _stream_fn yields after PR #1851: chunk = ChatResponseChunk.create_streaming_chunk("21") # What /generate/full puts on the wire: sse_line = ResponsePayloadOutput(payload=chunk).get_stream_data() # What nat.plugins.eval.runtime.remote_workflow extracts: data = json.loads(sse_line[len("data: "):-2]) print("eval client extracts:", repr(data.get("value"))) ``` Expected output: | | Output | |---|---| | On `release/1.7` (without this PR) | `eval client extracts: None` | | With this PR applied | `eval client extracts: '21'` | The full `nvidia_nat_core` and `nvidia_nat_eval` test suites pass on this branch with no regressions, including the new parametrized unit tests and the producer/consumer integration test added here. ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Standardized the /generate/full SSE output to always emit responses as a consistent JSON "value" envelope for all payload types. * **Bug Fixes** * Remote evaluation now correctly accumulates streamed token/value segments into the final output instead of only capturing a single chunk. * **Tests** * Added unit and integration tests verifying the SSE envelope format and correct reconstruction of streamed responses. [](https://app.coderabbit.ai/change-stack/NVIDIA/NeMo-Agent-Toolkit/pull/1980?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) Authors: - Matthew Grossman (https://github.com/matthewgrossman) Approvers: - Will Killian (https://github.com/willkill07) URL: #1980
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. 🗂️ Base branches to auto review (2)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Salonijain27
left a comment
There was a problem hiding this comment.
Approved from a dependency point of view
Signed-off-by: David Gardner <dagardner@nvidia.com>
❄️ Code freeze for
release/1.7andv1.7.0releaseWhat does this mean?
Only critical/hotfix level issues should be merged into
release/1.7until release (merging of this PR).What is the purpose of this PR?
release/1.7intomainfor the release