Skip to content

Conversation

@ericevans-nv
Copy link
Contributor

@ericevans-nv ericevans-nv commented Jan 20, 2026

Description

Adds the nvidia_nat_rag_lib plugin package that integrates the NVIDIA RAG Blueprint pipeline into NeMo Agent Toolkit, providing grounded retrieval and generation capabilities.

Note

The nvidia-rag>=2.4.0 package is scheduled for release on PyPI around February 10th, 2026.

What's Included

New Package: nvidia_nat_rag_lib

  • search tool: Retrieves grounded document excerpts from indexed sources
  • generate tool: Synthesizes answers grounded in cited source material
  • Declarative YAML configuration for RAG pipeline features:
    • Two-stage retrieval (vector search + semantic reranking)
    • Query rewriting via LLM
    • Confidence filtering
    • Structured citations

Example Configuration

  • examples/RAG/simple_rag/configs/rag_library_mode_config.yml demonstrating the plugin
  • Updated README with setup instructions, bootstrap data commands, and example output

Ingestion Script Enhancement

  • Added --drop_collection flag to scripts/langchain_web_ingest.py for easier re-ingestion with different embedding models

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features
    • NVIDIA RAG Library: end-to-end RAG tools (search & generate) with reranking, query rewriting, citations, confidence filtering, multi-collection search, and a ready-to-use function group.
    • CLI: options to drop an existing collection before ingest and to select embedding model.
    • Embedding config: optional embedding output-dimension setting.
  • Documentation
    • Expanded README with NVIDIA RAG Library guide, examples, and example RAG config.
  • Tests
    • Added unit and integration tests covering RAG models, tools, and integration workflows.
  • Chores
    • Added optional rag-lib dependency and packaging/license placeholders for the new RAG subpackage.

✏️ Tip: You can customize this high-level summary in your review settings.

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…roper testing

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@ericevans-nv ericevans-nv self-assigned this Jan 20, 2026
@ericevans-nv ericevans-nv requested a review from a team as a code owner January 20, 2026 21:57
@ericevans-nv ericevans-nv added the DO NOT MERGE PR should not be merged; see PR for details label Jan 20, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 20, 2026

Walkthrough

Adds a new NVIDIA RAG Library subpackage with configs, Pydantic models, an async adapter exposing search/generate tools, unit and integration tests, example config/docs, workspace/dependency registrations, a small embedder config field, and CLI flags for the ingestion script.

Changes

Cohort / File(s) Summary
New NVIDIA RAG package
packages/nvidia_nat_rag_lib/pyproject.toml, packages/nvidia_nat_rag_lib/...
Introduces the nvidia-nat-rag-lib package: project metadata, build config, dependencies (nvidia-nat~=1.5, nvidia-rag>=2.4.0), setuptools_scm versioning, UV workspace mapping, and entry point nat.plugins.rag_lib.register.
RAG plugin code
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/__init__.py, .../client.py, .../config.py, .../models.py, .../register.py
Adds RAGPipelineConfig and result models, plus nvidia_rag_lib async adapter that resolves LLM/embedder/retriever refs, instantiates NvidiaRAG, and exposes search and generate tools with streaming and error handling.
Package metadata & docs
packages/nvidia_nat_rag_lib/LICENSE-3rd-party.txt, packages/nvidia_nat_rag_lib/LICENSE.md, packages/nvidia_nat_rag_lib/src/nat/meta/pypi.md
Adds license reference files and PyPI-facing package description and feature notes.
Tests for rag-lib
packages/nvidia_nat_rag_lib/tests/test_models.py, packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py, packages/nvidia_nat_rag_lib/tests/test_tools.py
Adds unit tests for models and tools and extensive pytest integration tests (Milvus-backed), fixtures, parametrization, and an expected xfail case.
Examples & README
examples/RAG/simple_rag/configs/rag_library_mode_config.yml, examples/RAG/simple_rag/README.md
Adds example YAML wiring NVIDIA components and an "Advanced RAG with NVIDIA RAG Library" README section; reorganizes README structure.
Workspace & dependency updates
pyproject.toml, packages/nvidia_nat_all/pyproject.toml
Registers nvidia-nat-rag-lib as a workspace package, adds optional dependency group rag-lib, and includes the package in nvidia_nat_all dependencies and UV sources.
Embedder config change
src/nat/embedder/nim_embedder.py
Adds optional `dimensions: int
Ingestion script enhancements
scripts/langchain_web_ingest.py
Adds CLI flags --drop_collection and --embedding_model, imports MilvusClient, and optionally drops an existing Milvus collection before ingestion.
Docker compose formatting
examples/deploy/docker-compose.milvus.yml
Removes a trailing empty line (formatting-only).
Vale vocabulary
ci/vale/styles/config/vocabularies/nat/accept.txt
Adds regex entries for "Declaratively" and "Rerank/ Reranking".

Sequence Diagram(s)

sequenceDiagram
  participant Builder as Builder
  participant Resolver as ConfigResolver
  participant NvidiaRAG as NvidiaRAG
  participant VectorStore as Retriever/VectorStore
  participant FunctionGroup as FunctionGroup

  Builder->>Resolver: Build NvidiaRAGConfig from NvidiaRAGLibConfig
  Resolver->>Resolver: Resolve LLM, Embedder, Retriever refs
  Resolver->>VectorStore: Validate/configure collection_names & retriever
  Resolver-->>Builder: Return populated NvidiaRAGConfig
  Builder->>NvidiaRAG: Instantiate client with NvidiaRAGConfig
  NvidiaRAG-->>Builder: Client ready
  Builder->>FunctionGroup: Yield FunctionGroup wrapping client
  FunctionGroup->>NvidiaRAG: search(query) / generate(query)
  NvidiaRAG-->>FunctionGroup: search results / streamed deltas
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: integration of a RAG (Retrieval-Augmented Generation) Library Mode, which is the primary focus of this PR introducing the nvidia_nat_rag_lib plugin package.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

113-126: Add nvidia-nat-rag-lib to the telemetry optional dependency group.

Per coding guidelines, the telemetry group must include rag-lib. Update line 124 to include it in alphabetical order.

🔧 Suggested change
 telemetry = [
   "nvidia-nat-data-flywheel",
   "nvidia-nat-opentelemetry",
   "nvidia-nat-phoenix",
   "nvidia-nat-ragaai",
+  "nvidia-nat-rag-lib",
   "nvidia-nat-weave",
 ]
🤖 Fix all issues with AI agents
In `@examples/deploy/docker-compose.milvus.yml`:
- Around line 24-25: Update all example and docs that reference MinIO on
localhost:9000 to the new host port mapping (localhost:19000 or
http://localhost:19000) so they match docker-compose.milvus.yml's "-
"19000:9000"" mapping; specifically change endpoint_url / S3_ENDPOINT_URL
entries in examples/object_store/user_report/configs/config_s3.yml
(endpoint_url), examples/object_store/user_report/README.md (URLs),
examples/evaluation_and_profiling/simple_web_query_eval/README.md
(S3_ENDPOINT_URL), docs/source/build-workflows/object-store.md (all
localhost:9000 mentions),
docs/source/components/auth/mcp-auth/mcp-auth-token-storage.md (endpoint_url),
and docs/source/extend/custom-components/object-store.md (endpoint_url), or
alternatively add a short note in those files explaining why
docker-compose.milvus.yml maps host ports to 19000/19001 while
docker-compose.minio.yml uses 9000:9000.

In `@examples/RAG/simple_rag/configs/rag_library_mode_config.yml`:
- Around line 73-78: The generate block for rag_lib_generate incorrectly claims
reflection is used while the reflection configuration is commented out; either
re-enable the reflection settings for the generate pipeline (uncomment and wire
up the reflection configuration) or update the description field in the generate
block to remove mention of reflection so it accurately reflects current
behavior; locate the generate:_type: rag_lib_generate block and modify the
description or restore the reflection section to match the stated behavior.
- Around line 53-60: Remove the TODO/BUG comments and the commented-out
reflection block (the lines referencing reflection, enable_reflection,
max_loops, context_relevance_threshold, response_groundedness_threshold and the
BUG note about check_context_relevance()) from the example config; if you need
to preserve traceability, replace those comments with a single short
tracking-issue reference (e.g., "See issue `#1234`") rather than TODO/BUG text so
the config stays clean and guideline-compliant.

In `@packages/nvidia_nat_rag_lib/LICENSE.md`:
- Line 1: Add the standard SPDX header line "SPDX-License-Identifier:
Apache-2.0" (or the full Apache-2.0 SPDX header as required by project
guidelines) at the top of this Markdown pointer file, placing it immediately
above the existing reference line "../../LICENSE.md" so the file begins with the
SPDX identifier followed by the pointer line.

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Line 1: Update the copyright header in client.py to match the package
convention by changing the year token "2026" to the range "2025-2026"; locate
the top-of-file SPDX comment in
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (the
SPDX-FileCopyrightText line) and replace the single-year value with the dashed
range used elsewhere.

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/models.py`:
- Around line 16-43: The models import currently guards Citations behind
TYPE_CHECKING which leaves Citations undefined at runtime and breaks Pydantic v2
validation; fix by importing Citations at module runtime (remove the
TYPE_CHECKING guard and add from nvidia_rag.rag_server.response_generator import
Citations) and update the type annotations on RAGSearchResult.citations and
RAGGenerateResult.citations to use the actual Citations type (not a string) so
Pydantic can resolve the type without requiring model_rebuild().

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/register.py`:
- Line 1: Update the SPDX copyright header line that currently reads
"SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
All rights reserved." to use the range "2025-2026" for consistency with
client.py; locate the SPDX header at the top of register.py (the
SPDX-FileCopyrightText line) and replace the single year with "2025-2026".

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/__init__.py`:
- Around line 1-14: Add a module-level docstring to the top of
nat.plugins.rag_lib.tools __init__.py (immediately after the existing
SPDX/license header) describing the package purpose (e.g., "RAG library tool
functions."). Preserve the existing header exactly, do not change imports or
re-exports (they are optional), and ensure the docstring is a simple
triple-quoted string placed at module scope so it satisfies the coding
guidelines.

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/search.py`:
- Around line 1-64: Add a module-level docstring describing the public purpose
of this file, then convert the existing docstrings for the RAGLibSearchConfig
class and rag_lib_search function into Google-style docstrings: for
RAGLibSearchConfig include a short summary, an Args section documenting
`rag_client`, `topic`, `description`, `collection_names`, `reranker_top_k`,
`vdb_top_k`, `confidence_threshold`, `enable_query_rewriting`,
`enable_reranker`, `enable_filter_generator`, and `filter_expr` (with
types/backticked identifiers) and any Defaults; for rag_lib_search include a
short summary and an Args section documenting `config` (`RAGLibSearchConfig`)
and `builder` (`Builder`) and a Returns section describing the return type (if
any) or `None`; update formatting to use Google style and ensure public symbols
RAGLibSearchConfig and rag_lib_search are covered.
- Around line 62-63: The public async generator rag_lib_search currently lacks
an explicit return type; update its signature to include
AsyncIterator[FunctionInfo] (or AsyncGenerator[FunctionInfo, None]) as the
return annotation on rag_lib_search(config: RAGLibSearchConfig, builder:
Builder) and add the corresponding typing import (AsyncIterator or
AsyncGenerator) plus ensure FunctionInfo is imported/available; this makes the
public API fully type-hinted while preserving the existing generator behavior.

In `@packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py`:
- Around line 569-576: Replace the TODO comment in the embedder parameterization
by explicitly marking the failing option as expected-fail instead of leaving a
TODO: add "nim_embedder_e5" back into the parametrized list as a pytest.param
with pytest.mark.xfail(reason="nv-embedqa-e5-v5 rejects dimensions param —
reference <issue-number or URL>"), or alternatively replace the TODO with a
one-line comment containing the issue reference; update the same pattern for the
other parametrizations. Target the parametrization that defines "embedder_ref"
in test_rag_lib_function.py to ensure the failing embedder is either xfailed
with a clear reason or annotated with the issue reference.
- Around line 15-31: Remove the "TODO: Add integration tests..." block from the
module docstring (the top-level docstring in this test module that contains
"KNOWN ISSUE - Embedding Model Compatibility"); move those TODO items into your
issue tracker or a dedicated design/roadmap doc and replace the removed TODO
with a brief issue reference (e.g., "See issue `#1234`") or a short note pointing
to the external tracking doc so the docstring no longer contains TODOs but still
links to the tracking location.
- Around line 506-525: The tests test_generate_method_exists,
test_search_method_exists, and test_health_method_exists use getattr with
constant names which triggers Ruff B009; change each assertion to use direct
attribute access after the hasattr check (e.g. replace
callable(getattr(NvidiaRAG, "generate")) with callable(NvidiaRAG.generate), and
similarly for NvidiaRAG.search and NvidiaRAG.health) so the tests reference the
methods directly.

In `@pyproject.toml`:
- Around line 113-114: The pyproject.toml was updated to include the optional
dependency "nvidia-nat-rag-lib" under the rag-lib extras but uv.lock was not
regenerated; run the sync to update the lockfile (e.g., execute the suggested
tooling command to install and sync the package: run uv pip install
nvidia-nat-rag-lib --sync) so uv.lock reflects the new dependency and versions,
then commit the updated uv.lock alongside the pyproject.toml change.

In `@src/nat/embedder/nim_embedder.py`:
- Line 53: Update the Pydantic Field definition for the dimensions attribute so
invalid values are rejected at validation time: add the gt=0 constraint to the
existing dimensions: int | None = Field(default=None, description="Embedding
output dimensions.") declaration in src/nat/embedder/nim_embedder.py (the
dimensions field in the Nim embedder config) so that zero or negative integers
are prohibited, matching the pattern used for top_k (e.g., gt=0).
🧹 Nitpick comments (6)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/models.py (1)

1-15: Add a module docstring for the public models module.

Public modules should have Google-style docstrings. As per coding guidelines, add a concise module docstring.

🛠️ Proposed fix
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
 # You may obtain a copy of the License at
 #
 # http://www.apache.org/licenses/LICENSE-2.0
 #
 # Unless required by applicable law or agreed to in writing, software
 # distributed under the License is distributed on an "AS IS" BASIS,
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+"""RAG result models."""
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/config.py (1)

46-51: Wrap nvidia_rag in backticks in the class docstring.

Docstrings should wrap code entities in backticks to avoid Vale false positives. As per coding guidelines, update the docstring.

🛠️ Proposed fix
-    """Native nvidia_rag pipeline settings.
+    """Native `nvidia_rag` pipeline settings.
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/generate.py (1)

122-123: Silent exception swallowing may hide parsing issues.

The bare except Exception: continue silently ignores all parsing errors without any logging. While some malformed chunks are expected in streaming scenarios, silently discarding all exceptions makes debugging difficult when legitimate issues occur.

Consider logging at debug level to aid troubleshooting:

♻️ Suggested improvement
                 except Exception:
-                    continue
+                    logger.debug("Failed to parse chunk: %s", raw_chunk)
+                    continue
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)

147-169: Consider extracting the repeated propagation pattern.

The same three-field propagation logic (model_name, server_url, api_key) is repeated for query_rewriter, reflection, and filter_expression_generator. A helper function could reduce duplication and improve maintainability.

♻️ Optional refactor suggestion
def _propagate_llm_to_component(component, llm: NIMModelConfig) -> None:
    """Propagate LLM config to a component if not already set."""
    if "model_name" not in component.model_fields_set:
        component.model_name = llm.model_name
    if "server_url" not in component.model_fields_set and llm.base_url:
        component.server_url = llm.base_url
    if "api_key" not in component.model_fields_set and llm.api_key:
        component.api_key = llm.api_key

Then use:

_propagate_llm_to_component(rag_config.query_rewriter, llm)
_propagate_llm_to_component(rag_config.reflection, llm)
_propagate_llm_to_component(rag_config.filter_expression_generator, llm)
packages/nvidia_nat_rag_lib/tests/test_models.py (1)

32-35: Fixture naming convention.

Per coding guidelines, fixture functions should use the fixture_ prefix on the function name and a name argument in the decorator. However, this is a minor style consideration.

♻️ Optional fix for fixture naming
-    `@pytest.fixture`
-    def citations(self) -> Citations:
+    `@pytest.fixture`(name="citations")
+    def fixture_citations(self) -> Citations:
         """Create Citations object."""
         return Citations(total_results=2, results=[])

Also applies to: 54-57

packages/nvidia_nat_rag_lib/tests/test_tools.py (1)

37-54: Align pytest fixtures with the fixture_ naming convention.

Fixtures here don’t follow the fixture_ prefix + @pytest.fixture(name="...") pattern. Please update the fixture functions (and adjust usages) for consistency.

🔧 Suggested pattern
-    `@pytest.fixture`
-    def mock_builder(self) -> MagicMock:
+    `@pytest.fixture`(name="mock_builder")
+    def fixture_mock_builder(self) -> MagicMock:
         return MagicMock(spec=Builder)
Apply the same pattern to `tool_config` and `function_group`. As per coding guidelines, fixtures should follow the `fixture_` naming convention.

Copy link
Member

@willkill07 willkill07 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a round of comments to think about the configuration space.

Right now there are so many knobs and required interconnection for this to work. I think it can be simplified a lot without reducing any functionality.

Remember, this is RAG Library Mode for NeMo Agent Toolkit, not exposing every feature of RAG Library Mode within NeMo Agent Toolkit.

@ericevans-nv ericevans-nv changed the title Rag lib mode integration RAG Library Mode integration Jan 21, 2026
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@ericevans-nv ericevans-nv added feature request New feature or request non-breaking Non-breaking change labels Jan 22, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/RAG/simple_rag/README.md (1)

114-127: Update the CLI help snippet to include new flags.

The script now supports --drop_collection and --embedding_model, but the documented usage output omits them. As per coding guidelines, keep docs accurate and comprehensive.

✏️ Suggested update
-usage: langchain_web_ingest.py [-h] [--urls URLS] [--collection_name COLLECTION_NAME] [--milvus_uri MILVUS_URI] [--clean_cache]
+usage: langchain_web_ingest.py [-h] [--urls URLS] [--collection_name COLLECTION_NAME] [--milvus_uri MILVUS_URI] [--clean_cache] [--drop_collection] [--embedding_model EMBEDDING_MODEL]
 --clean_cache         If true, deletes local files (default: False)
+--drop_collection     Drop existing collection before ingesting (default: False)
+--embedding_model     Embedding model to use (default: nvidia/nv-embedqa-e5-v5)
🤖 Fix all issues with AI agents
In `@examples/RAG/simple_rag/README.md`:
- Around line 422-425: The fenced code block showing the pipeline flow ("Query →
Embed → Retrieve candidates → Rerank → Final results") is missing a language
specifier; update that code fence to include a language tag (e.g., text) after
the opening ``` so it becomes ```text to satisfy markdownlint and clarify the
content type.
- Around line 29-57: Update the documentation to replace the acronym "NAT" with
"NeMo Agent toolkit" in body text and use "NeMo Agent Toolkit" (capitalized
"Toolkit") in headings: change the Table of Contents entry "Integration with NAT
Components" and the corresponding heading "Integration with NAT Components" as
well as any other occurrences in the README (e.g., the advanced section
references and the two places mentioned around the end of the file) so body
sentences read "NeMo Agent toolkit" and headings read "NeMo Agent Toolkit";
ensure all TOC links and anchors match the updated heading text.

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Around line 125-136: The exception handler in the chunk parsing loop (where
ChainResponse.model_validate_json is called and parsed.choices /
parsed.citations are processed) currently swallows errors; change the bare
except to catch Exception as e and call logger.exception("Error parsing
streaming chunk") (or similar) to record the stack trace and chunk context
before continuing; ensure the existing logger object is used (or
imported/defined if missing) and preserve the continue behavior after logging.
- Around line 74-76: The public async generator nvidia_rag_lib is missing an
explicit return type; update its signature to include the return type
AsyncGenerator[FunctionGroup, None] (i.e., change "async def
nvidia_rag_lib(config: NvidiaRAGLibConfig, builder: Builder):" to "async def
nvidia_rag_lib(config: NvidiaRAGLibConfig, builder: Builder) ->
AsyncGenerator[FunctionGroup, None]:") and add the required import for
AsyncGenerator (from typing) and FunctionGroup (where it is defined) so the
annotation resolves.
♻️ Duplicate comments (3)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/register.py (1)

1-2: Align SPDX year range with the rest of the package.

Other files in this package use 2025-2026; please keep the header consistent here as well. As per coding guidelines, confirm copyright years are up to date.

🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
pyproject.toml (1)

113-114: Sync uv.lock after adding nvidia-nat-rag-lib.

pyproject.toml now declares the optional dependency, so the lockfile should include it as well. As per coding guidelines, keep dependency declarations and uv.lock in sync.

#!/bin/bash
# Verify the lockfile includes the new optional dependency.
rg -n "nvidia-nat-rag-lib" uv.lock
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)

1-2: Align SPDX year range with the rest of the package.

Other files in this package use 2025-2026; please keep the header consistent here as well. As per coding guidelines, confirm copyright years are up to date.

🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🧹 Nitpick comments (1)
scripts/langchain_web_ingest.py (1)

37-53: Avoid blocking the event loop when dropping collections.

MilvusClient is synchronous, and calling it inside async def main(...) can block the loop. Consider moving the drop to a worker thread (or use an async Milvus client if available).

♻️ Possible refactor using asyncio.to_thread
     if drop_collection:
-        client = MilvusClient(uri=milvus_uri)
-        if client.has_collection(collection_name):
-            logger.info("Dropping existing collection: %s", collection_name)
-            client.drop_collection(collection_name)
-        else:
-            logger.info("Collection '%s' does not exist, nothing to drop", collection_name)
+        import asyncio
+
+        def _drop_collection() -> None:
+            client = MilvusClient(uri=milvus_uri)
+            if client.has_collection(collection_name):
+                logger.info("Dropping existing collection: %s", collection_name)
+                client.drop_collection(collection_name)
+            else:
+                logger.info("Collection '%s' does not exist, nothing to drop", collection_name)
+
+        await asyncio.to_thread(_drop_collection)

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@examples/RAG/simple_rag/README.md`:
- Line 375: Update the body text string "NeMo Agent Toolkit" to "NeMo Agent
toolkit" in the README where the NVIDIA RAG Library plugin description appears
(the line containing "The NVIDIA RAG Library plugin (`nvidia_rag_lib`)
integrates the ... NeMo Agent Toolkit"); ensure only body text uses the
lowercase "toolkit" while leaving any headings unchanged.
♻️ Duplicate comments (1)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)

117-128: Don’t silently swallow streaming chunk parse errors.

The empty except makes malformed chunks hard to debug; log the exception and keep the continue. As per coding guidelines, use logger.exception() when catching without re-raising.

🔧 Suggested fix
-                except Exception:
-                    continue
+                except Exception:
+                    logger.exception("Failed to parse streaming chunk")
+                    continue
🧹 Nitpick comments (1)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)

1-16: Add a module-level Google-style docstring.

This is a public module but has no top-level docstring; add a concise module docstring after the SPDX header. As per coding guidelines, public modules should have Google-style docstrings.

♻️ Suggested addition
@@
 # limitations under the License.
+
+"""NVIDIA RAG Library client integration.
+
+Provides the `nvidia_rag_lib` function group and config resolution helpers for
+NVIDIA RAG Library within NeMo Agent Toolkit.
+"""

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Around line 15-16: The module is missing a Google-style module docstring; add
a concise docstring at the top of
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py immediately after
the SPDX header and before the import logging line that summarizes the module
purpose, public classes/functions, and usage examples if applicable (follow
Google Python docstring style and keep it brief).
♻️ Duplicate comments (2)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)

117-128: Log streaming chunk parse errors instead of silently continuing.
This swallow-and-continue makes malformed chunk issues hard to diagnose.

✏️ Suggested fix
-                except Exception:
-                    continue
+                except Exception:
+                    logger.exception("Failed to parse streaming chunk")
+                    continue
Based on learnings, ...
examples/RAG/simple_rag/README.md (1)

369-376: Use lowercase "toolkit" in body text.

Line 371 uses "NeMo Agent Toolkit" (capital T) in body text. Per the coding guidelines, "Toolkit" should only be capitalized in headings; body text should use "NeMo Agent toolkit" (lowercase t).

✏️ Suggested fix
-The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent Toolkit.
+The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent toolkit.
🧹 Nitpick comments (1)
packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py (1)

511-513: Mark integration tests as slow.
These tests spin up Milvus collections and call external services; add the slow marker to keep the default suite fast.

♻️ Suggested change
-@pytest.mark.integration
+@pytest.mark.integration
+@pytest.mark.slow
 class TestNvidiaRAGIntegration:
As per coding guidelines, ...

Copy link
Contributor

@mnajafian-nv mnajafian-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work on this PR, I added some in line comments and suggestions as well as following two questions :

Q1) Can we add observability for production monitoring?
Right now you're logging errors, which is great! But in production, you'll want to track performance trends and alert on degradation. A quick win could be adding structured metrics to your search/generate functions.

Why this matters:

  • Track p50/p95/p99 latencies - "Search is suddenly taking 3 seconds instead of 500ms"
  • Monitor error rates - "We're seeing 5% failures on the generate endpoint"
  • Correlate quality metrics - "Low citation counts after the embedder upgrade"
  • Set up alerts - "Page if search latency > 2s for 5 minutes"

What you can do with these metrics:

  • Export to Prometheus/Datadog for dashboards
  • Set up PagerDuty alerts on SLO violations
  • Track before/after performance for changes
  • Debug production incidents with structured data

Q2) Can we level up the config documentation? e.g. Adding examples, constraints, and gotchas to your docstrings.

Your config fields have descriptions, but developers will have questions like:

  • "What's a good value for confidence_threshold?"
  • "What's the performance impact of enabling query rewriting?"
  • "Help, I'm getting no results, what did I misconfigure?"

ericevans-nv and others added 4 commits January 22, 2026 10:29
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@examples/RAG/simple_rag/README.md`:
- Around line 493-510: Replace possessive forms for inanimate labels in the
example console snippet: change "Agent's thoughts" to "Agent thought" or "Agent
notes", and change "Tool's input" / "Tool's response" to "Tool input" / "Tool
response" (or "Tool request" / "Tool response") in the block showing the
cuda_search__search interaction and related lines so the documentation avoids
possessive ’s for inanimate objects.

In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Line 96: Update the ValueError message that currently reads 'base_url is
required for LLM config specified in in NVIDIA RAG Config.' to remove the
duplicated "in" so it reads 'base_url is required for LLM config specified in
NVIDIA RAG Config.'; locate the raise ValueError(...) statement in client.py
(the raise ValueError call shown) and replace the string accordingly.
- Line 113: Fix the duplicated word "in" in the ValueError message in client.py
by updating the error string raised (the raise ValueError(...) statement) to
read "base_url is required for embedder config specified in NVIDIA RAG Config."
so it matches the corrected phrasing used elsewhere (e.g., similar message at
line 96).
♻️ Duplicate comments (2)
examples/RAG/simple_rag/README.md (1)

372-375: Use lowercase “toolkit” in body text and follow first‑use naming.

Body text should use “NeMo Agent toolkit” (lowercase t). If this is the first mention in this README, it should be “NVIDIA NeMo Agent toolkit.” As written, it uses “NeMo Agent Toolkit” in body text. Based on learnings, use “toolkit” in body text and reserve “Toolkit” for headings. Based on learnings, avoid the NAT acronym and follow the toolkit naming conventions in documentation.

✏️ Suggested edit
-The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent Toolkit.
+The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent toolkit.
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)

15-16: Add a module docstring.

Public modules should include a concise Google-style docstring after the SPDX header. This was flagged in a previous review but appears unaddressed.

✏️ Suggested fix
 # limitations under the License.
 
+"""RAG library client integration for NeMo Agent Toolkit."""
+
 import logging
🧹 Nitpick comments (3)
packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py (1)

198-205: Remove unused mock_builder and retriever_ref parameters.

These parameters are injected but never used in test_search, test_generate, and test_health. The tests create NvidiaRAGConfig directly from the config dictionaries rather than resolving through mock_builder, and retriever_ref is parametrized but the test hardcodes "http://localhost:19530".

Either use these parameters to exercise the builder resolution path, or remove them to reduce test complexity and eliminate false parametrization overhead.

♻️ Option 1: Remove unused parameters (simpler)
-    `@pytest.mark.parametrize`("retriever_ref", list(RETRIEVER_CONFIGS.keys()))
     async def test_search(
         self,
-        mock_builder: MagicMock,
         create_collection,
         llm_ref: str,
         embedder_ref: str,
-        retriever_ref: str,
     ) -> None:

Apply similar changes to test_generate and test_health.

packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (2)

156-158: Use logger.error() instead of logger.exception() when re-raising.

Per coding guidelines, when re-raising exceptions with bare raise, use logger.error() to avoid duplicate stack trace output (the caller will also log the exception).

✏️ Suggested fix
         except Exception:
-            logger.exception("RAG search failed")
+            logger.error("RAG search failed")
             raise

195-197: Use logger.error() instead of logger.exception() when re-raising.

Same issue as the search function - per coding guidelines, use logger.error() to avoid duplicate stack traces when re-raising.

✏️ Suggested fix
         except Exception:
-            logger.exception("RAG generate failed")
+            logger.error("RAG generate failed")
             raise

if parsed.citations and parsed.citations.results:
final_citations = parsed.citations
except (ValueError, TypeError, KeyError) as e:
logger.debug("Failed to parse RAG response chunk: %s - %s", type(e).__name__, e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we fail here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Individual chunk parse failures shouldn't abort the entire stream. The outer try/except handles actual operation failures, this inner handler gracefully skips malformed chunks while accumulating successful ones.

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DO NOT MERGE PR should not be merged; see PR for details feature request New feature or request non-breaking Non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants