-
Notifications
You must be signed in to change notification settings - Fork 492
RAG Library Mode integration #1440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
RAG Library Mode integration #1440
Conversation
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…ag-lib-mode-integration
…ag-lib-mode-integration
…ag-lib-mode-integration
…ag-lib-mode-integration
…ag-lib-mode-integration
…ag-lib-mode-integration
…ag-lib-mode-integration
…ag-lib-mode-integration
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…ag-lib-mode-integration
…ag-lib-mode-integration
…roper testing Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
WalkthroughAdds a new NVIDIA RAG Library subpackage with configs, Pydantic models, an async adapter exposing search/generate tools, unit and integration tests, example config/docs, workspace/dependency registrations, a small embedder config field, and CLI flags for the ingestion script. Changes
Sequence Diagram(s)sequenceDiagram
participant Builder as Builder
participant Resolver as ConfigResolver
participant NvidiaRAG as NvidiaRAG
participant VectorStore as Retriever/VectorStore
participant FunctionGroup as FunctionGroup
Builder->>Resolver: Build NvidiaRAGConfig from NvidiaRAGLibConfig
Resolver->>Resolver: Resolve LLM, Embedder, Retriever refs
Resolver->>VectorStore: Validate/configure collection_names & retriever
Resolver-->>Builder: Return populated NvidiaRAGConfig
Builder->>NvidiaRAG: Instantiate client with NvidiaRAGConfig
NvidiaRAG-->>Builder: Client ready
Builder->>FunctionGroup: Yield FunctionGroup wrapping client
FunctionGroup->>NvidiaRAG: search(query) / generate(query)
NvidiaRAG-->>FunctionGroup: search results / streamed deltas
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pyproject.toml (1)
113-126: Addnvidia-nat-rag-libto thetelemetryoptional dependency group.Per coding guidelines, the telemetry group must include
rag-lib. Update line 124 to include it in alphabetical order.🔧 Suggested change
telemetry = [ "nvidia-nat-data-flywheel", "nvidia-nat-opentelemetry", "nvidia-nat-phoenix", "nvidia-nat-ragaai", + "nvidia-nat-rag-lib", "nvidia-nat-weave", ]
🤖 Fix all issues with AI agents
In `@examples/deploy/docker-compose.milvus.yml`:
- Around line 24-25: Update all example and docs that reference MinIO on
localhost:9000 to the new host port mapping (localhost:19000 or
http://localhost:19000) so they match docker-compose.milvus.yml's "-
"19000:9000"" mapping; specifically change endpoint_url / S3_ENDPOINT_URL
entries in examples/object_store/user_report/configs/config_s3.yml
(endpoint_url), examples/object_store/user_report/README.md (URLs),
examples/evaluation_and_profiling/simple_web_query_eval/README.md
(S3_ENDPOINT_URL), docs/source/build-workflows/object-store.md (all
localhost:9000 mentions),
docs/source/components/auth/mcp-auth/mcp-auth-token-storage.md (endpoint_url),
and docs/source/extend/custom-components/object-store.md (endpoint_url), or
alternatively add a short note in those files explaining why
docker-compose.milvus.yml maps host ports to 19000/19001 while
docker-compose.minio.yml uses 9000:9000.
In `@examples/RAG/simple_rag/configs/rag_library_mode_config.yml`:
- Around line 73-78: The generate block for rag_lib_generate incorrectly claims
reflection is used while the reflection configuration is commented out; either
re-enable the reflection settings for the generate pipeline (uncomment and wire
up the reflection configuration) or update the description field in the generate
block to remove mention of reflection so it accurately reflects current
behavior; locate the generate:_type: rag_lib_generate block and modify the
description or restore the reflection section to match the stated behavior.
- Around line 53-60: Remove the TODO/BUG comments and the commented-out
reflection block (the lines referencing reflection, enable_reflection,
max_loops, context_relevance_threshold, response_groundedness_threshold and the
BUG note about check_context_relevance()) from the example config; if you need
to preserve traceability, replace those comments with a single short
tracking-issue reference (e.g., "See issue `#1234`") rather than TODO/BUG text so
the config stays clean and guideline-compliant.
In `@packages/nvidia_nat_rag_lib/LICENSE.md`:
- Line 1: Add the standard SPDX header line "SPDX-License-Identifier:
Apache-2.0" (or the full Apache-2.0 SPDX header as required by project
guidelines) at the top of this Markdown pointer file, placing it immediately
above the existing reference line "../../LICENSE.md" so the file begins with the
SPDX identifier followed by the pointer line.
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Line 1: Update the copyright header in client.py to match the package
convention by changing the year token "2026" to the range "2025-2026"; locate
the top-of-file SPDX comment in
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (the
SPDX-FileCopyrightText line) and replace the single-year value with the dashed
range used elsewhere.
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/models.py`:
- Around line 16-43: The models import currently guards Citations behind
TYPE_CHECKING which leaves Citations undefined at runtime and breaks Pydantic v2
validation; fix by importing Citations at module runtime (remove the
TYPE_CHECKING guard and add from nvidia_rag.rag_server.response_generator import
Citations) and update the type annotations on RAGSearchResult.citations and
RAGGenerateResult.citations to use the actual Citations type (not a string) so
Pydantic can resolve the type without requiring model_rebuild().
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/register.py`:
- Line 1: Update the SPDX copyright header line that currently reads
"SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES.
All rights reserved." to use the range "2025-2026" for consistency with
client.py; locate the SPDX header at the top of register.py (the
SPDX-FileCopyrightText line) and replace the single year with "2025-2026".
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/__init__.py`:
- Around line 1-14: Add a module-level docstring to the top of
nat.plugins.rag_lib.tools __init__.py (immediately after the existing
SPDX/license header) describing the package purpose (e.g., "RAG library tool
functions."). Preserve the existing header exactly, do not change imports or
re-exports (they are optional), and ensure the docstring is a simple
triple-quoted string placed at module scope so it satisfies the coding
guidelines.
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/search.py`:
- Around line 1-64: Add a module-level docstring describing the public purpose
of this file, then convert the existing docstrings for the RAGLibSearchConfig
class and rag_lib_search function into Google-style docstrings: for
RAGLibSearchConfig include a short summary, an Args section documenting
`rag_client`, `topic`, `description`, `collection_names`, `reranker_top_k`,
`vdb_top_k`, `confidence_threshold`, `enable_query_rewriting`,
`enable_reranker`, `enable_filter_generator`, and `filter_expr` (with
types/backticked identifiers) and any Defaults; for rag_lib_search include a
short summary and an Args section documenting `config` (`RAGLibSearchConfig`)
and `builder` (`Builder`) and a Returns section describing the return type (if
any) or `None`; update formatting to use Google style and ensure public symbols
RAGLibSearchConfig and rag_lib_search are covered.
- Around line 62-63: The public async generator rag_lib_search currently lacks
an explicit return type; update its signature to include
AsyncIterator[FunctionInfo] (or AsyncGenerator[FunctionInfo, None]) as the
return annotation on rag_lib_search(config: RAGLibSearchConfig, builder:
Builder) and add the corresponding typing import (AsyncIterator or
AsyncGenerator) plus ensure FunctionInfo is imported/available; this makes the
public API fully type-hinted while preserving the existing generator behavior.
In `@packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py`:
- Around line 569-576: Replace the TODO comment in the embedder parameterization
by explicitly marking the failing option as expected-fail instead of leaving a
TODO: add "nim_embedder_e5" back into the parametrized list as a pytest.param
with pytest.mark.xfail(reason="nv-embedqa-e5-v5 rejects dimensions param —
reference <issue-number or URL>"), or alternatively replace the TODO with a
one-line comment containing the issue reference; update the same pattern for the
other parametrizations. Target the parametrization that defines "embedder_ref"
in test_rag_lib_function.py to ensure the failing embedder is either xfailed
with a clear reason or annotated with the issue reference.
- Around line 15-31: Remove the "TODO: Add integration tests..." block from the
module docstring (the top-level docstring in this test module that contains
"KNOWN ISSUE - Embedding Model Compatibility"); move those TODO items into your
issue tracker or a dedicated design/roadmap doc and replace the removed TODO
with a brief issue reference (e.g., "See issue `#1234`") or a short note pointing
to the external tracking doc so the docstring no longer contains TODOs but still
links to the tracking location.
- Around line 506-525: The tests test_generate_method_exists,
test_search_method_exists, and test_health_method_exists use getattr with
constant names which triggers Ruff B009; change each assertion to use direct
attribute access after the hasattr check (e.g. replace
callable(getattr(NvidiaRAG, "generate")) with callable(NvidiaRAG.generate), and
similarly for NvidiaRAG.search and NvidiaRAG.health) so the tests reference the
methods directly.
In `@pyproject.toml`:
- Around line 113-114: The pyproject.toml was updated to include the optional
dependency "nvidia-nat-rag-lib" under the rag-lib extras but uv.lock was not
regenerated; run the sync to update the lockfile (e.g., execute the suggested
tooling command to install and sync the package: run uv pip install
nvidia-nat-rag-lib --sync) so uv.lock reflects the new dependency and versions,
then commit the updated uv.lock alongside the pyproject.toml change.
In `@src/nat/embedder/nim_embedder.py`:
- Line 53: Update the Pydantic Field definition for the dimensions attribute so
invalid values are rejected at validation time: add the gt=0 constraint to the
existing dimensions: int | None = Field(default=None, description="Embedding
output dimensions.") declaration in src/nat/embedder/nim_embedder.py (the
dimensions field in the Nim embedder config) so that zero or negative integers
are prohibited, matching the pattern used for top_k (e.g., gt=0).
🧹 Nitpick comments (6)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/models.py (1)
1-15: Add a module docstring for the public models module.Public modules should have Google-style docstrings. As per coding guidelines, add a concise module docstring.
🛠️ Proposed fix
# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. + +"""RAG result models."""packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/config.py (1)
46-51: Wrapnvidia_ragin backticks in the class docstring.Docstrings should wrap code entities in backticks to avoid Vale false positives. As per coding guidelines, update the docstring.
🛠️ Proposed fix
- """Native nvidia_rag pipeline settings. + """Native `nvidia_rag` pipeline settings.packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/generate.py (1)
122-123: Silent exception swallowing may hide parsing issues.The bare
except Exception: continuesilently ignores all parsing errors without any logging. While some malformed chunks are expected in streaming scenarios, silently discarding all exceptions makes debugging difficult when legitimate issues occur.Consider logging at debug level to aid troubleshooting:
♻️ Suggested improvement
except Exception: - continue + logger.debug("Failed to parse chunk: %s", raw_chunk) + continuepackages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)
147-169: Consider extracting the repeated propagation pattern.The same three-field propagation logic (
model_name,server_url,api_key) is repeated forquery_rewriter,reflection, andfilter_expression_generator. A helper function could reduce duplication and improve maintainability.♻️ Optional refactor suggestion
def _propagate_llm_to_component(component, llm: NIMModelConfig) -> None: """Propagate LLM config to a component if not already set.""" if "model_name" not in component.model_fields_set: component.model_name = llm.model_name if "server_url" not in component.model_fields_set and llm.base_url: component.server_url = llm.base_url if "api_key" not in component.model_fields_set and llm.api_key: component.api_key = llm.api_keyThen use:
_propagate_llm_to_component(rag_config.query_rewriter, llm) _propagate_llm_to_component(rag_config.reflection, llm) _propagate_llm_to_component(rag_config.filter_expression_generator, llm)packages/nvidia_nat_rag_lib/tests/test_models.py (1)
32-35: Fixture naming convention.Per coding guidelines, fixture functions should use the
fixture_prefix on the function name and anameargument in the decorator. However, this is a minor style consideration.♻️ Optional fix for fixture naming
- `@pytest.fixture` - def citations(self) -> Citations: + `@pytest.fixture`(name="citations") + def fixture_citations(self) -> Citations: """Create Citations object.""" return Citations(total_results=2, results=[])Also applies to: 54-57
packages/nvidia_nat_rag_lib/tests/test_tools.py (1)
37-54: Align pytest fixtures with thefixture_naming convention.Fixtures here don’t follow the
fixture_prefix +@pytest.fixture(name="...")pattern. Please update the fixture functions (and adjust usages) for consistency.Apply the same pattern to `tool_config` and `function_group`. As per coding guidelines, fixtures should follow the `fixture_` naming convention.🔧 Suggested pattern
- `@pytest.fixture` - def mock_builder(self) -> MagicMock: + `@pytest.fixture`(name="mock_builder") + def fixture_mock_builder(self) -> MagicMock: return MagicMock(spec=Builder)
willkill07
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a round of comments to think about the configuration space.
Right now there are so many knobs and required interconnection for this to work. I think it can be simplified a lot without reducing any functionality.
Remember, this is RAG Library Mode for NeMo Agent Toolkit, not exposing every feature of RAG Library Mode within NeMo Agent Toolkit.
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/generate.py
Outdated
Show resolved
Hide resolved
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/generate.py
Outdated
Show resolved
Hide resolved
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/tools/search.py
Outdated
Show resolved
Hide resolved
…ag-lib-mode-integration
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…ag-lib-mode-integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/RAG/simple_rag/README.md (1)
114-127: Update the CLI help snippet to include new flags.The script now supports
--drop_collectionand--embedding_model, but the documented usage output omits them. As per coding guidelines, keep docs accurate and comprehensive.✏️ Suggested update
-usage: langchain_web_ingest.py [-h] [--urls URLS] [--collection_name COLLECTION_NAME] [--milvus_uri MILVUS_URI] [--clean_cache] +usage: langchain_web_ingest.py [-h] [--urls URLS] [--collection_name COLLECTION_NAME] [--milvus_uri MILVUS_URI] [--clean_cache] [--drop_collection] [--embedding_model EMBEDDING_MODEL]--clean_cache If true, deletes local files (default: False) +--drop_collection Drop existing collection before ingesting (default: False) +--embedding_model Embedding model to use (default: nvidia/nv-embedqa-e5-v5)
🤖 Fix all issues with AI agents
In `@examples/RAG/simple_rag/README.md`:
- Around line 422-425: The fenced code block showing the pipeline flow ("Query →
Embed → Retrieve candidates → Rerank → Final results") is missing a language
specifier; update that code fence to include a language tag (e.g., text) after
the opening ``` so it becomes ```text to satisfy markdownlint and clarify the
content type.
- Around line 29-57: Update the documentation to replace the acronym "NAT" with
"NeMo Agent toolkit" in body text and use "NeMo Agent Toolkit" (capitalized
"Toolkit") in headings: change the Table of Contents entry "Integration with NAT
Components" and the corresponding heading "Integration with NAT Components" as
well as any other occurrences in the README (e.g., the advanced section
references and the two places mentioned around the end of the file) so body
sentences read "NeMo Agent toolkit" and headings read "NeMo Agent Toolkit";
ensure all TOC links and anchors match the updated heading text.
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Around line 125-136: The exception handler in the chunk parsing loop (where
ChainResponse.model_validate_json is called and parsed.choices /
parsed.citations are processed) currently swallows errors; change the bare
except to catch Exception as e and call logger.exception("Error parsing
streaming chunk") (or similar) to record the stack trace and chunk context
before continuing; ensure the existing logger object is used (or
imported/defined if missing) and preserve the continue behavior after logging.
- Around line 74-76: The public async generator nvidia_rag_lib is missing an
explicit return type; update its signature to include the return type
AsyncGenerator[FunctionGroup, None] (i.e., change "async def
nvidia_rag_lib(config: NvidiaRAGLibConfig, builder: Builder):" to "async def
nvidia_rag_lib(config: NvidiaRAGLibConfig, builder: Builder) ->
AsyncGenerator[FunctionGroup, None]:") and add the required import for
AsyncGenerator (from typing) and FunctionGroup (where it is defined) so the
annotation resolves.
♻️ Duplicate comments (3)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/register.py (1)
1-2: Align SPDX year range with the rest of the package.Other files in this package use
2025-2026; please keep the header consistent here as well. As per coding guidelines, confirm copyright years are up to date.🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.pyproject.toml (1)
113-114: Syncuv.lockafter addingnvidia-nat-rag-lib.
pyproject.tomlnow declares the optional dependency, so the lockfile should include it as well. As per coding guidelines, keep dependency declarations anduv.lockin sync.#!/bin/bash # Verify the lockfile includes the new optional dependency. rg -n "nvidia-nat-rag-lib" uv.lockpackages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)
1-2: Align SPDX year range with the rest of the package.Other files in this package use
2025-2026; please keep the header consistent here as well. As per coding guidelines, confirm copyright years are up to date.🔧 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-FileCopyrightText: Copyright (c) 2025-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🧹 Nitpick comments (1)
scripts/langchain_web_ingest.py (1)
37-53: Avoid blocking the event loop when dropping collections.
MilvusClientis synchronous, and calling it insideasync def main(...)can block the loop. Consider moving the drop to a worker thread (or use an async Milvus client if available).♻️ Possible refactor using
asyncio.to_threadif drop_collection: - client = MilvusClient(uri=milvus_uri) - if client.has_collection(collection_name): - logger.info("Dropping existing collection: %s", collection_name) - client.drop_collection(collection_name) - else: - logger.info("Collection '%s' does not exist, nothing to drop", collection_name) + import asyncio + + def _drop_collection() -> None: + client = MilvusClient(uri=milvus_uri) + if client.has_collection(collection_name): + logger.info("Dropping existing collection: %s", collection_name) + client.drop_collection(collection_name) + else: + logger.info("Collection '%s' does not exist, nothing to drop", collection_name) + + await asyncio.to_thread(_drop_collection)
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@examples/RAG/simple_rag/README.md`:
- Line 375: Update the body text string "NeMo Agent Toolkit" to "NeMo Agent
toolkit" in the README where the NVIDIA RAG Library plugin description appears
(the line containing "The NVIDIA RAG Library plugin (`nvidia_rag_lib`)
integrates the ... NeMo Agent Toolkit"); ensure only body text uses the
lowercase "toolkit" while leaving any headings unchanged.
♻️ Duplicate comments (1)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)
117-128: Don’t silently swallow streaming chunk parse errors.The empty
exceptmakes malformed chunks hard to debug; log the exception and keep thecontinue. As per coding guidelines, uselogger.exception()when catching without re-raising.🔧 Suggested fix
- except Exception: - continue + except Exception: + logger.exception("Failed to parse streaming chunk") + continue
🧹 Nitpick comments (1)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)
1-16: Add a module-level Google-style docstring.This is a public module but has no top-level docstring; add a concise module docstring after the SPDX header. As per coding guidelines, public modules should have Google-style docstrings.
♻️ Suggested addition
@@ # limitations under the License. + +"""NVIDIA RAG Library client integration. + +Provides the `nvidia_rag_lib` function group and config resolution helpers for +NVIDIA RAG Library within NeMo Agent Toolkit. +"""
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Around line 15-16: The module is missing a Google-style module docstring; add
a concise docstring at the top of
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py immediately after
the SPDX header and before the import logging line that summarizes the module
purpose, public classes/functions, and usage examples if applicable (follow
Google Python docstring style and keep it brief).
♻️ Duplicate comments (2)
packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)
117-128: Log streaming chunk parse errors instead of silently continuing.
This swallow-and-continue makes malformed chunk issues hard to diagnose.Based on learnings, ...✏️ Suggested fix
- except Exception: - continue + except Exception: + logger.exception("Failed to parse streaming chunk") + continueexamples/RAG/simple_rag/README.md (1)
369-376: Use lowercase "toolkit" in body text.Line 371 uses "NeMo Agent Toolkit" (capital T) in body text. Per the coding guidelines, "Toolkit" should only be capitalized in headings; body text should use "NeMo Agent toolkit" (lowercase t).
✏️ Suggested fix
-The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent Toolkit. +The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent toolkit.
🧹 Nitpick comments (1)
packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py (1)
511-513: Mark integration tests as slow.
These tests spin up Milvus collections and call external services; add the slow marker to keep the default suite fast.As per coding guidelines, ...♻️ Suggested change
-@pytest.mark.integration +@pytest.mark.integration +@pytest.mark.slow class TestNvidiaRAGIntegration:
mnajafian-nv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work on this PR, I added some in line comments and suggestions as well as following two questions :
Q1) Can we add observability for production monitoring?
Right now you're logging errors, which is great! But in production, you'll want to track performance trends and alert on degradation. A quick win could be adding structured metrics to your search/generate functions.
Why this matters:
- Track p50/p95/p99 latencies - "Search is suddenly taking 3 seconds instead of 500ms"
- Monitor error rates - "We're seeing 5% failures on the generate endpoint"
- Correlate quality metrics - "Low citation counts after the embedder upgrade"
- Set up alerts - "Page if search latency > 2s for 5 minutes"
What you can do with these metrics:
- Export to Prometheus/Datadog for dashboards
- Set up PagerDuty alerts on SLO violations
- Track before/after performance for changes
- Debug production incidents with structured data
Q2) Can we level up the config documentation? e.g. Adding examples, constraints, and gotchas to your docstrings.
Your config fields have descriptions, but developers will have questions like:
- "What's a good value for confidence_threshold?"
- "What's the performance impact of enabling query rewriting?"
- "Help, I'm getting no results, what did I misconfigure?"
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Signed-off-by: Will Killian <wkillian@nvidia.com>
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…ag-lib-mode-integration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@examples/RAG/simple_rag/README.md`:
- Around line 493-510: Replace possessive forms for inanimate labels in the
example console snippet: change "Agent's thoughts" to "Agent thought" or "Agent
notes", and change "Tool's input" / "Tool's response" to "Tool input" / "Tool
response" (or "Tool request" / "Tool response") in the block showing the
cuda_search__search interaction and related lines so the documentation avoids
possessive ’s for inanimate objects.
In `@packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py`:
- Line 96: Update the ValueError message that currently reads 'base_url is
required for LLM config specified in in NVIDIA RAG Config.' to remove the
duplicated "in" so it reads 'base_url is required for LLM config specified in
NVIDIA RAG Config.'; locate the raise ValueError(...) statement in client.py
(the raise ValueError call shown) and replace the string accordingly.
- Line 113: Fix the duplicated word "in" in the ValueError message in client.py
by updating the error string raised (the raise ValueError(...) statement) to
read "base_url is required for embedder config specified in NVIDIA RAG Config."
so it matches the corrected phrasing used elsewhere (e.g., similar message at
line 96).
♻️ Duplicate comments (2)
examples/RAG/simple_rag/README.md (1)
372-375: Use lowercase “toolkit” in body text and follow first‑use naming.Body text should use “NeMo Agent toolkit” (lowercase t). If this is the first mention in this README, it should be “NVIDIA NeMo Agent toolkit.” As written, it uses “NeMo Agent Toolkit” in body text. Based on learnings, use “toolkit” in body text and reserve “Toolkit” for headings. Based on learnings, avoid the NAT acronym and follow the toolkit naming conventions in documentation.
✏️ Suggested edit
-The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent Toolkit. +The NVIDIA RAG Library (`nvidia_rag_lib`) integrates the [NVIDIA RAG Blueprint](https://github.com/NVIDIA-AI-Blueprints/rag) pipeline into NeMo Agent toolkit.packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (1)
15-16: Add a module docstring.Public modules should include a concise Google-style docstring after the SPDX header. This was flagged in a previous review but appears unaddressed.
✏️ Suggested fix
# limitations under the License. +"""RAG library client integration for NeMo Agent Toolkit.""" + import logging
🧹 Nitpick comments (3)
packages/nvidia_nat_rag_lib/tests/test_rag_lib_function.py (1)
198-205: Remove unusedmock_builderandretriever_refparameters.These parameters are injected but never used in
test_search,test_generate, andtest_health. The tests createNvidiaRAGConfigdirectly from the config dictionaries rather than resolving throughmock_builder, andretriever_refis parametrized but the test hardcodes"http://localhost:19530".Either use these parameters to exercise the builder resolution path, or remove them to reduce test complexity and eliminate false parametrization overhead.
♻️ Option 1: Remove unused parameters (simpler)
- `@pytest.mark.parametrize`("retriever_ref", list(RETRIEVER_CONFIGS.keys())) async def test_search( self, - mock_builder: MagicMock, create_collection, llm_ref: str, embedder_ref: str, - retriever_ref: str, ) -> None:Apply similar changes to
test_generateandtest_health.packages/nvidia_nat_rag_lib/src/nat/plugins/rag_lib/client.py (2)
156-158: Uselogger.error()instead oflogger.exception()when re-raising.Per coding guidelines, when re-raising exceptions with bare
raise, uselogger.error()to avoid duplicate stack trace output (the caller will also log the exception).✏️ Suggested fix
except Exception: - logger.exception("RAG search failed") + logger.error("RAG search failed") raise
195-197: Uselogger.error()instead oflogger.exception()when re-raising.Same issue as the
searchfunction - per coding guidelines, uselogger.error()to avoid duplicate stack traces when re-raising.✏️ Suggested fix
except Exception: - logger.exception("RAG generate failed") + logger.error("RAG generate failed") raise
| if parsed.citations and parsed.citations.results: | ||
| final_citations = parsed.citations | ||
| except (ValueError, TypeError, KeyError) as e: | ||
| logger.debug("Failed to parse RAG response chunk: %s - %s", type(e).__name__, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we fail here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Individual chunk parse failures shouldn't abort the entire stream. The outer try/except handles actual operation failures, this inner handler gracefully skips malformed chunks while accumulating successful ones.
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
…ag-lib-mode-integration
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Description
Adds the
nvidia_nat_rag_libplugin package that integrates the NVIDIA RAG Blueprint pipeline into NeMo Agent Toolkit, providing grounded retrieval and generation capabilities.Note
The
nvidia-rag>=2.4.0package is scheduled for release on PyPI around February 10th, 2026.What's Included
New Package:
nvidia_nat_rag_libsearchtool: Retrieves grounded document excerpts from indexed sourcesgeneratetool: Synthesizes answers grounded in cited source materialExample Configuration
examples/RAG/simple_rag/configs/rag_library_mode_config.ymldemonstrating the pluginIngestion Script Enhancement
--drop_collectionflag toscripts/langchain_web_ingest.pyfor easier re-ingestion with different embedding modelsBy Submitting this PR I confirm:
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.