
POC: Integrate local document retrieval with skills via MCP #2002

Draft

oliverholworthy wants to merge 5 commits into NVIDIA:main from oliverholworthy:oholworthy/local-search-cli-skill

Conversation

@oliverholworthy
Contributor

This PR adds a POC for document retrieval through a skill and an MCP tool. The goal of this POC is to explore integration patterns for local document retrieval:

  • a repo-local skill can automatically route local documentation questions to a retrieval tool
  • an MCP server provides a clean agent/tool boundary
  • indexes are local, reusable, and scoped to the resolved input path
  • the agent receives structured evidence and synthesizes the final answer from that evidence, instead of manually grepping the repository

What This Adds

  • retriever local CLI commands for local document indexing/search:
    • init
    • search
    • ask
    • status
    • doctor
    • clean
  • MCP server entrypoint:
    • retriever-local-mcp
  • MCP tools:
    • local_document_ask
    • local_document_search
    • local_document_status
  • Repo-scoped Codex skill:
    • .agents/skills/nemo-retriever-local-document-search
  • Project-local Codex MCP config example:
    • .codex/config.toml.example
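
As a usage sketch only: the subcommand names above come from this PR, but the exact invocation shape (whether the command is spelled `retriever local` or `retriever-local`, and what arguments each subcommand takes) is an assumption here.

```shell
# Illustrative only: command spelling, positional arguments, and flags are
# assumptions, not confirmed from the PR. Consult --help for the real interface.
retriever local init ./docs        # build or refresh a local index over ./docs
retriever local status ./docs      # inspect index state/staleness
retriever local search ./docs "self-hosted model endpoints"
retriever local ask ./docs "How do I configure validators?"
retriever local doctor             # diagnose environment/setup issues
retriever local clean ./docs       # remove the local index
```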

Supported document types are currently:

.pdf
.txt
.md
.markdown
.docx
.pptx

The workflow is retrieval-only. ask returns evidence and metadata, but does not generate a prose answer itself:

"answer": null,
"answer_generation": "not_configured"

The agent is expected to synthesize the final response from returned evidence.

Configuration

After installing/building the local NeMo Retriever environment, configure Codex with a project-local .codex/config.toml like:

[mcp_servers.nemo_retriever_local]
command = "/absolute/path/to/NeMo-Retriever/nemo_retriever/.venv/bin/retriever-local-mcp"
args = []
startup_timeout_sec = 60
tool_timeout_sec = 3600
enabled_tools = ["local_document_ask", "local_document_search", "local_document_status"]

cwd is intentionally omitted so the MCP server inherits the active Codex project/session directory. This lets prompts like "In ./docs, ..." resolve relative to whichever project Codex is currently running in.

For another project, copy the skill directory into that repo:

.agents/skills/nemo-retriever-local-document-search/

Then start Codex from that project root and ask a docs-grounded question such as:

In ./docs, explain how to configure this project for self-hosted model endpoints, async execution, validators, and optional MCP tool use. Cite the docs you use.

Behavior

By default the tool uses local embedding inference with:

nvidia/llama-nemotron-embed-1b-v2

Remote embedding is available explicitly with --inference remote / inference="remote" and an API key, but local is the default for the skill.

When the MCP tool is called without an explicit index, it derives a stable project-local index path from a hash of the resolved absolute input path, for example:

.nemo-retriever/local-index-54ed29c6fcb8

This avoids collisions between ./docs in different repos and allows warm reuse across follow-up questions.
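
The PR does not state the exact hashing scheme. As a minimal sketch, assuming a truncated SHA-256 digest of the resolved absolute path (the real implementation may use a different digest or truncation length), the derivation could look like:

```shell
# Sketch: derive a stable, path-scoped index directory from the resolved
# absolute input path. The sha256-truncated-to-12-hex-chars scheme is an
# assumption; only the ".nemo-retriever/local-index-<hash>" shape is from the PR.
input_path=$(realpath -m ./docs)                            # resolve to absolute
digest=$(printf '%s' "$input_path" | sha256sum | cut -c1-12)
index_dir=".nemo-retriever/local-index-$digest"
echo "$index_dir"
```

Because the digest is keyed on the absolute path, ./docs in two different repos hashes to two different index directories, while repeated questions against the same path reuse the same warm index.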

What The POC Demonstrates

Tested this with the NeMo Retriever docs and the DataDesigner docs. The DataDesigner test showed the agent using the MCP retrieval tool first, creating/reusing a path-scoped local index, and answering a multi-part configuration question from retrieved docs without broad manual repo search.

This is the core outcome: the skill + MCP pattern working as a portable way to wire local retrieval into agent behavior.

Known Gaps

  • Evidence is currently chunk/file based; Markdown/text line spans would improve citation quality.
  • PDF support is text-focused; this does not currently use full multimodal extraction.
  • The index is refreshed from a manifest/staleness check, not a live watcher.
  • This is not intended as a shared production search service.
  • The implementation is larger than ideal because the library does not yet expose a single “index this local corpus into a VDB and search it” abstraction.
  • Discovery, manifest/staleness, and ingest-to-LanceDB lifecycle code should be moved into smaller reusable modules or library APIs.
