
NVIDIA Guided JSON - Implementation Guide

Discovery (Nov 29, 2025)

Found that NVIDIA NIM DOES support structured output; it is just not exposed through LangChain's standard .with_structured_output() method.

Source: NVIDIA NIM Structured Generation Documentation


The NVIDIA Way

VERIFIED WORKING FORMAT (Dec 21, 2025):

For direct API calls with the OpenAI Python client, nvext must land at the ROOT level of the request body; extra_body puts it there:

```python
from openai import OpenAI

client = OpenAI(base_url=nim_url, api_key="none")  # NIM deployments typically ignore the key

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},  # ← Use extra_body for OpenAI client
)
```
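
To make the "root level" point concrete, here is a minimal stdlib sketch of the JSON body that call produces on the wire (the schema and message values are illustrative assumptions, not captured traffic):

```python
import json

# Hypothetical schema for illustration; any JSON Schema dict works here.
json_schema = {
    "type": "object",
    "properties": {"strategy": {"type": "string"}},
    "required": ["strategy"],
}

# extra_body entries are merged into the top level of the request body,
# so "nvext" ends up as a sibling of "model" and "messages".
request_body = {
    "model": "nvidia/llama-3.1-nemotron-nano-8b-v1",
    "messages": [{"role": "user", "content": "Pick a strategy."}],
    "nvext": {"guided_json": json_schema},
}

print(json.dumps(request_body, indent=2))
```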

For LangChain ChatOpenAI:

Use model_kwargs with extra_body to pass NVIDIA-specific parameters:

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url=nim_url,
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    model_kwargs={
        "extra_body": {"nvext": {"guided_json": json_schema}}  # ← MUST use extra_body!
    },
)
```

FORMATS THAT DON'T WORK:

```python
# Direct nvext fails - the OpenAI client doesn't recognize it
model_kwargs={"nvext": {...}}  # ❌ AsyncCompletions.create() got unexpected kwarg 'nvext'

# guided_json without the nvext wrapper also fails
extra_body={"guided_json": json_schema}  # ❌ Must be nested in nvext
```

Tested on: NIM LLM API v1.8.4, vLLM backend, llama-3.1-nemotron-nano-8b-v1

Reference: NIM 1.12.0 Structured Generation


Implementation Approach

For LangChain ChatOpenAI:

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel

# Define the Pydantic model
class PlanningDecision(BaseModel):
    strategy: str
    rationale: str
    plan: str = ""

# Convert it to a JSON schema
json_schema = PlanningDecision.model_json_schema()

# Create the LLM - model_kwargs MUST wrap nvext in extra_body!
llm = ChatOpenAI(
    base_url=nim_url,
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    model_kwargs={
        "extra_body": {"nvext": {"guided_json": json_schema}}  # ← Correct format
    },
)

# Use it normally
response = await llm.ainvoke(prompt)
# response.content is guaranteed to be valid JSON matching the schema

# Parse back into the Pydantic model
result = PlanningDecision.model_validate_json(response.content)
```

Testing Plan

Step 1: Local Test (needs cluster access)

```python
# Test both formats to see which your NIM version supports.
# Run from the backend pod, or locally with a port-forward to the NIM service.
```
Step 2: Implement in One Location

  • Start with planner_node (most critical)
  • Test thoroughly
  • Verify no errors

Step 3: Roll Out to All Locations

If successful:

  • hackathon_agent.py: planner_node
  • ttd_dr/components/planner.py
  • ttd_dr/core.py (3 locations)
  • ttd_dr/components/denoiser.py
  • ttd_dr/components/evolver.py

Benefits

  • Type safety: guaranteed schema-valid JSON
  • No parsing errors: NIM validates at generation time
  • Better reliability: eliminates NoneType bugs
  • Pydantic integration: easy model conversion
  • Performance: the xgrammar backend is fast


Current Workaround

Manual parsing works fine:

  • ✅ Proven in production
  • ✅ Has error handling
  • ✅ No blocking issues
  • ⚠️ Just less elegant

This is a quality-of-life improvement, not critical.
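
For reference, the manual-parsing approach can be sketched as a small helper that tolerates models wrapping their JSON in markdown fences or surrounding prose (a generic stdlib sketch, not the project's actual parser):

```python
import json
import re

def extract_json(text: str) -> dict:
    """Best-effort extraction of a JSON object from raw LLM output."""
    # Fast path: the whole response is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fallback: grab the outermost {...} span (handles ```json fences and prose).
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError(f"No JSON object found in response: {text[:80]!r}")
```

The extracted dict can then be fed to PlanningDecision via model_validate, keeping the same Pydantic models on both paths.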


Implementation Checklist

  • Test stable API: extra_body={"guided_json": schema} → ❌ Does NOT work
  • Test older API: extra_body={"nvext": {"guided_json": schema}} → ✅ WORKS with LangChain!
  • Test root-level: model_kwargs={"nvext": {...}} → ❌ OpenAI client rejects unknown kwargs
  • Determine which format your NIM version supports → extra_body wrapper required for LangChain
  • Implement in planner_node → Updated to use model_kwargs={"extra_body": {"nvext": {...}}}
  • Roll out to TTD-DR components:
    • core.py (3 locations)
    • evaluator.py
    • planner.py
    • denoiser.py
    • red_team.py
    • context_pruner.py
    • search.py
    • evolver.py
  • Update documentation with working approach
  • Update test cases to reflect correct format

Notes

Why LangChain's .with_structured_output() failed:

  • Uses OpenAI's response_format parameter
  • NVIDIA NIM doesn't support that parameter
  • Needs NVIDIA-specific guided_json instead
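
The two request shapes can be compared side by side. Both dicts below are illustrative sketches of the respective wire formats (field values are assumptions, not captured requests):

```python
# OpenAI-style structured output: roughly what .with_structured_output()
# emits when it uses the response_format parameter.
openai_style = {
    "model": "gpt-4o",
    "messages": [],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "PlanningDecision", "schema": {"type": "object"}},
    },
}

# NVIDIA NIM equivalent: the schema rides in nvext.guided_json instead.
nim_style = {
    "model": "nvidia/llama-3.1-nemotron-nano-8b-v1",
    "messages": [],
    "nvext": {"guided_json": {"type": "object"}},
}
```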

Key insight:

  • NVIDIA has their own structured output API
  • It's more powerful (it also supports guided_regex and guided_grammar!)
  • Just needs different integration approach

Estimated Effort

If guided_json works:

  • Testing: 30 minutes
  • Implementation: 1-2 hours
  • High value improvement

If it doesn't work:

  • Keep current manual parsing (works fine)
  • Document as API limitation
  • No changes needed

Status: COMPLETED (December 21, 2025)

All TTD-DR components were updated with the correct extra_body wrapper format, then deployed and verified working on the EKS cluster with:

  • NIM LLM API with TensorRT-LLM backend
  • nvidia/llama-3.1-nemotron-nano-8b-v1 model
  • 16K context, 32 max batch size

Key learnings:

  1. LangChain's ChatOpenAI passes model_kwargs to OpenAI client's create() method
  2. OpenAI client validates kwargs and rejects unknown ones like nvext
  3. Must use extra_body to pass NVIDIA-specific parameters: model_kwargs={"extra_body": {"nvext": {"guided_json": schema}}}
  4. The extra_body contents are added to the HTTP request body, putting nvext at root level as NVIDIA expects 🎯