Found that NVIDIA NIM DOES support structured output, just not through LangChain's standard `.with_structured_output()` method.
Source: NVIDIA NIM Structured Generation Documentation
For direct API calls, nvext must be at the ROOT level of the request body:
```python
response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    messages=messages,
    extra_body={"nvext": {"guided_json": json_schema}},  # ← Use extra_body for OpenAI client
)
```

For LangChain's `ChatOpenAI`, use `model_kwargs` with `extra_body` to pass NVIDIA-specific parameters:
```python
llm = ChatOpenAI(
    base_url=nim_url,
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    model_kwargs={
        "extra_body": {"nvext": {"guided_json": json_schema}}  # ← MUST use extra_body!
    }
)
```

Formats that do NOT work:

```python
# Direct nvext fails - OpenAI client doesn't recognize it
model_kwargs={"nvext": {...}}  # ❌ AsyncCompletions.create() got unexpected kwarg 'nvext'

# guided_json without nvext wrapper
extra_body={"guided_json": json_schema}  # ❌ Must be nested in nvext
```

Tested on: NIM LLM API v1.8.4, vLLM backend, llama-3.1-nemotron-nano-8b-v1
Reference: NIM 1.12.0 Structured Generation
```python
from pydantic import BaseModel

# Define Pydantic model
class PlanningDecision(BaseModel):
    strategy: str
    rationale: str
    plan: str = ""

# Convert to JSON schema
json_schema = PlanningDecision.model_json_schema()

# Create LLM with model_kwargs - MUST use extra_body to wrap nvext!
llm = ChatOpenAI(
    base_url=nim_url,
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    model_kwargs={
        "extra_body": {"nvext": {"guided_json": json_schema}}  # ← Correct format
    }
)

# Use normally
response = await llm.ainvoke(prompt)
# Response will be valid JSON matching schema!

# Parse back to Pydantic
result = PlanningDecision.model_validate_json(response.content)
```

Test both formats to see which your NIM version supports (from the backend pod or with port-forward):

- Start with planner_node (most critical)
- Test thoroughly
- Verify no errors
If successful, roll out to:

- hackathon_agent.py: planner_node
- ttd_dr/components/planner.py
- ttd_dr/core.py (3 locations)
- ttd_dr/components/denoiser.py
- ttd_dr/components/evolver.py
Benefits of guided_json:

- ✅ Type safety: Guaranteed valid JSON
- ✅ No parsing errors: NIM validates at generation time
- ✅ Better reliability: Eliminates NoneType bugs
- ✅ Pydantic integration: Easy model conversion
- ✅ Performance: xgrammar backend is fast
Manual parsing works fine:
- ✅ Proven in production
- ✅ Has error handling
- ✅ No blocking issues
- ⚠️ Just less elegant
This is a quality-of-life improvement, not critical.
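For reference, the status quo can be sketched as a best-effort parser. This is a minimal illustration, not the production code from the components above; the function name and default values are hypothetical.

```python
import json
import re

def parse_planning_decision(raw: str) -> dict:
    """Best-effort extraction of a JSON object from an LLM response.

    Returns safe defaults instead of raising, so a malformed response
    never turns into a NoneType crash downstream.
    """
    # Hypothetical defaults for illustration only
    default = {"strategy": "fallback", "rationale": "parse failure", "plan": ""}
    # Grab the outermost {...} block in case the model wraps JSON in prose
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return default
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return default
    # Guarantee every expected key exists
    return {**default, **data}

print(parse_planning_decision('Sure! {"strategy": "deep", "rationale": "broad topic"}'))
```

This works, but every component must carry the regex, the try/except, and the defaults, which is exactly the boilerplate guided_json removes.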
- Test stable API: `extra_body={"guided_json": schema}` → ❌ Does NOT work
- Test older API: `extra_body={"nvext": {"guided_json": schema}}` → ✅ WORKS with LangChain!
- Test root-level: `model_kwargs={"nvext": {...}}` → ❌ OpenAI client rejects unknown kwargs
- Determine which format your NIM version supports → `extra_body` wrapper required for LangChain
- Implement in planner_node → Updated to use `model_kwargs={"extra_body": {"nvext": {...}}}`
- Roll out to TTD-DR components:
- core.py (3 locations)
- evaluator.py
- planner.py
- denoiser.py
- red_team.py
- context_pruner.py
- search.py
- evolver.py
- Update documentation with working approach
- Update test cases to reflect correct format
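Since eight files need the identical wrapper, a small helper keeps the nesting in one place. The function name `structured_kwargs` is hypothetical, not something from the codebase:

```python
def structured_kwargs(json_schema: dict) -> dict:
    """Build the model_kwargs that enable NIM guided_json via ChatOpenAI.

    Centralizes the extra_body/nvext nesting so each component does not
    repeat (and risk mistyping) the wrapper.
    """
    return {"extra_body": {"nvext": {"guided_json": json_schema}}}

# Usage sketch (schema would come from a Pydantic model's model_json_schema()):
schema = {"type": "object", "properties": {"strategy": {"type": "string"}}}
print(structured_kwargs(schema))
# Each component then only does:
#   llm = ChatOpenAI(base_url=nim_url, model=MODEL, model_kwargs=structured_kwargs(schema))
```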
Why LangChain's `.with_structured_output()` failed:

- Uses OpenAI's `response_format` parameter
- NVIDIA NIM doesn't support that parameter
- Needs NVIDIA-specific `guided_json` instead
Key insight:

- NVIDIA has its own structured output API
- It's more powerful (regex and grammar support too!)
- It just needs a different integration approach
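NVIDIA's structured generation docs also describe guided-decoding modes beyond `guided_json` (e.g. `guided_regex` and `guided_choice`). Exact availability depends on the NIM version, so treat these payloads as a sketch to verify against your deployment rather than a guaranteed API:

```python
def nvext_body(**guided) -> dict:
    """Wrap NVIDIA-specific guided-decoding params for the OpenAI client."""
    return {"extra_body": {"nvext": dict(guided)}}

# Constrain output to a US phone-number shape (guided_regex mode)
regex_kwargs = nvext_body(guided_regex=r"\(\d{3}\) \d{3}-\d{4}")

# Constrain output to one of a fixed label set (guided_choice mode)
choice_kwargs = nvext_body(guided_choice=["approve", "revise", "reject"])

print(regex_kwargs["extra_body"]["nvext"])
```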
If guided_json works:
- Testing: 30 minutes
- Implementation: 1-2 hours
- High value improvement
If it doesn't work:
- Keep current manual parsing (works fine)
- Document as API limitation
- No changes needed
Status: ✅ COMPLETED (December 21, 2025)
All TTD-DR components updated with correct extra_body wrapper format. Deployed and verified working on EKS cluster with:
- NIM LLM API with TensorRT-LLM backend
- nvidia/llama-3.1-nemotron-nano-8b-v1 model
- 16K context, 32 max batch size
Key learnings:
- LangChain's ChatOpenAI passes `model_kwargs` to the OpenAI client's `create()` method
- The OpenAI client validates kwargs and rejects unknown ones like `nvext`
- Must use `extra_body` to pass NVIDIA-specific parameters: `model_kwargs={"extra_body": {"nvext": {"guided_json": schema}}}`
- The `extra_body` contents are added to the HTTP request body, putting `nvext` at root level as NVIDIA expects 🎯
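To make that last point concrete, here is a toy illustration (not the OpenAI client's actual code) of how `extra_body` contents end up at the root of the HTTP payload:

```python
def merged_request_body(model: str, messages: list, extra_body: dict) -> dict:
    """Illustrate how the OpenAI client folds extra_body into the JSON payload."""
    body = {"model": model, "messages": messages}
    body.update(extra_body)  # nvext lands at the ROOT of the body, as NIM expects
    return body

body = merged_request_body(
    model="nvidia/llama-3.1-nemotron-nano-8b-v1",
    messages=[{"role": "user", "content": "Plan the research."}],
    extra_body={"nvext": {"guided_json": {"type": "object"}}},
)
print(sorted(body))  # → ['messages', 'model', 'nvext']
```

This is why nesting `nvext` anywhere else fails: only the `extra_body` mechanism bypasses kwarg validation and merges directly into the request body.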