Merged
2 changes: 1 addition & 1 deletion src/lib/navigation.ts
@@ -824,7 +824,7 @@ export const tabNavigation: NavTab[] = [
items: [
{ title: 'End-to-End with Falcon AI: Trace → Debug → Evaluate → Dataset → Fix in One Workflow', href: '/docs/cookbook/falcon-ai/end-to-end' },
{ title: 'Context-Aware Trace Debugging with Falcon AI', href: '/docs/cookbook/falcon-ai/context-aware-debugging' },
{ title: 'Building Evaluation Datasets from Production Traces with Falcon AI', href: '/docs/cookbook/falcon-ai/eval-datasets-from-traces' },
{ title: 'Building Golden Datasets from Production Traces with Falcon AI', href: '/docs/cookbook/falcon-ai/eval-datasets-from-traces' },
]
},
{
51 changes: 35 additions & 16 deletions src/pages/docs/cookbook/falcon-ai/context-aware-debugging.mdx
@@ -3,10 +3,6 @@ title: "Context-Aware Trace Debugging with Falcon AI"
description: "Falcon AI auto-attaches the failing trace you're viewing, so you can debug it conversationally and get a paste-ready prompt fix without copy-pasting trace IDs."
---

<TLDR>
Open Falcon AI on a failing trace and run three turns: ask what went wrong, drill in with `/analyze-trace-errors`, and get a paste-ready prompt diff from `/fix-with-falcon`. You walk away with a verified prompt fix in minutes, without ever copy-pasting a trace ID or switching tabs.
</TLDR>

<div style={{display: "flex", gap: "8px", flexWrap: "wrap", margin: "0.5rem 0 1rem"}}>
<a href="https://colab.research.google.com/github/future-agi/cookbooks/blob/cookbook/falcon-ai-page/falcon-ai/context-aware-debugging.ipynb" target="_blank" style={{display: "inline-flex"}}><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" style={{height: "28px"}} /></a>
<a href="https://github.com/future-agi/cookbooks/blob/cookbook/falcon-ai-page/falcon-ai/context-aware-debugging.ipynb" target="_blank" style={{display: "inline-flex"}}><img src="https://img.shields.io/badge/View_on_GitHub-181717?logo=github&logoColor=white" alt="GitHub" style={{height: "28px"}} /></a>
@@ -16,6 +12,8 @@
|------|-----------|---------|
| 10 min | Beginner | `fi-instrumentation-otel` |

By the end of this cookbook you will have a verified prompt fix for one failing trace, generated in three Falcon AI turns without ever copy-pasting a trace ID or switching tabs.

<Prerequisites>
- FutureAGI account → [app.futureagi.com](https://app.futureagi.com)
- API keys: `FI_API_KEY` and `FI_SECRET_KEY` (see [Get your API keys](/docs/admin-settings))
@@ -36,12 +34,18 @@
```bash
export FI_SECRET_KEY="your-fi-secret-key"
export OPENAI_API_KEY="your-openai-key"
```
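Since the notebook reads these three variables at startup, it can help to fail fast when one is missing. The helper below is a convenience sketch, not part of any FutureAGI SDK; the variable names match the exports above:

```python
import os

def missing_env_vars(required=("FI_API_KEY", "FI_SECRET_KEY", "OPENAI_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Call it before registering the tracer and raise if it returns a non-empty list, so a typo in an export shows up as a clear error instead of a failed API call later.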

## Tutorial
## What is Falcon AI?

Falcon AI is the AI assistant built into the FutureAGI dashboard. Open it from the sidebar and it picks up whatever page you're viewing as context, so questions are answered against the trace, project, or dataset you're already on.

It runs **skills**: slash commands that execute a structured workflow over the current context and produce a clickable artifact (a dataset, an eval run, a prompt diff). The four steps below add tracing to your agent, then drive a three-turn debugging chat that ends in a paste-ready prompt fix.

<Steps>
<Step title="Add tracing to your agent">

Three lines send every LLM call and tool invocation to FutureAGI as structured spans. `OpenAIInstrumentor` auto-instruments the OpenAI SDK; wrap your agent's entry point with `@tracer.agent` so each request becomes one parent span.
Falcon AI does its work by reading your agent's **traces**: a trace is the structured record of one request, broken into **spans** for each LLM call, tool invocation, or sub-step inside it. The agent has to be sending traces to FutureAGI before any of the next steps can run.

Three lines below set that up. `OpenAIInstrumentor` patches the OpenAI SDK so every API call is captured automatically. The `@tracer.agent` decorator on your agent's entry point makes each request appear as one parent span with the OpenAI calls nested underneath.

```python
from fi_instrumentation import register, FITracer
```

@@ -89,57 +93,72 @@ trace_provider.force_flush()
For broader instrumentation patterns (custom spans, metadata tagging, prompt template tracking), see [Manual Tracing](/docs/cookbook/quickstart/manual-tracing).

</Step>
<Step title="Turn 1: open Falcon AI on the trace and ask what went wrong">
<Step title="Ask Falcon AI what went wrong">

Falcon AI picks up whatever page you're viewing as **context**. Open it on a trace detail page and the trace ID auto-attaches as a context chip in the chat input, so every question and skill in this conversation answers against that specific trace.

In **Tracing**, click into the failing trace so the trace detail page is the active view. Open the Falcon AI sidebar; it opens with a context chip showing the current trace ID, so every question you ask is answered against that specific trace. Type:
In **Tracing**, click into the failing trace so the trace detail page is the active view. Open the Falcon AI sidebar and type:

> What went wrong with this trace?

<Tip>
`Cmd+K` (Mac) or `Ctrl+K` (Windows) opens Falcon AI from anywhere in the dashboard, with the current page auto-attached as a context chip.
</Tip>

Falcon AI reads the trace and gives an exploratory diagnosis: empty tool result, fallback to parametric memory, hallucinated paper descriptions.
This first turn is exploratory: Falcon AI reads the trace and gives a diagnosis in plain English (the model fell back to parametric memory and invented paper descriptions instead of grounding its answer in real sources).

<img src="https://fi-cookbook-assets.s3.ap-south-1.amazonaws.com/falcon-ai/context-aware-debugging/turn-1-open-question.png" alt="Falcon AI sidebar opened on the failing trace, with the trace context chip in the chat input and an exploratory diagnosis of the empty search result" style={{width: "100%", borderRadius: "0.75rem", border: "1px solid var(--color-border-default)"}} />

</Step>
<Step title="Turn 2: drill in with /analyze-trace-errors">
<Step title="Drill into the failure mode">

Same conversation. The skill `/analyze-trace-errors` classifies issues against an error taxonomy (Hallucinated Content, Tool Misuse, Wrong Intent, etc.), assigns a severity to each finding, and produces a quality scorecard for the trace.

> /analyze-trace-errors

Two findings, both High impact: a tool dispatch issue and Hallucinated Content. Plus a quality scorecard and three recommended fixes.
Falcon AI returns Hallucinated Content as a High impact finding (the model invented papers from training data instead of grounding the answer in retrieved sources), plus a quality scorecard and recommended fixes.

<img src="https://fi-cookbook-assets.s3.ap-south-1.amazonaws.com/falcon-ai/context-aware-debugging/turn-2-analyze-trace-errors.png" alt="Falcon AI showing the structured /analyze-trace-errors output with category findings, severity, and a quality scorecard for the same trace" style={{width: "100%", borderRadius: "0.75rem", border: "1px solid var(--color-border-default)"}} />

This turn is diagnosis with suggestions; the final turn converts those suggestions into a paste-ready diff.
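If you want to log or post-process findings like these outside the dashboard, each one boils down to a taxonomy category, a severity, and supporting evidence. The dataclass below is purely illustrative; the field names are assumptions, not the actual Falcon AI schema:

```python
from dataclasses import dataclass

@dataclass
class TraceFinding:
    category: str  # a taxonomy label, e.g. "Hallucinated Content" or "Tool Misuse"
    severity: str  # e.g. "High", "Medium", "Low"
    evidence: str  # excerpt from the span that supports the finding

# Illustrative finding matching the one described above
finding = TraceFinding(
    category="Hallucinated Content",
    severity="High",
    evidence="Paper titles in the answer do not appear in any retrieved source",
)
```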

</Step>
<Step title="Turn 3: get the fix with /fix-with-falcon">
<Step title="Generate the prompt fix">

The third and final turn invokes `/fix-with-falcon`, which reads the system prompt and model output from the trace's LLM span and returns a copy-pasteable prompt edit in a *Current* / *Replace with* format. The Current block is pulled directly from the span so the diff is grounded in what the agent actually saw, not guessed from a description.

> /fix-with-falcon

Falcon AI returns a Current vs Replace with diff: keep the original prompt, append an empty-results instruction.
Falcon AI returns the diff: keep the original system prompt, append a refusal instruction so the agent declines to answer rather than invent citations when it has no grounded source.

<img src="https://fi-cookbook-assets.s3.ap-south-1.amazonaws.com/falcon-ai/context-aware-debugging/turn-3-fix-with-falcon.png" alt="Falcon AI fix-with-falcon output for the same trace showing What happened, Root cause in the agent, and a Current vs Replace with prompt diff" style={{width: "100%", borderRadius: "0.75rem", border: "1px solid var(--color-border-default)"}} />

The Current block is pulled directly from the LLM span, not guessed. Paste the Replace with block as your new system prompt, re-run the same query, and open the new trace: empty tool result followed by the refusal, no fabricated content.
Paste the **Replace with** block as your new system prompt, re-run the same query, and open the new trace: a clean refusal instead of a confidently invented citation list.
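Because the fix keeps the original prompt and only appends an instruction, applying it in code is a one-line string concatenation. Both prompt strings below are hypothetical stand-ins, not the actual diff from the screenshot:

```python
# Hypothetical original system prompt (stand-in for the "Current" block)
CURRENT_PROMPT = "You are a research assistant. Answer questions using the search tool."

# Appended instruction in the spirit of the "Replace with" block
REFUSAL_RULE = (
    " If the search tool returns no results, say you could not find any "
    "sources and stop. Never invent paper titles or citations."
)

PATCHED_PROMPT = CURRENT_PROMPT + REFUSAL_RULE
```

The original prompt is left intact, so behavior on queries with healthy tool results is unchanged; only the empty-result path gains the refusal.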

</Step>
</Steps>

## What you solved

The research assistant no longer invents papers when it lacks grounded sources. Re-run the same failing query after the fix and the trace shows a clean refusal, not a confidently invented citation list.

<Check>
You went from a failing trace to a verified prompt fix in three Falcon AI turns. No trace IDs copied, no spans expanded by hand.
</Check>

- **Hallucinated citations** (paper titles invented from training data): caught by `/analyze-trace-errors`, fixed by `/fix-with-falcon` with a refusal instruction
- **Trace ID copy-paste workflow**: replaced by Falcon AI's auto-attached trace context chip
- **Ad-hoc diagnosis**: replaced by the structured findings + quality scorecard from `/analyze-trace-errors`
- **Prompt fixes by guesswork**: replaced by `/fix-with-falcon`'s *Current* / *Replace with* diff pulled from the actual LLM span

## Explore further

<CardGroup cols={3}>
<Card title="End-to-End with Falcon AI" icon="sparkles" href="/docs/cookbook/falcon-ai/end-to-end">
The full lifecycle: trace, debug, evaluate, dataset, fix in one workflow
</Card>
<Card title="Building Evaluation Datasets from Production Traces" icon="database" href="/docs/cookbook/falcon-ai/eval-datasets-from-traces">
Once you've fixed one trace, lock the failure pattern in as a regression set
<Card title="Building Golden Datasets from Production Traces" icon="database" href="/docs/cookbook/falcon-ai/eval-datasets-from-traces">
Once you've fixed one trace, lock the failure pattern in as a regression dataset
</Card>
<Card title="Error Feed" icon="bug" href="/docs/error-feed">
Per-trace quality scoring and error-category drilldown
</Card>
</CardGroup>