The agk trace command provides comprehensive observability into your AI workflows, helping you understand execution flow, debug issues, and analyze performance.
- Overview
- Quick Start
- Capturing Traces
- Viewing Traces
- Trace Commands
- Trace Levels
- Understanding Spans
- Debugging Workflows
- Best Practices
Traces capture the complete execution history of your workflows, including:
- ⏱️ Timing: Duration of each step and operation
- 🔗 Flow: Parent-child relationships between operations
- 📝 Content: Prompts sent to LLMs and their responses
- 🛠️ Tools: Function calls and their results
- ❌ Errors: Detailed error information and stack traces
- 📊 Metadata: Context, configuration, and custom attributes
┌──────────────────┐
│  Your Workflow   │
│  (with tracing)  │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Trace Collector  │
│ (OpenTelemetry)  │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  Trace Storage   │
│   (.agk/runs/)   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│  AGK Trace CLI   │
│    (Analysis)    │
└──────────────────┘
# Enable tracing with detailed level
export AGK_TRACE=true
export AGK_TRACE_LEVEL=detailed
# Run your workflow
go run main.go

# List all traces
agk trace list
# Show specific trace summary
agk trace show run-20260207-123456-12345678
# Interactive viewer (TUI)
agk trace view
# Generate flowchart
agk trace mermaid run-20260207-123456-12345678 > flow.md

| Variable | Values | Description |
|---|---|---|
| `AGK_TRACE` | `true`, `false` | Enable/disable tracing |
| `AGK_TRACE_LEVEL` | `minimal`, `standard`, `detailed` | Data granularity |
| `AGK_TRACE_EXPORTER` | `file`, `stdout` | Output destination |
| `AGK_TRACE_DIR` | path | Trace storage directory (default: `.agk/runs`) |
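
As a pre-flight check, the variables in the table above can be validated before a run. The sketch below is only an illustration: `resolve_trace_config` is a hypothetical helper, not part of AGK, and the defaults used for `AGK_TRACE` (`false`) and `AGK_TRACE_EXPORTER` (`file`) are assumptions. Only the `standard` level default and the `.agk/runs` directory default are stated in this guide.

```python
import os

VALID_LEVELS = {"minimal", "standard", "detailed"}
VALID_EXPORTERS = {"file", "stdout"}


def resolve_trace_config(env=None):
    """Return the effective trace settings, raising ValueError on bad values.

    Hypothetical helper: it mirrors the table above, not AGK's real parser.
    """
    env = os.environ if env is None else env

    enabled = env.get("AGK_TRACE", "false").strip().lower()  # assumed default
    if enabled not in {"true", "false"}:
        raise ValueError(f"AGK_TRACE must be true or false, got {enabled!r}")

    level = env.get("AGK_TRACE_LEVEL", "standard")  # documented default
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown AGK_TRACE_LEVEL: {level!r}")

    exporter = env.get("AGK_TRACE_EXPORTER", "file")  # assumed default
    if exporter not in VALID_EXPORTERS:
        raise ValueError(f"unknown AGK_TRACE_EXPORTER: {exporter!r}")

    return {
        "enabled": enabled == "true",
        "level": level,
        "exporter": exporter,
        "dir": env.get("AGK_TRACE_DIR", ".agk/runs"),  # documented default
    }
```

Calling `resolve_trace_config()` with no argument reads the current process environment, so a typo such as `AGK_TRACE_LEVEL=detail` is reported before the workflow starts.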
Data Captured (minimal level):
- Start/end timestamps
- Duration
- Success/failure status
- High-level step names
Use Case:
- Production monitoring
- Performance metrics
- Minimal overhead
Example:
export AGK_TRACE=true
export AGK_TRACE_LEVEL=minimal
go run main.go
Output:
Span: workflow_execution
Duration: 45.2s
Status: OK
Span: research_step
Duration: 20.1s
Status: OK
Data Captured (standard level):
- Everything in Minimal
- Token counts
- Model names
- Latency metrics
- Error messages
Use Case:
- Development debugging
- Performance analysis
- Cost tracking
Example:
export AGK_TRACE=true
export AGK_TRACE_LEVEL=standard # or omit (default)
go run main.go
Output:
Span: llm_call
Duration: 2.3s
Model: llama3.2
Tokens: 450 input, 1200 output
Status: OK
Data Captured (detailed level):
- Everything in Standard
- Complete prompts (system + user)
- Full LLM responses
- Tool call arguments
- Tool call results
- Memory state changes
Use Case:
- Deep debugging
- Prompt engineering
- Quality evaluation
- Audit trails
Example:
export AGK_TRACE=true
export AGK_TRACE_LEVEL=detailed
go run main.go
Output:
Span: llm_call
Duration: 2.3s
Model: llama3.2
Prompt:
System: You are a helpful research assistant...
User: Research artificial intelligence trends
Response:
Artificial intelligence is rapidly evolving...
[Full response text]
Tokens: 450 input, 1200 output
Show all captured traces:
agk trace list
Output:
Available Traces:
─────────────────────────────────────────────────
run-20260207-150034-71394771 | 2026-02-07 15:00:34 | 183.75s | ✓ Success
run-20260207-144512-82934521 | 2026-02-07 14:45:12 | 92.34s | ✗ Failed
run-20260207-143022-19283746 | 2026-02-07 14:30:22 | 156.21s | ✓ Success
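
For scripting, the listing above can be split into fields. The following is a minimal sketch that assumes the pipe-separated layout shown in the sample output is stable, which this guide does not guarantee; `parse_trace_line` is a hypothetical helper name.

```python
from datetime import datetime


def parse_trace_line(line):
    """Parse one row of the sample `agk trace list` output into a dict.

    Assumes four pipe-separated fields: id | start time | duration | status.
    """
    trace_id, started, duration, status = [part.strip() for part in line.split("|")]
    return {
        "trace_id": trace_id,
        "started": datetime.strptime(started, "%Y-%m-%d %H:%M:%S"),
        "duration_s": float(duration.rstrip("s")),
        "ok": status.startswith("✓"),
    }


sample = [
    "run-20260207-150034-71394771 | 2026-02-07 15:00:34 | 183.75s | ✓ Success",
    "run-20260207-144512-82934521 | 2026-02-07 14:45:12 | 92.34s | ✗ Failed",
    "run-20260207-143022-19283746 | 2026-02-07 14:30:22 | 156.21s | ✓ Success",
]
rows = [parse_trace_line(line) for line in sample]
failed_ids = [row["trace_id"] for row in rows if not row["ok"]]
slowest = max(rows, key=lambda row: row["duration_s"])
```

Here `failed_ids` holds the one failed run and `slowest` is the 183.75s run. Where a machine-readable format such as `agk trace show --json` is available, it is likely a safer input than parsed text.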
Display high-level summary of a specific trace:
agk trace show run-20260207-150034-71394771
Output:
Trace: run-20260207-150034-71394771
─────────────────────────────────────────────────
Status: Success
Duration: 183.75s
Started: 2026-02-07 15:00:34
Workflow: story
Execution Flow:
├─ workflow_start (0ms)
├─ research_step (65.2s)
│ ├─ llm_call (2.3s)
│ └─ llm_call (1.8s)
├─ summarize_step (58.1s)
│ └─ llm_call (3.1s)
└─ format_step (60.4s)
└─ llm_call (2.9s)
Total LLM Calls: 4
Total Tokens: 3,245 input, 8,912 output
Launch an interactive terminal UI for exploring traces:
agk trace view
Features:
- 📋 Browse all traces
- 🔍 Drill down into spans
- 📝 View full prompts and responses (press `d`)
- ⌨️ Keyboard navigation
- 🎨 Syntax highlighting
Keyboard Shortcuts:
| Key | Action |
|---|---|
| `↑`/`↓` | Navigate spans |
| `→` | Expand span |
| `←` | Collapse span |
| `d` | Show detailed view (prompts/responses) |
| `q` | Quit |
| `/` | Search |
| `f` | Filter by status |
Create a Mermaid flowchart visualization:
agk trace mermaid run-20260207-150034-71394771 > flow.md
Output (flow.md):
```mermaid
graph TD
A[Workflow Start] --> B[Research Step]
B --> C[LLM Call 1]
B --> D[LLM Call 2]
C --> E[Summarize Step]
D --> E
E --> F[LLM Call 3]
F --> G[Format Step]
G --> H[LLM Call 4]
H --> I[Workflow Complete]
style A fill:#90EE90
style I fill:#90EE90
style B fill:#87CEEB
style E fill:#87CEEB
style G fill:#87CEEB
```
View in:
- GitHub (renders automatically)
- VS Code (Mermaid preview extension)
- Mermaid Live Editor
List all captured traces.
Usage:
agk trace list
agk trace list --limit 20
agk trace list --failed # Show only failed traces
Options:
| Flag | Description | Default |
|---|---|---|
| `--limit` | Max traces to show | 50 |
| `--failed` | Show only failed traces | false |
| `--success` | Show only successful traces | false |
Display summary of a specific trace.
Usage:
agk trace show run-20260207-150034-71394771
agk trace show run-20260207-150034-71394771 --json
Options:
| Flag | Description |
|---|---|
| `--json` | Output as JSON |
| `--spans` | Show all spans (not just summary) |
Launch interactive trace viewer.
Usage:
agk trace view
agk trace view run-20260207-150034-71394771 # Jump to specific trace

Generate Mermaid flowchart.
Usage:
agk trace mermaid run-20260207-150034-71394771
agk trace mermaid run-20260207-150034-71394771 > flow.md
Options:
| Flag | Description |
|---|---|
| `--style` | Diagram style: `graph`, `sequence` |
| `--depth` | Max depth to visualize |
Spans represent individual operations in a trace. Each span has:
{
"span_id": "abc123",
"trace_id": "run-20260207-150034-71394771",
"parent_id": "xyz789",
"name": "llm_call",
"start_time": "2026-02-07T15:00:34.123Z",
"end_time": "2026-02-07T15:00:36.456Z",
"duration_ms": 2333,
"status": "OK",
"attributes": {
"model": "llama3.2",
"provider": "ollama",
"temperature": 0.7
},
"events": [
{
"name": "prompt_sent",
"timestamp": "2026-02-07T15:00:34.124Z",
"attributes": {
"prompt": "You are a helpful assistant..."
}
},
{
"name": "response_received",
"timestamp": "2026-02-07T15:00:36.455Z",
"attributes": {
"response": "Here is the information..."
}
}
]
}

| Span Name | Description | Key Attributes |
|---|---|---|
| `workflow_execution` | Top-level workflow | `workflow_name` |
| `agent_step` | Individual agent step | `step_name`, `agent_name` |
| `llm_call` | LLM API call | `model`, `provider`, `tokens` |
| `tool_call` | Function/tool execution | `tool_name`, `arguments` |
| `memory_operation` | Memory read/write | `operation`, `key` |
| `stream_chunk` | Streaming token | `chunk_type`, `content` |
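
The timing fields of a span record are related: `duration_ms` should equal the gap between `start_time` and `end_time`, and each event timestamp falls inside that window. A minimal sketch that checks this for the sample span, assuming the timestamps are ISO 8601 strings in UTC as in the example (`parse_ts`, `duration_ms`, and `event_offsets_ms` are illustrative names):

```python
from datetime import datetime


def parse_ts(ts):
    # datetime.fromisoformat accepts a trailing "Z" only on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def duration_ms(span):
    """Milliseconds between the span's start and end timestamps."""
    delta = parse_ts(span["end_time"]) - parse_ts(span["start_time"])
    return round(delta.total_seconds() * 1000)


def event_offsets_ms(span):
    """Map each event name to its offset from the span start, in ms."""
    start = parse_ts(span["start_time"])
    return {
        event["name"]: round((parse_ts(event["timestamp"]) - start).total_seconds() * 1000)
        for event in span.get("events", [])
    }


span = {
    "start_time": "2026-02-07T15:00:34.123Z",
    "end_time": "2026-02-07T15:00:36.456Z",
    "duration_ms": 2333,
    "events": [
        {"name": "prompt_sent", "timestamp": "2026-02-07T15:00:34.124Z"},
        {"name": "response_received", "timestamp": "2026-02-07T15:00:36.455Z"},
    ],
}
```

For this span, `duration_ms(span)` matches the recorded `duration_ms` of 2333, and the offsets show the prompt leaving 1 ms after the span starts and the response arriving 1 ms before it ends.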
workflow_execution (root)
├─ agent_step: research
│ ├─ llm_call
│ │ ├─ prompt_sent (event)
│ │ └─ response_received (event)
│ └─ tool_call: search
│ ├─ tool_start (event)
│ └─ tool_complete (event)
├─ agent_step: summarize
│ └─ llm_call
└─ agent_step: format
└─ llm_call
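
A hierarchy like the one above can be rebuilt from a flat list of spans by following each `parent_id`. The sketch below assumes spans are available as dictionaries with `span_id`, `parent_id`, and `name` keys, as in the span record; the sample IDs and the combined names such as `agent_step: research` are made up for illustration.

```python
from collections import defaultdict


def build_tree(spans):
    """Group spans by parent and return (roots, children_by_parent_id)."""
    known_ids = {span["span_id"] for span in spans}
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("parent_id")
        if parent in known_ids:
            children[parent].append(span)
        else:
            # No parent, or the parent is missing from this list: treat as root.
            roots.append(span)
    return roots, children


def render(span, children, depth=0):
    """Return the subtree under `span` as indented lines."""
    lines = ["  " * depth + span["name"]]
    for child in children[span["span_id"]]:
        lines.extend(render(child, children, depth + 1))
    return lines


spans = [
    {"span_id": "s1", "parent_id": None, "name": "workflow_execution"},
    {"span_id": "s2", "parent_id": "s1", "name": "agent_step: research"},
    {"span_id": "s3", "parent_id": "s2", "name": "llm_call"},
    {"span_id": "s4", "parent_id": "s2", "name": "tool_call: search"},
    {"span_id": "s5", "parent_id": "s1", "name": "agent_step: summarize"},
    {"span_id": "s6", "parent_id": "s5", "name": "llm_call"},
]
roots, children = build_tree(spans)
tree_lines = render(roots[0], children)
```

Spans whose parent is absent from the list are treated as roots, so a partially exported trace still renders instead of silently dropping orphans.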
Symptom: Workflow takes too long to complete
Debug Steps:
1. Enable standard tracing:
    export AGK_TRACE=true
    export AGK_TRACE_LEVEL=standard
    go run main.go
2. View trace summary:
    agk trace show <trace-id>
3. Identify bottleneck:
    ├─ research_step (65.2s) ← Slow!
    ├─ summarize_step (2.1s)
    └─ format_step (1.8s)
4. Drill into slow step:
    agk trace view <trace-id>
    # Press 'd' on research_step to see details
5. Optimize:
    - Reduce LLM max_tokens
    - Use faster model
    - Parallelize operations
    - Cache results
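
Identifying the bottleneck comes down to comparing sibling step durations. A small sketch using the figures from the example above, where `slowest_step` is a hypothetical helper and the durations are typed in by hand rather than read from a trace file:

```python
def slowest_step(step_seconds):
    """Return (name, share_of_total) for the longest-running step."""
    total = sum(step_seconds.values())
    name = max(step_seconds, key=step_seconds.get)
    return name, step_seconds[name] / total


steps = {"research_step": 65.2, "summarize_step": 2.1, "format_step": 1.8}
name, share = slowest_step(steps)
summary = f"{name} accounts for {share:.0%} of step time"
```

With these numbers `research_step` takes roughly 94% of the total, so optimizing the other two steps would barely change the overall run time.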
Symptom: Workflow produces incorrect or unexpected results
Debug Steps:
1. Enable detailed tracing:
    export AGK_TRACE=true
    export AGK_TRACE_LEVEL=detailed
    go run main.go
2. View prompts and responses:
    agk trace view <trace-id>
    # Press 'd' on llm_call spans
3. Check prompts:
    - Is the system prompt correct?
    - Is context being passed properly?
    - Are variables interpolated correctly?
4. Analyze responses:
    - Is the LLM understanding the task?
    - Are instructions clear?
    - Is output format correct?
5. Fix issues:
    - Refine prompts
    - Add examples
    - Adjust temperature
    - Change model
Symptom: Workflow crashes or returns errors
Debug Steps:
1. List failed traces:
    agk trace list --failed
2. Show error details:
    agk trace show <failed-trace-id>
3. Check error spans:
    └─ llm_call (FAILED)
       Error: connection timeout after 30s
4. View full trace:
    agk trace view <failed-trace-id>
    # Navigate to failed span, press 'd'
5. Common issues:
    - Network timeouts → Increase timeout
    - Rate limits → Add retry logic
    - Invalid prompts → Validate input
    - Model errors → Check model availability
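
For the timeout and rate-limit cases, retry logic usually means retrying with exponential backoff. The sketch below shows the pattern in Python for brevity; the same structure applies in a Go workflow. `call_with_retry` is a hypothetical helper, and which exception types count as transient depends on your LLM client, so the `retry_on` default is an assumption.

```python
import random
import time


def call_with_retry(fn, attempts=4, base_delay=0.5,
                    retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    The delay doubles on each attempt (0.5s, 1s, 2s, ...) plus random jitter
    so that many clients do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Usage: wrap the flaky call in a zero-argument function.
calls = {"count": 0}


def flaky_llm_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("connection timeout after 30s")
    return "ok"
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. Errors outside `retry_on`, such as an invalid prompt, propagate immediately because retrying them would not help.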
Symptom: High costs or slow responses
Debug Steps:
1. Enable standard tracing:
    export AGK_TRACE=true
    export AGK_TRACE_LEVEL=standard
    go run main.go
2. View token summary:
    agk trace show <trace-id>
    Total Tokens: 3,245 input, 8,912 output
3. Identify high-token operations:
    agk trace view <trace-id>
    # Sort by tokens
4. Optimize:
    - Reduce max_tokens
    - Shorten prompts
    - Use cheaper models for simple tasks
    - Cache responses
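
Ranking spans by token usage can also be done offline on exported span data. This sketch assumes each `llm_call` span carries a `tokens` attribute shaped like `{"input": ..., "output": ...}`; the guide lists `tokens` as a key attribute but does not specify its layout, so treat that shape and the helper names as hypothetical.

```python
def rank_by_tokens(spans):
    """Return (total_tokens, span_id, name) tuples, largest first."""
    rows = []
    for span in spans:
        tokens = span.get("attributes", {}).get("tokens")
        if not tokens:
            continue  # spans without token data (e.g. tool calls) are skipped
        total = tokens.get("input", 0) + tokens.get("output", 0)
        rows.append((total, span["span_id"], span["name"]))
    return sorted(rows, reverse=True)


def token_totals(spans):
    """Sum input and output tokens across all spans that report them."""
    totals = {"input": 0, "output": 0}
    for span in spans:
        tokens = span.get("attributes", {}).get("tokens") or {}
        totals["input"] += tokens.get("input", 0)
        totals["output"] += tokens.get("output", 0)
    return totals


spans = [
    {"span_id": "a", "name": "llm_call", "attributes": {"tokens": {"input": 450, "output": 1200}}},
    {"span_id": "b", "name": "llm_call", "attributes": {"tokens": {"input": 900, "output": 300}}},
    {"span_id": "c", "name": "tool_call", "attributes": {"tool_name": "search"}},
]
ranked = rank_by_tokens(spans)
totals = token_totals(spans)
```

Output tokens are typically the more expensive side, so a span with a modest prompt but a very long response is often the first candidate for a lower `max_tokens`.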
# Use detailed tracing during development
export AGK_TRACE=true
export AGK_TRACE_LEVEL=detailed
export AGK_TRACE_EXPORTER=file

# Standard level for tests
export AGK_TRACE=true
export AGK_TRACE_LEVEL=standard
export AGK_TRACE_DIR=.agk/test-traces

# Minimal level for production
export AGK_TRACE=true
export AGK_TRACE_LEVEL=minimal
export AGK_TRACE_EXPORTER=file
# Or disable tracing entirely
export AGK_TRACE=false

# .github/workflows/test.yml
- name: Run Tests with Tracing
  env:
    AGK_TRACE: true
    AGK_TRACE_LEVEL: standard
  run: go test ./...
- name: Archive Traces
  uses: actions/upload-artifact@v4
  with:
    name: traces
    path: .agk/runs/
    # .agk is a hidden directory, which upload-artifact v4 skips by default
    include-hidden-files: true

# Clean old traces (keep last 30 days)
find .agk/runs -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
# Archive important traces
tar -czf traces-$(date +%Y%m%d).tar.gz .agk/runs/

| Level | Overhead | Use Case |
|---|---|---|
| Minimal | ~1-2% | Production |
| Standard | ~2-5% | Development |
| Detailed | ~5-10% | Debugging |
Tip: Disable tracing in latency-critical production environments or use minimal level.
Traces integrate seamlessly with the eval framework:
# semantic-tests.yaml
evalserver:
url: "http://localhost:8787"
workflow_name: "story"
# After running tests
agk eval semantic-tests.yaml
Test report includes trace links:
**Trace ID:** [run-20260207-150034-71394771](.agk/runs/run-20260207-150034-71394771/)
View test execution trace:
agk trace show run-20260207-150034-71394771

Problem: AGK_TRACE=true but no traces in .agk/runs/
Solutions:
1. Check environment variable:
    echo $AGK_TRACE
2. Verify trace directory exists:
    ls -la .agk/runs/
3. Check file permissions:
    chmod -R 755 .agk/
4. Try stdout exporter:
    export AGK_TRACE_EXPORTER=stdout
Problem: Trace files consuming too much disk space
Solutions:
1. Lower trace level:
    export AGK_TRACE_LEVEL=standard # or minimal
2. Clean old traces:
    find .agk/runs -mtime +7 -delete
3. Compress traces:
    tar -czf traces.tar.gz .agk/runs/
    rm -rf .agk/runs/*
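
Cleanup can also be scripted with a dry-run mode so that nothing is deleted by accident. A minimal sketch, assuming each run is stored as its own subdirectory of the trace directory (as the `.agk/runs/run-.../` links in this guide suggest); `prune_traces` is a hypothetical helper, not an AGK command.

```python
import shutil
import time
from pathlib import Path


def prune_traces(root, max_age_days, now=None, dry_run=True):
    """Remove run directories under `root` older than `max_age_days`.

    Age is judged by the directory's modification time. With dry_run=True
    (the default) nothing is deleted; the names that would be removed are
    returned so they can be reviewed first.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    root = Path(root)
    removed = []
    if not root.is_dir():
        return removed
    for run_dir in sorted(root.iterdir()):
        if run_dir.is_dir() and run_dir.stat().st_mtime < cutoff:
            removed.append(run_dir.name)
            if not dry_run:
                shutil.rmtree(run_dir)
    return removed
```

A typical use is `prune_traces(".agk/runs", 7)` to preview the candidates, then the same call with `dry_run=False` once the list looks right.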
Problem: Prompts contain API keys or secrets
Solutions:
- Use environment variables (not hardcoded secrets)
- Filter sensitive data before tracing
- Use minimal trace level in production
- Secure trace storage with proper permissions:
chmod 700 .agk/runs/
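
Filtering sensitive data can be as simple as masking known secret shapes before the text reaches a prompt or a log. The sketch below is illustrative only: the patterns cover a few common formats (an `sk-` style key, `key=value` pairs, bearer tokens) and are assumptions rather than a complete list, and this guide does not describe a built-in AGK redaction hook, so `redact` is a hypothetical helper you would call in your own code.

```python
import re

# (pattern, replacement) pairs; key names and prefixes are kept for context.
SECRET_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED]"),
    (re.compile(r"(?i)\b(api[_-]?key|token|password)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    (re.compile(r"(?i)\b(bearer\s+)[A-Za-z0-9._\-]+"), r"\1[REDACTED]"),
]


def redact(text):
    """Mask secret-looking substrings in text before it is traced."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("Use api_key=abc123 to call")` returns `"Use api_key=[REDACTED] to call"`. Pattern-based masking misses secrets in unfamiliar formats, so it complements the other measures in this list rather than replacing them.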
- Eval Documentation - Automated testing
- AGK CLI Reference - Full command reference
- OpenTelemetry - Tracing standard