Commit fe517d6

feat: add generic tmux TUI testing infrastructure with Claude Code and Codex testers

- Make tmux scripts generic for any TUI app (--command flag)
- Add claude-code-tester agent with --dangerously-skip-permissions
- Add codex-tester agent with -a never -s danger-full-access
- Rename codebuff-cli-tester to codebuff-tester
- Fix race condition in tmux-start.sh (verify session with has-session)
- Update session-info.yaml status to stopped on cleanup
- Update README.md with multi-CLI examples

1 parent 3e329ac commit fe517d6

9 files changed: +815 -117 lines changed

.agents/claude-code-tester.ts

Lines changed: 309 additions & 0 deletions
@@ -0,0 +1,309 @@
import type { AgentDefinition } from './types/agent-definition'

const definition: AgentDefinition = {
  id: 'claude-code-tester',
  displayName: 'Claude Code Tester',
  model: 'anthropic/claude-opus-4.5',

  spawnerPrompt: `Expert at testing Claude Code CLI functionality using tmux.

**What it does:** Spawns tmux sessions, sends input to Claude Code CLI, captures terminal output, and validates behavior.

**Paper trail:** Session logs are saved to \`debug/tmux-sessions/{session}/\`. Use \`read_files\` to view captures.

**Your responsibilities as the parent agent:**
1. If \`scriptIssues\` is not empty, fix the scripts in \`scripts/tmux/\` based on the suggested fixes
2. Use \`read_files\` on the capture paths to see what the CLI displayed
3. Re-run the test after fixing any script issues`,

  inputSchema: {
    prompt: {
      type: 'string',
      description:
        'Description of what Claude Code functionality to test (e.g., "test that the help command displays correctly", "verify the CLI starts successfully")',
    },
  },

  outputMode: 'structured_output',
  outputSchema: {
    type: 'object',
    properties: {
      overallStatus: {
        type: 'string',
        enum: ['success', 'failure', 'partial'],
        description: 'Overall test outcome',
      },
      summary: {
        type: 'string',
        description: 'Brief summary of what was tested and the outcome',
      },
      testResults: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            testName: {
              type: 'string',
              description: 'Name/description of the test',
            },
            passed: { type: 'boolean', description: 'Whether the test passed' },
            details: {
              type: 'string',
              description: 'Details about what happened',
            },
            capturedOutput: {
              type: 'string',
              description: 'Relevant output captured from the CLI',
            },
          },
          required: ['testName', 'passed'],
        },
        description: 'Array of individual test results',
      },
      scriptIssues: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            script: {
              type: 'string',
              description:
                'Which script had the issue (e.g., "tmux-start.sh", "tmux-send.sh")',
            },
            issue: {
              type: 'string',
              description: 'What went wrong when using the script',
            },
            errorOutput: {
              type: 'string',
              description: 'The actual error message or unexpected output',
            },
            suggestedFix: {
              type: 'string',
              description:
                'Suggested fix or improvement for the parent agent to implement',
            },
          },
          required: ['script', 'issue', 'suggestedFix'],
        },
        description:
          'Issues encountered with the helper scripts that the parent agent should fix',
      },
      captures: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            path: {
              type: 'string',
              description:
                'Path to the capture file (relative to project root)',
            },
            label: {
              type: 'string',
              description:
                'What this capture shows (e.g., "initial-cli-state", "after-help-command")',
            },
            timestamp: {
              type: 'string',
              description: 'When the capture was taken',
            },
          },
          required: ['path', 'label'],
        },
        description:
          'Paths to saved terminal captures for debugging - check debug/tmux-sessions/{session}/',
      },
    },
    required: [
      'overallStatus',
      'summary',
      'testResults',
      'scriptIssues',
      'captures',
    ],
  },
  includeMessageHistory: false,

  toolNames: [
    'run_terminal_command',
    'read_files',
    'code_search',
    'set_output',
  ],

  systemPrompt: `You are an expert at testing Claude Code CLI using tmux. You have access to helper scripts that handle the complexities of tmux communication with TUI apps.

## Claude Code Startup

For testing Claude Code, use the \`--command\` flag with permission bypass:

\`\`\`bash
# Start Claude Code CLI (with permission bypass for testing)
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "claude --dangerously-skip-permissions")

# Or with specific options
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "claude --dangerously-skip-permissions --help")
\`\`\`

**Important:** Always use \`--dangerously-skip-permissions\` when testing to avoid permission prompts that would block automated tests.

## Helper Scripts

Use these scripts in \`scripts/tmux/\` for reliable CLI testing:

### Unified Script (Recommended)

\`\`\`bash
# Start a Claude Code test session (with permission bypass)
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "claude --dangerously-skip-permissions")

# Send input to the CLI
./scripts/tmux/tmux-cli.sh send "$SESSION" "/help"

# Capture output (optionally wait first)
./scripts/tmux/tmux-cli.sh capture "$SESSION" --wait 3

# Stop the session when done
./scripts/tmux/tmux-cli.sh stop "$SESSION"

# Stop all test sessions
./scripts/tmux/tmux-cli.sh stop --all
\`\`\`

### Individual Scripts (More Options)

\`\`\`bash
# Start with custom settings
./scripts/tmux/tmux-start.sh --command "claude" --name claude-test --width 160 --height 40

# Send text (auto-presses Enter)
./scripts/tmux/tmux-send.sh claude-test "your prompt here"

# Send without pressing Enter
./scripts/tmux/tmux-send.sh claude-test "partial" --no-enter

# Send special keys
./scripts/tmux/tmux-send.sh claude-test --key Escape
./scripts/tmux/tmux-send.sh claude-test --key C-c

# Capture with colors
./scripts/tmux/tmux-capture.sh claude-test --colors

# Save capture to file
./scripts/tmux/tmux-capture.sh claude-test -o output.txt
\`\`\`

## Why These Scripts?

The scripts handle **bracketed paste mode** automatically. Standard \`tmux send-keys\` drops characters with TUI apps like Claude Code due to how the CLI processes keyboard input. The helper scripts wrap input in escape sequences (\`\\e[200~...\\e[201~\`) so you don't have to.

## Typical Test Workflow

\`\`\`bash
# 1. Start a Claude Code session (with permission bypass)
SESSION=$(./scripts/tmux/tmux-cli.sh start --command "claude --dangerously-skip-permissions")
echo "Testing in session: $SESSION"

# 2. Verify CLI started
./scripts/tmux/tmux-cli.sh capture "$SESSION"

# 3. Run your test
./scripts/tmux/tmux-cli.sh send "$SESSION" "/help"
sleep 2
./scripts/tmux/tmux-cli.sh capture "$SESSION"

# 4. Clean up
./scripts/tmux/tmux-cli.sh stop "$SESSION"
\`\`\`

## Session Logs (Paper Trail)

All session data is stored in **YAML format** in \`debug/tmux-sessions/{session-name}/\`:

- \`session-info.yaml\` - Session metadata (start time, dimensions, status)
- \`commands.yaml\` - YAML array of all commands sent with timestamps
- \`capture-{sequence}-{label}.txt\` - Captures with YAML front-matter

\`\`\`bash
# Capture with a descriptive label (recommended)
./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "after-help-command" --wait 2

# Capture saved to: debug/tmux-sessions/{session}/capture-001-after-help-command.txt
\`\`\`

Each capture file has YAML front-matter with metadata:
\`\`\`yaml
---
sequence: 1
label: after-help-command
timestamp: 2025-01-01T12:00:30Z
after_command: "/help"
dimensions:
  width: 120
  height: 30
---
[terminal content]
\`\`\`

The capture path is printed to stderr. Both you and the parent agent can read these files to see exactly what the CLI displayed.

## Debugging Tips

- **Attach interactively**: \`tmux attach -t SESSION_NAME\`
- **List sessions**: \`./scripts/tmux/tmux-cli.sh list\`
- **View session logs**: \`ls debug/tmux-sessions/{session-name}/\`
- **Get help**: \`./scripts/tmux/tmux-cli.sh help\` or \`./scripts/tmux/tmux-start.sh --help\``,

  instructionsPrompt: `Instructions:

1. **Use the helper scripts** in \`scripts/tmux/\` - they handle bracketed paste mode automatically

2. **Start a Claude Code test session** with permission bypass:
   \`\`\`bash
   SESSION=$(./scripts/tmux/tmux-cli.sh start --command "claude --dangerously-skip-permissions")
   \`\`\`

3. **Verify the CLI started** by capturing initial output:
   \`\`\`bash
   ./scripts/tmux/tmux-cli.sh capture "$SESSION"
   \`\`\`

4. **Send commands** and capture responses:
   \`\`\`bash
   ./scripts/tmux/tmux-cli.sh send "$SESSION" "your command here"
   ./scripts/tmux/tmux-cli.sh capture "$SESSION" --wait 3
   \`\`\`

5. **Always clean up** when done:
   \`\`\`bash
   ./scripts/tmux/tmux-cli.sh stop "$SESSION"
   \`\`\`

6. **Use labels when capturing** to create a clear paper trail:
   \`\`\`bash
   ./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "initial-state"
   ./scripts/tmux/tmux-cli.sh capture "$SESSION" --label "after-help-command" --wait 2
   \`\`\`

7. **Report results using set_output** - You MUST call set_output with structured results:
   - \`overallStatus\`: "success", "failure", or "partial"
   - \`summary\`: Brief description of what was tested
   - \`testResults\`: Array of test outcomes with testName, passed (boolean), details, capturedOutput
   - \`scriptIssues\`: Array of any problems with the helper scripts (IMPORTANT for the parent agent!)
   - \`captures\`: Array of capture paths with labels (e.g., {path: "debug/tmux-sessions/tui-test-123/capture-...", label: "after-help"})

8. **If a helper script doesn't work correctly**, report it in \`scriptIssues\` with:
   - \`script\`: Which script failed (e.g., "tmux-send.sh")
   - \`issue\`: What went wrong
   - \`errorOutput\`: The actual error message
   - \`suggestedFix\`: How the parent agent should fix the script

   The parent agent CAN edit the scripts - you cannot. Your job is to identify issues clearly.

9. **Always include captures** in your output so the parent agent can see what you saw.

For advanced options, run \`./scripts/tmux/tmux-cli.sh help\` or check individual scripts with \`--help\`.`,
}

export default definition
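
Note on bracketed paste: the system prompt says the helper scripts wrap input in the escape sequences `\e[200~...\e[201~`. The actual wrapping lives in `scripts/tmux/tmux-send.sh`, which is not part of this file, so the following is only a minimal TypeScript sketch of the idea. The function name `wrapBracketedPaste` is hypothetical, and stripping an embedded end marker is an assumption about sensible behavior, not something the commit states.

```typescript
// Hypothetical helper, not taken from the commit: it illustrates the
// bracketed-paste wrapping that the prompt attributes to tmux-send.sh.
const PASTE_START = '\x1b[200~' // ESC [ 200 ~ : terminal "paste begins" marker
const PASTE_END = '\x1b[201~' // ESC [ 201 ~ : terminal "paste ends" marker

function wrapBracketedPaste(text: string): string {
  // Assumption: drop any end marker already present in the text so the
  // receiving TUI cannot see the paste terminate early.
  const sanitized = text.split(PASTE_END).join('')
  return `${PASTE_START}${sanitized}${PASTE_END}`
}
```

A TUI that has enabled bracketed paste mode treats everything between the two markers as one pasted chunk rather than as individual keystrokes, which is presumably why the scripts use it for input such as `/help`.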

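Note on the output shape: `outputSchema` in this file requires five top-level fields (`overallStatus`, `summary`, `testResults`, `scriptIssues`, `captures`). As a rough aid for reading it, the schema can be mirrored by TypeScript types like the ones below. The type names, the `missingRequiredFields` helper, and the example values are all hypothetical and do not appear in the commit; the capture path reuses the `{session}` placeholder from the prompt rather than a real session name.

```typescript
// Hypothetical types mirroring outputSchema; optional fields are the ones
// the schema does not list under `required`.
type OverallStatus = 'success' | 'failure' | 'partial'

interface TestResult {
  testName: string
  passed: boolean
  details?: string
  capturedOutput?: string
}

interface ScriptIssue {
  script: string
  issue: string
  errorOutput?: string
  suggestedFix: string
}

interface Capture {
  path: string
  label: string
  timestamp?: string
}

interface TesterOutput {
  overallStatus: OverallStatus
  summary: string
  testResults: TestResult[]
  scriptIssues: ScriptIssue[]
  captures: Capture[]
}

// Returns the required top-level keys that are absent from a candidate output.
function missingRequiredFields(output: object): string[] {
  const required = [
    'overallStatus',
    'summary',
    'testResults',
    'scriptIssues',
    'captures',
  ] as const
  return required.filter((key) => !(key in output))
}

// Illustrative payload only; the values are made up for this sketch.
const exampleOutput: TesterOutput = {
  overallStatus: 'success',
  summary: 'Checked that /help renders',
  testResults: [{ testName: 'help command', passed: true }],
  scriptIssues: [],
  captures: [
    {
      path: 'debug/tmux-sessions/{session}/capture-001-after-help-command.txt',
      label: 'after-help-command',
    },
  ],
}
```

Note that `scriptIssues` is required even when nothing went wrong, so an empty array is the expected value for a clean run.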