Add cogames-watch-replay skill and headless frame capture script #7

Open
SolbiatiAlessandro wants to merge 5 commits into Metta-AI:main from SolbiatiAlessandro:pr/cogames-watch-replay-skill

@SolbiatiAlessandro SolbiatiAlessandro commented Mar 29, 2026

What this adds

scripts/capture_frames.py — runs an episode headlessly and saves emoji grid snapshots to a text file at regular intervals. No GUI, no TTY, no interactive input. Works with any policy; defaults to StarterPolicy (no LLM required).

.claude/skills/cogames-watch-replay/SKILL.md — a Claude Code skill that invokes the script and guides structured analysis of the output.

Why

Watching what the policy is actually doing spatially is the highest-leverage debugging tool — are agents stuck, are they reaching gear stations, are they spreading across the map? The existing unicode renderer requires interactive keyboard input (SPACE to unpause), which makes it unusable by autonomous Claude agents or in CI.

This script hooks into Rollout.event_handlers directly, runs headlessly, and writes a plain text file that Claude (or a human) can read and parse.
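
A minimal sketch of the capture-every-N-steps idea, not the script's actual code. The `SnapshotWriter` class, its `on_step` method, and the frame header format are hypothetical; the real handler signature expected by `Rollout.event_handlers` is not shown in this PR, so the sketch is driven by a plain loop instead.

```python
import io
from typing import List, TextIO


class SnapshotWriter:
    """Hypothetical handler: writes a text grid every `every` steps."""

    def __init__(self, out: TextIO, every: int = 50) -> None:
        if every <= 0:
            raise ValueError("every must be positive")
        self.out = out
        self.every = every
        self.frames_written = 0

    def on_step(self, step: int, grid_rows: List[str], total_reward: float) -> None:
        # Skip steps that are not on the snapshot interval.
        if step % self.every != 0:
            return
        # Assumed header format; the real file layout may differ.
        self.out.write(f"=== step {step} reward={total_reward:.4f} ===\n")
        self.out.write("\n".join(grid_rows))
        self.out.write("\n\n")
        self.frames_written += 1


def demo() -> str:
    """Drive the writer with a fixed 3x3 grid for steps 0..100."""
    buf = io.StringIO()
    writer = SnapshotWriter(buf, every=50)
    grid = ["⬛⬛⬛", "⬛🟦⬛", "⬛⬛⬛"]
    for step in range(0, 101):
        writer.on_step(step, grid, total_reward=0.0)
    return buf.getvalue()
```

Writing to any `TextIO` rather than a fixed path keeps the sketch usable with a file, `sys.stdout`, or an in-memory buffer.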

Real example: StarterPolicy gets stuck after step 50

Running the script on the default machina_1 mission with 4 agents reveals an immediate problem in the starter policy:

A0: moved 4 cells step 0→50, then STUCK for all 250 remaining steps (row=47, col≈41)
A1: moved 2 cells step 0→50, then STUCK for all 250 remaining steps (row=48, col≈41)
A2: moved 2 cells step 0→50, then STUCK for all 250 remaining steps (row=48, col≈42)
A3: moved 14 cells step 0→50, then STUCK for all 250 remaining steps (row=55, col≈46)
Total reward: 0.0000 across all 300 steps

All 4 agents freeze near their spawn point around step 50 and never move again. This is exactly the kind of spatial insight the script is designed to surface — numbers alone (reward=0) don't tell you why, but the frame sequence makes it unambiguous.

How to use

# Default: StarterPolicy, 500 steps, snapshot every 50
python scripts/capture_frames.py

# Gear-up phase (watch first 200 steps closely)
python scripts/capture_frames.py --steps 200 --every 10

# Full episode
python scripts/capture_frames.py --steps 1000 --every 100

# Single agent to isolate behavior
python scripts/capture_frames.py --agents 1 --steps 500 --every 50

# Your own policy
python scripts/capture_frames.py --policy class=cogames.policy.my_policy.MyPolicy

# Output to a specific file
python scripts/capture_frames.py --out docs/replay_frames.txt

From Claude Code: /cogames-watch-replay --steps 500 --every 50

What the skill teaches Claude to do

The skill guides structured analysis beyond just reading the grid visually:

  1. Extract agent positions programmatically — search for 🟦🟧🟩🟨 symbols, record (row, col) per frame
  2. Compute movement deltas — Manhattan distance between frames; flag agents stuck for >30% of episode
  3. Track reward growth rate — deceleration signals hub depletion or navigation failure
  4. Zoom into stuck areas — extract 15×15 subgrid around frozen agent to identify blocker (wall, extractor, wrong-gear station)
  5. Compare configs — run 1/3/8-agent and compare per-agent reward to distinguish individual vs. contention problems
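
Steps 1, 2, and 4 above could be sketched roughly as follows. This is an illustration under assumptions, not the skill's implementation: the helper names are hypothetical, the symbol-to-agent ordering is assumed, and each grid cell is assumed to be exactly one Unicode code point (so string indices are cell columns).

```python
from typing import Dict, List, Tuple

# Symbols named in the PR; mapping list index -> agent id is an assumption.
AGENT_SYMBOLS = ["🟦", "🟧", "🟩", "🟨"]

Pos = Tuple[int, int]


def find_agents(grid_rows: List[str]) -> Dict[int, Pos]:
    """Return {agent_id: (row, col)} for each agent symbol in one frame."""
    found: Dict[int, Pos] = {}
    for r, row in enumerate(grid_rows):
        for c, cell in enumerate(row):
            if cell in AGENT_SYMBOLS:
                found[AGENT_SYMBOLS.index(cell)] = (r, c)
    return found


def manhattan(a: Pos, b: Pos) -> int:
    """Manhattan distance between two (row, col) positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def stuck_fraction(track: List[Pos]) -> float:
    """Fraction of consecutive frame pairs in which the agent did not move."""
    if len(track) < 2:
        return 0.0
    still = sum(1 for a, b in zip(track, track[1:]) if manhattan(a, b) == 0)
    return still / (len(track) - 1)


def crop(grid_rows: List[str], center: Pos, size: int = 15) -> List[str]:
    """Extract a size x size window around `center`, clipped at grid edges."""
    half = size // 2
    r0 = max(0, center[0] - half)
    c0 = max(0, center[1] - half)
    r1 = center[0] + half + 1
    c1 = center[1] + half + 1
    return [row[c0:c1] for row in grid_rows[r0:r1]]
```

An agent would then be flagged when `stuck_fraction(track) > 0.3`, matching the 30% threshold in step 2. Note that the fraction is measured over captured frames, so an agent that moves and returns to the same cell between snapshots still counts as stationary.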

🤖 Generated with Claude Code

relh and others added 3 commits March 26, 2026 14:13
scripts/capture_frames.py: runs an episode headlessly and saves emoji
grid snapshots to a text file at regular intervals. Works with any
policy (defaults to StarterPolicy, no LLM required). Useful for
diagnosing navigation, gear acquisition, and routing without a GUI.

.claude/skills/cogames-watch-replay/SKILL.md: Claude skill that invokes
the script and guides analysis — extract agent coordinates
programmatically, detect stuck agents by movement delta, zoom into
blocked areas with a 15x15 subgrid, compare 1/3/8-agent configs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@nishu-builder nishu-builder force-pushed the main branch 5 times, most recently from 0454f65 to 46ca0e3 Compare April 3, 2026 01:26
@SolbiatiAlessandro
Author

let's ship this? @daveey

@nishu-builder nishu-builder force-pushed the main branch 6 times, most recently from cbb95f7 to 5340aa9 Compare April 13, 2026 22:14
@nishu-builder nishu-builder force-pushed the main branch 6 times, most recently from 16e7904 to a1623f6 Compare April 16, 2026 20:58
@nishu-builder nishu-builder force-pushed the main branch 2 times, most recently from 26f2bef to 71ab990 Compare April 24, 2026 22:07
@relh relh force-pushed the main branch 2 times, most recently from ec17523 to 87261c8 Compare April 26, 2026 00:40
@desiorac

The symbol legend (line 119) is hardcoded to 4 agents (🟦🟧🟩🟨), but --agents 8 is a documented usage. DEFAULT_SYMBOL_MAP likely assigns symbols for agents 4-7, but they won't appear in the file header - the stuck-agent heuristic in the skill then silently misidentifies unlabeled cells.

Worth making the legend dynamic:

agent_syms = " ".join(
    f"{self._map_buffer._symbol_map.get(f'agent{i}', '?')}=agent{i}"
    for i in range(self._sim.num_agents)
)
f.write(f"# Symbols: {agent_syms}  ⬛=wall  · =empty\n\n")
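
For illustration, the same idea as a standalone function that can be checked without the simulator. This is a sketch under assumptions: `build_legend` is a hypothetical name, and a plain mapping stands in for `self._map_buffer._symbol_map`, whose real contents are not shown here.

```python
from typing import Mapping


def build_legend(symbol_map: Mapping[str, str], num_agents: int) -> str:
    """Build the '# Symbols:' header line for however many agents the run has."""
    agent_syms = " ".join(
        # Agents missing from the map fall back to '?', as in the snippet above.
        f"{symbol_map.get(f'agent{i}', '?')}=agent{i}"
        for i in range(num_agents)
    )
    return f"# Symbols: {agent_syms}  ⬛=wall  · =empty"
```

With a two-entry map and `num_agents=3`, the third agent shows up as `?=agent2`, so a missing symbol is visible in the header instead of being silently dropped.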

Side note: flipping mettagrid from workspace to git-pinned in pyproject.toml breaks local dev for contributors who have it checked out as a sibling workspace. Intentional for standalone distribution?

@desiorac

Depends on whether --agents 8 is a real use case today or just documented future scope. If nobody is actually running 8 agents right now, ship it - the bug only surfaces when you exceed 4 agents and the legend mismatch causes the stuck-agent heuristic to flag false positives. If 8-agent runs are happening in CI, the two-liner fix is worth doing first: replace the hardcoded legend with a loop over self._sim.num_agents before merging.

@relh relh force-pushed the main branch 5 times, most recently from 20578ae to 80a3706 Compare May 5, 2026 21:34
@nishu-builder nishu-builder force-pushed the main branch 2 times, most recently from 750885c to bd5f6be Compare May 7, 2026 22:04