
AgentForge Protocol

A coding protocol for agents that need proof, not vibes.

For Hermes, OpenClaw, Claude Code, Codex CLI, and any agent that can read files, edit code, run tests, delegate work, and check its own claims.


English · 简体中文 · 繁體中文


Coding agents are quick. That is the fun part, and also the dangerous part. AgentForge Protocol keeps the speed, but forces the work to leave evidence behind.


Why this exists

Agents can write a patch before they understand the repo. They can produce a plan that sounds tidy but proves nothing. They can spin up subagents, accept their reports, and quietly ship a mess.

Anyone who has used these tools for real work has seen some version of this.

AgentForge Protocol is a small operating routine for avoiding that failure mode. It tells the agent when to stay lightweight, when to slow down, when to write tests first, when to debug instead of guessing, and when to bring in subagents without letting them drive the car.

The point is simple: every meaningful step should leave evidence behind.

This version also borrows the useful parts of architecture-driven governance: read the baseline first, frame the impact before editing, separate the fix lane from the retirement lane, and keep checkpoints for long work so the task does not drift.


At a glance

| If the task is... | The protocol does this |
| --- | --- |
| tiny and obvious | inspect, patch, run the cheapest useful check, stop |
| a clear behavior change | write the failing test first, then make it pass |
| vague or architectural | inspect first, grill the open decisions, save a plan |
| multi-step | split tasks, use focused subagents, review in stages |
| a bug or test failure | reproduce, trace root cause, add a regression test |
| uncertain | spike it before it becomes production architecture |

*Plan, test, build, review, verify loop*


What it combines

agentforge-protocol sits on top of a few smaller skills and decides which one should lead.

| Skill | Job |
| --- | --- |
| karpathy-guidelines | small diffs, fewer assumptions, less cleverness |
| grill-plan | decisions before code when the task is vague or risky |
| writing-plans | clear requirements turned into executable steps |
| test-driven-development | behavior changes with a failing test before the fix |
| systematic-debugging | root cause before patches |
| subagent-driven-development | split work without losing control of the result |
| requesting-code-review | final gate before commit, push, or ship |
| spike | disposable experiments when guessing is worse than building |

It uses Hermes' own layers instead of inventing extra paperwork:

  • current progress goes to the todo tool
  • non-trivial plans go to .hermes/plans/
  • stable user or environment facts go to memory
  • repeatable procedures and traps become skills
  • project-local tasks/lessons.md is used only when the repo already works that way

Install

```sh
git clone https://github.com/Yat-mo/agentforge-protocol.git
mkdir -p ~/.hermes/skills/software-development
cp -R agentforge-protocol/skills/software-development/agentforge-protocol \
  ~/.hermes/skills/software-development/
```

Start a fresh Hermes session so the skill loader picks it up:

```sh
hermes --skills agentforge-protocol
```

Or load it inside Hermes:

```
/skill agentforge-protocol
```

Quick start

```
Use agentforge-protocol. Add email validation to the signup flow.
Use agentforge-protocol. Design and implement workspace-level permissions.
Use agentforge-protocol. The export job passes locally but fails in CI with a timezone assertion.
Use agentforge-protocol. Spike whether we can stream partial PDF extraction results to the UI.
```

The router

The first job is to classify the work. A typo does not need a ceremony. A migration does.
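The routing step can be sketched as a priority-ordered classifier. This is an illustrative Python sketch, not the skill's actual interface; the input signal names are assumptions, while the lane names mirror the table above.

```python
# Illustrative sketch of the routing step. Lane names mirror the
# "at a glance" table; the input signals are assumptions, not the
# skill's real interface.

def route(task: dict) -> str:
    """Pick a lane for a task, checking the riskiest signals first."""
    if task.get("feasibility_unknown"):
        return "spike"
    if task.get("is_bug"):
        return "systematic-debugging"
    if task.get("vague") or task.get("architectural"):
        return "grill-plan"
    if task.get("multi_step"):
        return "subagent-driven-development"
    if task.get("behavior_change"):
        return "test-driven-development"
    return "tiny-edit"
```

Uncertainty and bugs outrank everything else on purpose: a spike or a root-cause trace is cheaper than production code built on a wrong assumption.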

Tiny obvious edit

Keep it light.

```
inspect → minimal patch → cheap verification → stop
```

No forced plan. No subagents. No theater.

Clear behavior change

Use TDD unless there is a real reason not to.

```
read existing pattern
→ define baseline / hypothesis / success / failure / evidence plan
→ track fix lane + retirement lane when needed
→ write failing test
→ run RED
→ implement minimal code
→ run GREEN
→ run relevant regression
```
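Applied to the quick-start prompt about email validation, the RED → GREEN loop might look like this minimal sketch; `validate_email` and the test names are hypothetical, not from this repo.

```python
# A hypothetical pass through the RED → GREEN loop for the quick-start
# "add email validation" example; validate_email and the tests are
# illustrative, not from this repo.
import re

# RED: written first and run before validate_email exists, so the
# first run fails and proves the test can fail.
def test_rejects_address_without_at_sign():
    assert not validate_email("not-an-email")

def test_accepts_plain_address():
    assert validate_email("user@example.com")

# GREEN: the minimal implementation that makes both tests pass.
def validate_email(address: str) -> bool:
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", address) is not None
```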

Ambiguous or architectural work

Use grill-plan before touching production code.

```
inspect code/docs/tests/logs
→ ask only what cannot be inspected
→ resolve decisions one by one
→ save `.hermes/plans/...md`
→ review plan
→ implement from the plan
```

Multi-task implementation

Use subagents, but keep the main agent responsible.

```
read saved plan once
→ extract tasks
→ implementer subagent per task
→ spec compliance review
→ code quality review
→ integration review
→ final verification
→ pre-commit gate
```

Bug or test failure

Debug first. Patch second.

```
read full error
→ reproduce
→ inspect recent changes
→ trace data flow
→ form one hypothesis
→ write regression test
→ fix root cause
→ verify
```
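Using the quick-start timezone failure as a worked example, the "regression test before fix" step could look like this; `export_timestamp` is hypothetical, and the assumed root cause is formatting naive local time.

```python
# Sketch of "write the regression test, then fix the root cause",
# using the quick-start timezone failure. export_timestamp is
# hypothetical; the assumed root cause is naive local-time formatting.
from datetime import datetime, timezone

def export_timestamp(moment: datetime) -> str:
    # The fix: normalize to UTC so the output is identical on a
    # developer laptop and in CI, whatever the host timezone is.
    return moment.astimezone(timezone.utc).isoformat()

# The regression test pins the behavior that was actually broken,
# not just the symptom seen in CI.
def test_export_timestamp_is_timezone_independent():
    moment = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    assert export_timestamp(moment) == "2024-05-01T12:00:00+00:00"
```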

Feasibility unknown

Spike it. Do not turn uncertainty into production architecture.

```
decompose feasibility questions
→ test highest risk first
→ build disposable prototype
→ record VALIDATED / PARTIAL / INVALIDATED
→ only then plan production work
```
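Recording the spike verdict can be as small as a three-field note. The schema below is an assumption for illustration; only the three verdict labels come from the protocol.

```python
# Minimal sketch of recording a spike verdict; the schema is an
# assumption, only the three verdict labels come from the protocol.
from dataclasses import dataclass

VERDICTS = {"VALIDATED", "PARTIAL", "INVALIDATED"}

@dataclass
class SpikeResult:
    question: str   # the feasibility question the spike tested
    verdict: str    # one of VERDICTS
    evidence: str   # what was observed, not what was hoped

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"verdict must be one of {sorted(VERDICTS)}")

result = SpikeResult(
    question="Can we stream partial PDF extraction results to the UI?",
    verdict="PARTIAL",
    evidence="Per-page streaming works; layout analysis needs the whole file.",
)
```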

Pre-coding expectations

For non-trivial production code, write down the expectations before coding:

```markdown
## Pre-coding expectations

### Baseline read set
The source of truth, architecture boundaries, owners, impact surface, compatibility constraints, and verification entry points to inspect before editing.

### Hypothesis
What I believe is true about the system and why this change should work.

### Success criteria
The checks that would make me comfortable saying this is done.

### Failure signals
Independent signs that the approach is wrong or unsafe.
These cannot just be "the success criteria did not pass".

### Ablations and expected observations
What I expect to see if a meaningful assumption or approach changes.

### Evidence plan
The fresh evidence that will support the final claim: tests, commands, logs, API responses, screenshots, or diff review results.

### Minimal verification path
The cheapest test, command, API call, UI action, or log check that proves the change.
```
This is the part that keeps the agent honest. Not fancy, just useful.


Plan shape

Non-trivial plans live under .hermes/plans/ and use action → verification steps.

```markdown
# <Task> Implementation Plan

## Goal

## Non-goals

## Context discovered from code/docs/logs

## Pre-coding expectations

### Baseline read set
### Hypothesis
### Success criteria
### Failure signals
### Ablations and expected observations
### Evidence plan
### Minimal verification path

## Confirmed decisions

## Rejected alternatives

## Fix lane and retirement lane

## Checkpoint, resume hint, and drift check

## Implementation steps

1. <Action> -> verify: <check>
2. <Action> -> verify: <check>
3. <Action> -> verify: <check>

## Files likely to change

## Tests and validation

## Risks and rollback

## Review notes
```

A plan that says "make it work" is not a plan. Each step needs a way to prove itself.


Subagent rules

Subagents are useful. They are also very good at sounding confident.

*Main agent and focused subagent roles*

Use them like this:

| Rule | Why it matters |
| --- | --- |
| one subagent gets one focused task | broad prompts create vague work |
| include exact paths, commands, constraints, and expected output | fresh context needs real context |
| implementer subagents do not commit | the main agent owns the final state |
| spec review happens before code quality review | first ask if we built the right thing |
| the main agent verifies side effects | self-reports are not proof |

Good subagent roles: repository scout, implementation worker, spec compliance reviewer, code quality reviewer, debugging investigator, integration reviewer.
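A subagent brief that follows these rules has a predictable shape. The sketch below is illustrative; the field names are assumptions, not an API of this skill.

```python
# Illustrative shape of a focused subagent brief; the field names are
# assumptions, not an API of this skill.
from dataclasses import dataclass

@dataclass
class SubagentTask:
    role: str                 # e.g. "implementation worker"
    goal: str                 # one focused task, not a broad prompt
    paths: list[str]          # exact files the subagent may touch
    commands: list[str]       # exact verification commands to run
    expected_output: str      # what a correct report looks like
    may_commit: bool = False  # implementers never commit

task = SubagentTask(
    role="implementation worker",
    goal="Implement step 3 of the saved plan",
    paths=["src/signup/validation.py"],
    commands=["pytest tests/test_signup.py -q"],
    expected_output="Verbatim test output plus the resulting diff",
)
```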


Completion gate

Before the agent says "done", check the boring stuff:

  • tests or smoke checks actually ran
  • if tests could not run, the reason is clear
  • the diff is small and tied to the request
  • there is no unrelated refactor or formatting drift
  • the change did not leave behind orphan imports, files, configs, or TODOs
  • fresh evidence is named, not implied
  • logs, API responses, UI behavior, or test output support the claim
  • bug fixes, refactors, and contract changes resolve both fix lane and retirement lane, or state remaining risk
  • long or high-risk tasks have checkpoint, resume hint, and drift check
  • risky changes got independent review
  • reusable lessons were saved in the right place

Before commit, push, ship, or PR:

```
targeted tests
→ broader tests where reasonable
→ git diff / git status
→ secret and local-data scan
→ independent review when risk is meaningful
→ commit only after verification passes
```
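The gate reduces to a pure decision function: no check result, no commit. The check names below mirror the pipeline above and are illustrative.

```python
# A sketch of the gate as a pure decision function; check names mirror
# the pre-commit pipeline and are illustrative.
def commit_allowed(checks: dict[str, bool], risky: bool) -> bool:
    """Allow a commit only after every required verification passed."""
    required = ["targeted_tests", "broader_tests", "diff_reviewed", "secret_scan"]
    if risky:
        required.append("independent_review")
    # A missing check counts as a failed check.
    return all(checks.get(name, False) for name in required)
```

A risky change with every other box ticked is still blocked until an independent review lands.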

Skill layout

```
skills/
└── software-development/
    └── agentforge-protocol/
        └── SKILL.md
```

The repo is intentionally small. It ships one workflow skill, not a framework.


Philosophy

Good agentic coding is not about making the model slower.

It is about making the model harder to fool.

Harder to fool with vague requirements. Harder to fool with tests that pass but prove nothing. Harder to fool with a plausible subagent report. Harder to fool with a patch that hides the symptom. Harder to fool with a big diff that feels productive.

Small when small is enough. Systematic when the work can hurt you.


License

MIT
