|
1 | | -# croissant.ai — Agent Guide |
2 | | - |
3 | | -Tool-agnostic reference for AI coding agents working in this repository. |
4 | | - |
5 | | -## Stack |
6 | | - |
7 | | -Node.js / Express / PostgreSQL / Redis |
8 | | -Railway deployment | Stripe / Salesforce / QuickBooks integrations |
| 1 | +# StackMemory - Project Configuration |
9 | 2 |
|
10 | 3 | ## Project Structure |
11 | 4 |
|
12 | 5 | ``` |
13 | 6 | src/ |
14 | | - api/ # Route handlers |
15 | | - core/ # monitoring-service, cache-service, queue-service, master-agent, api-validation |
16 | | - features/ # Feature modules |
17 | | - shared/ # Shared utilities |
18 | | - integrations/ # Third-party connectors |
19 | | -docs/ # Documentation |
20 | | -scripts/ # Automation scripts |
21 | | -docker/ # Container configs |
22 | | -prompts/ # Externalized LLM prompt templates |
| 7 | + cli/ # CLI commands and entry point |
| 8 | + core/ # Core business logic |
| 9 | + config/ # Config types and manager |
| 10 | + context/ # Frame management, enrichment, rehydration |
| 11 | + database/ # SQLite adapter, migrations, query cache |
| 12 | + digest/ # Digest generation (hybrid, chronological) |
| 13 | + errors/ # Error types and recovery |
| 14 | + merge/ # Stack merge and conflict resolution |
| 15 | + models/ # Model routing, complexity scoring |
| 16 | + monitoring/ # Logging, metrics, session monitor |
| 17 | + performance/ # Caching, profiling, benchmarks |
| 18 | + query/ # Query parsing and routing |
| 19 | + retrieval/ # Context retrieval, LLM provider |
| 20 | + session/ # Handoff, session management |
| 21 | + skills/ # Skill storage and types |
| 22 | + storage/ # Tiered storage, remote sync |
| 23 | + trace/ # Debug tracing, trace detection |
| 24 | + integrations/ # External integrations |
| 25 | + claude-code/ # Agent bridge, post-task hooks |
| 26 | + linear/ # Linear sync, webhooks, OAuth |
| 27 | + mcp/ # MCP server, 56 tool handlers |
| 28 | + ralph/ # Multi-agent swarm orchestration |
| 29 | + daemon/ # Unified daemon, session daemon |
| 30 | + features/ # Analytics, browser, sweep, TUI |
| 31 | + hooks/ # Claude Code hook handlers |
| 32 | + skills/ # Built-in skill implementations |
| 33 | + utils/ # Shared utilities |
| 34 | +scripts/ # Build and utility scripts |
| 35 | +docs/ # Documentation |
23 | 36 | ``` |
24 | 37 |
|
| 38 | +## Key Files |
| 39 | + |
| 40 | +- Entry: src/cli/index.ts |
| 41 | +- MCP Server: src/integrations/mcp/server.ts |
| 42 | +- Frame Manager: src/core/context/frame-manager.ts |
| 43 | +- Database: src/core/database/sqlite-adapter.ts |
| 44 | +- Snapshot: src/core/worktree/capture.ts |
| 45 | +- Preflight: src/core/worktree/preflight.ts |
| 46 | +- Conductor: src/cli/commands/orchestrator.ts (core) + orchestrate.ts (CLI) |
| 47 | +- Conductor Traces: src/cli/commands/conductor-traces.ts |
| 48 | +- Frame Enrichment: src/core/context/frame-enrichment.ts |
| 49 | +- Process Utils: src/utils/process-cleanup.ts |
| 50 | +- Shared Utils: src/core/utils/{git,text,fs}.ts |
| 51 | + |
| 52 | +## Detailed Guides |
| 53 | + |
| 54 | +Quick reference (agent_docs/): |
| 55 | +- linear_integration.md - Linear sync |
| 56 | +- mcp_server.md - MCP tools |
| 57 | +- database_storage.md - Storage |
| 58 | +- claude_hooks.md - Hooks |
| 59 | + |
| 60 | +Full documentation (docs/): |
| 61 | +- principles.md - Agent programming paradigm |
| 62 | +- architecture.md - Extension model and browser sandbox |
| 63 | +- SPEC.md - Technical specification |
| 64 | +- API_REFERENCE.md - API docs |
| 65 | +- DEVELOPMENT.md - Dev guide |
| 66 | +- SETUP.md - Installation |
| 67 | + |
25 | 68 | ## Commands |
26 | 69 |
|
27 | 70 | ```bash |
28 | | -npm run dev # Start dev server |
29 | | -npm run test # Run test suites (3 parallel Jest workers, maxWorkers=4) |
30 | | -npm run lint # Lint check |
31 | | -npm run migrate # Run DB migrations |
32 | | -docker-compose up -d # Start local DBs |
| 71 | +npm run build # Compile TypeScript (esbuild) |
| 72 | +npm run lint # ESLint check |
| 73 | +npm run lint:fix # Auto-fix lint issues |
| 74 | +npm run lint:fast # Fast lint via oxlint |
| 75 | +npm run typecheck # tsc --noEmit (8GB heap, avoids OOM) |
| 76 | +npm test # Run Vitest (watch) |
| 77 | +npm run test:run # Run tests once |
| 78 | +npm run linear:sync # Sync with Linear |
| 79 | + |
| 80 | +# StackMemory CLI |
| 81 | +stackmemory capture # Save session state for handoff |
| 82 | +stackmemory restore # Restore from captured state |
| 83 | +stackmemory snapshot save # Post-run context snapshot (alias: snap) |
| 84 | +stackmemory snapshot list # List recent snapshots |
| 85 | +stackmemory preflight # File overlap check for parallel tasks (alias: pf) |
| 86 | +stackmemory conductor start # Autonomous Linear→worktree→agent orchestrator |
| 87 | +stackmemory conductor learn # Analyze agent outcomes (success rate, failure phases, error patterns) |
| 88 | +stackmemory conductor learn --evolve # Auto-mutate prompt template from failure data (GEPA) |
| 89 | +stackmemory conductor status # Live agent status dashboard |
| 90 | +stackmemory conductor monitor # Real-time TUI with phase tracking |
| 91 | +stackmemory conductor finalize # Clean up dead/stale agents |
| 92 | +stackmemory conductor traces <issue-id> # View conversation traces for an agent run |
| 93 | +stackmemory conductor replay <session-id> # Replay full agent conversation from traces |
| 94 | +stackmemory conductor trace-stats # Aggregate trace statistics |
| 95 | +stackmemory loop "<cmd>" --until "<pattern>" # Poll until condition met (alias: watch) |
| 96 | +``` |
| 97 | + |
| 98 | +## Working Directory |
| 99 | + |
| 100 | +- PRIMARY: /Users/jwu/Dev/stackmemory |
| 101 | +- ALLOWED: All subdirectories |
| 102 | +- TEMP: /tmp for temporary operations |
| 103 | + |
| 104 | +## Validation |
| 105 | + |
| 106 | +Verify each step after code changes — pre-commit hooks catch 80% of CI failures locally: |
| 107 | +1. `npm run lint` - fix any errors AND warnings |
| 108 | +2. `npm run test:run` - verify no regressions |
| 109 | +3. `npm run build` - ensure compilation |
| 110 | +4. Run code to verify it works |
| 111 | + |
| 112 | +Test coverage: |
| 113 | +- New features require tests in `src/**/__tests__/` |
| 114 | +- Maintain or improve coverage (no untested code paths) |
| 115 | +- Critical paths: context management, handoff, Linear sync |
| 116 | + |
| 117 | +Testing rules: |
| 118 | +- Run `npm run test:run` via subagent or background task — never inline (blocks context) |
| 119 | +- ESLint: use `catch {}` not `catch (_err) {}` (lint rule) |
| 120 | +- `vi.clearAllMocks()` resets `mockReturnValue` — re-set mocks in `beforeEach` |
| 121 | +- Pre-commit hook runs: lint + parallel vitest + build — fix issues before commit, never skip |
| 122 | + |
| 123 | +## Git Rules |
| 124 | + |
| 125 | +The pre-commit hook enforces lint + test + build. Fix the underlying issue rather than bypassing it. |
| 126 | + |
| 127 | +- Do not use `--no-verify` on git push or commit — fix the hook failure instead |
| 128 | +- Fix lint/test errors before pushing |
| 129 | +- If pre-push hooks fail, fix the underlying issue |
| 130 | +- Run `npm run lint && npm run test:run` before pushing |
| 131 | +- Commit message format: `type(scope): message` |
| 132 | +- Branch naming: `feature/STA-XXX-description` | `fix/STA-XXX-description` | `chore/description` |
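The commit and branch formats above can be checked mechanically. The following is a minimal sketch, not code from the repository: the function names, the list of accepted commit types, and the reading of `STA-XXX` as `STA-` followed by digits are all assumptions.

```javascript
// Hypothetical validators for the conventions listed above.
// COMMIT_TYPES is an assumed list; the guide only specifies `type(scope): message`.
const COMMIT_TYPES = ['feat', 'fix', 'chore', 'docs', 'refactor', 'test'];
const COMMIT_RE = new RegExp(`^(${COMMIT_TYPES.join('|')})\\([a-z0-9-]+\\): .+`);

// Assumes STA-XXX means "STA-" plus a numeric issue id.
const BRANCH_RE = /^(?:(?:feature|fix)\/STA-\d+-[a-z0-9-]+|chore\/[a-z0-9-]+)$/;

function isValidCommitMessage(message) {
  // Only the subject line carries the type(scope) prefix.
  return COMMIT_RE.test(message.split('\n')[0]);
}

function isValidBranchName(branch) {
  return BRANCH_RE.test(branch);
}
```

A check like this could run in a commit-msg or pre-push hook, which keeps it consistent with the rule of fixing hook failures rather than bypassing them.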
| 133 | + |
| 134 | +## Task Management |
| 135 | + |
| 136 | +- Use TodoWrite for 3+ steps or multiple requests |
| 137 | +- Keep one task in_progress at a time |
| 138 | +- Update task status immediately on completion |
| 139 | + |
| 140 | +## Security |
| 141 | + |
| 142 | +NEVER hardcode secrets - use process.env with dotenv/config |
| 143 | + |
| 144 | +```javascript |
| 145 | +import 'dotenv/config'; |
| 146 | +const API_KEY = process.env.LINEAR_API_KEY; |
| 147 | +if (!API_KEY) { |
| 148 | + console.error('LINEAR_API_KEY not set'); |
| 149 | + process.exit(1); |
| 150 | +} |
33 | 151 | ``` |
34 | 152 |
|
35 | | -## Git Conventions |
| 153 | +Environment sources (check in order): |
| 154 | +1. .env file |
| 155 | +2. .env.local |
| 156 | +3. ~/.zshrc |
| 157 | +4. Process environment |
| 158 | + |
| 159 | +Secret patterns to block: lin_api_* | lin_oauth_* | sk-* | npm_* |
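One way to act on the blocked prefixes is a line-by-line scan before committing. This is a rough sketch under stated assumptions: `findSecrets`, the pattern names, and the 16-character minimum token length are hypothetical, chosen only to avoid flagging short identifiers such as `npm_config_registry`.

```javascript
// Hypothetical scanner for the secret prefixes listed above.
// The {16,} length threshold is an assumption, not a documented token format.
const SECRET_PATTERNS = [
  { name: 'linear-api-key', regex: /\blin_api_[A-Za-z0-9]{16,}/ },
  { name: 'linear-oauth-token', regex: /\blin_oauth_[A-Za-z0-9]{16,}/ },
  { name: 'sk-key', regex: /\bsk-[A-Za-z0-9_-]{16,}/ },
  { name: 'npm-token', regex: /\bnpm_[A-Za-z0-9]{16,}/ },
];

function findSecrets(text) {
  const findings = [];
  text.split('\n').forEach((line, index) => {
    for (const { name, regex } of SECRET_PATTERNS) {
      // Regexes have no `g` flag, so test() is stateless across lines.
      if (regex.test(line)) {
        findings.push({ line: index + 1, pattern: name });
      }
    }
  });
  return findings;
}
```

The function reports only line numbers and pattern names, never the matched value, so its output is safe to log.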
| 160 | + |
| 161 | +## Deploy |
| 162 | + |
| 163 | +```bash |
| 164 | +# npm publish (uses NPM_TOKEN from .env, no OTP needed) |
| 165 | +git stash -- scripts/gepa/ # stash GEPA state (dirties working tree) |
| 166 | +NPM_TOKEN=$(grep '^NPM_TOKEN=' .env | cut -d= -f2-) && \ |
| 167 | + npm publish --registry https://registry.npmjs.org/ \ |
| 168 | + --//registry.npmjs.org/:_authToken="$NPM_TOKEN" |
| 169 | +git stash pop # restore GEPA state |
| 170 | + |
| 171 | +# Railway |
| 172 | +railway up |
| 173 | + |
| 174 | +# Pre-publish checks require clean git status — stash GEPA files first |
| 175 | +``` |
| 176 | + |
| 177 | +## Conductor (Autonomous Agent Orchestration) |
| 178 | + |
| 179 | +The conductor manages autonomous coding agents via Linear issues: |
| 180 | + |
| 181 | +**Data files** (all under `~/.stackmemory/conductor/`): |
| 182 | +- `prompt-template.md` — Agent prompt template with `{{VARIABLE}}` substitution (auto-created on first `conductor start`) |
| 183 | +- `outcomes.jsonl` — JSONL log of agent outcomes (success/failure, phase, tokens, errors) |
| 184 | +- `evolution-log.jsonl` — History of `--evolve` mutations applied to the prompt template |
| 185 | +- `agents/<issue-id>/status.json` — Per-agent status files |
| 186 | +- `agents/<issue-id>/output.log` — Agent stdout/stderr |
| 187 | +- `traces.db` — SQLite database with per-turn conversation traces (tool calls, tokens, phases, content previews) |
| 188 | + |
| 189 | +**Intelligence features**: |
| 190 | +- Multi-model routing with difficulty prediction (routes simple tasks to cheaper models) |
| 191 | +- Smart retry with exponential backoff and prior context injection |
| 192 | +- Auto-PR creation on successful agent completion |
| 193 | +- Trace-based evidence: per-turn conversation logging (tools, tokens, phases) to traces.db |
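The "smart retry" bullet above combines exponential backoff with context carried over from earlier attempts. A minimal sketch of that idea follows. It is not the conductor's implementation: the function names, the 1-second base delay, the 60-second cap, and the shape of `priorContext` are all assumptions.

```javascript
// Hypothetical retry helper; base delay, cap, and attempt numbering are assumed.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  // attempt is 1-based: 1 -> base, 2 -> 2x base, 3 -> 4x base, capped at maxMs.
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

async function retryWithBackoff(task, options = {}) {
  const { maxAttempts = 3, baseMs = 1000, sleep = defaultSleep } = options;
  let priorContext = '';
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The task receives the attempt number and a summary of earlier failures.
      return await task({ attempt, priorContext });
    } catch (error) {
      if (attempt === maxAttempts) throw error;
      priorContext = `Attempt ${attempt} failed: ${error.message}`;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

Passing `sleep` as an option keeps the helper testable without real delays.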
| 194 | + |
| 195 | +**Learning loop**: |
| 196 | +1. Agents run → outcomes logged to `outcomes.jsonl`, traces to `traces.db` |
| 197 | +2. `conductor learn` analyzes patterns (success rate, failure phases, error types) |
| 198 | +3. `conductor learn --evolve` calls Claude to mutate `prompt-template.md` based on failure data |
| 199 | +4. Next agent run uses the improved template → repeat |
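Step 2 of the loop above reduces to aggregating the JSONL log. The sketch below illustrates the kind of summary involved. The field names `success` and `phase` are assumptions, since the guide describes the log contents only loosely, and `summarizeOutcomes` is a hypothetical helper rather than the `conductor learn` implementation.

```javascript
// Hypothetical summary over outcomes.jsonl content.
// Field names (`success`, `phase`) are assumed; the real schema may differ.
function summarizeOutcomes(jsonl) {
  const records = jsonl
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));

  const failuresByPhase = {};
  let successes = 0;
  for (const record of records) {
    if (record.success) {
      successes += 1;
    } else {
      const phase = record.phase ?? 'unknown';
      failuresByPhase[phase] = (failuresByPhase[phase] ?? 0) + 1;
    }
  }

  return {
    total: records.length,
    successRate: records.length === 0 ? 0 : successes / records.length,
    failuresByPhase,
  };
}
```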
| 200 | + |
| 201 | +**Template variables**: `{{ISSUE_ID}}`, `{{TITLE}}`, `{{DESCRIPTION}}`, `{{LABELS}}`, `{{PRIORITY}}`, `{{ATTEMPT}}`, `{{PRIOR_CONTEXT}}` |
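The `{{VARIABLE}}` substitution can be illustrated with a small renderer. This is a sketch only: `renderTemplate` is a hypothetical name, and leaving unknown placeholders untouched is an assumed policy, not documented conductor behavior.

```javascript
// Hypothetical renderer for {{NAME}} placeholders.
function renderTemplate(template, variables) {
  // A replacer function is used so that `$` sequences in values are not
  // interpreted as replacement patterns. Unknown names are left as-is.
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name)
      ? String(variables[name])
      : match
  );
}
```

For example, rendering `'{{ISSUE_ID}}: {{TITLE}}'` with `{ ISSUE_ID: 'STA-1', TITLE: 'Crash' }` yields `'STA-1: Crash'`.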
| 202 | + |
| 203 | +## Task Delegation Model |
| 204 | + |
| 205 | +Route effort by task complexity — not all code changes deserve equal scrutiny: |
| 206 | + |
| 207 | +**AUTOMATE** — Execute immediately, lint+test is sufficient: |
| 208 | +- CRUD operations, boilerplate, formatting, simple transforms |
| 209 | +- Adding a tool handler following existing switch/case pattern |
| 210 | +- Config additions (new env var, feature flag) |
| 211 | + |
| 212 | +**STANDARD** — Normal workflow, lint+test+build: |
| 213 | +- Feature implementation, bug fixes, refactoring |
| 214 | +- New test coverage, documentation updates |
| 215 | +- Integration wiring (adding handler to server.ts dispatch) |
| 216 | + |
| 217 | +**CAREFUL** — Review approach before implementation: |
| 218 | +- API/schema changes, database migrations, auth flows |
| 219 | +- New integration patterns (MCP tools, webhook handlers) |
| 220 | +- Changes to frame-manager, sqlite-adapter, or daemon lifecycle |
| 221 | +- Anything touching error handling chains |
| 222 | + |
| 223 | +**ARCHITECT** — Plan mode required, explore existing patterns first: |
| 224 | +- New service boundaries, system integrations |
| 225 | +- Performance-critical paths (FTS5 queries, search scoring) |
| 226 | +- Breaking changes to MCP protocol or CLI interface |
| 227 | + |
| 228 | +**HUMAN** — Explicit user approval before any changes: |
| 229 | +- Security-critical decisions, secret handling |
| 230 | +- Irreversible operations (data migrations, schema drops) |
| 231 | +- Publishing (npm publish, Railway deploy) |
| 232 | + |
| 233 | +Quality gates scale with tier — don't over-engineer AUTOMATE tasks, don't under-review CAREFUL ones. |
| 234 | + |
| 235 | +For AUTOMATE and STANDARD tiers: make only the requested changes. Don't refactor surrounding code, add abstractions for one-time operations, or create helpers that are used once. Three similar lines of code is better than a premature abstraction. |
| 236 | + |
| 237 | +## Session Budget |
36 | 238 |
|
37 | | -- Branch prefixes: `feature/`, `fix/`, `chore/` |
38 | | -- Commit format: `type(scope): message` |
39 | | -- Do NOT add `Co-Authored-By` lines to commits |
40 | | -- Pre-commit hook runs: `npm run lint` + `npm run test` + E2E browser screenshots |
| 239 | +- Max 1 major topic per session — split unrelated work into separate sessions |
| 240 | +- Run /compact or summarize at ~50% context usage to avoid overflow |
| 241 | +- Plan-execute sessions (low interaction, high edits) are most efficient |
| 242 | +- Avoid exploratory marathons with topic-switching — burns 30-40% extra tokens |
41 | 243 |
|
42 | | -## Testing Rules |
| 244 | +## Context Maintenance |
43 | 245 |
|
44 | | -- **Framework**: Jest + SWC |
45 | | -- **DB mocking**: Use dependency injection (DI), not global mocks |
46 | | -- **Supertest**: Pass `app` (NOT `server`) to supertest |
47 | | -- **Global jest**: src/ tests use global `jest` — do NOT import from `@jest/globals` (causes redeclaration errors) |
48 | | -- **Mock reset**: `jest.clearAllMocks()` resets `mockReturnValue` — always re-set mocks in `beforeEach` |
49 | | -- **Test runner**: `npm test` is long-running; run in a background process or sub-agent, not inline |
| 246 | +**`/update-docs`** — Run weekly or when context feels stale: |
| 247 | +- Audits CLAUDE.md, MEMORY.md, agent_docs/ against git history and codebase |
| 248 | +- Detects stale entries, missing patterns, outdated paths |
| 249 | +- Trigger: start of week, after major refactors, or when sessions feel slow/confused |
50 | 250 |
|
51 | | -## ESLint Rules |
| 251 | +**`/recover`** — Run when a session goes off the rails: |
| 252 | +- Analyzes traces to find where context drifted from intent |
| 253 | +- Maps drift to specific doc fixes (missing guidance, stale memory, ambiguous instruction) |
| 254 | +- Trigger: user says "this is wrong", "not what I wanted", "off the rails", repeated corrections |
52 | 255 |
|
53 | | -- Use `catch {}` not `catch (_err) {}` — underscore prefix not in the allowed pattern |
54 | | -- CJS format for JS files in `src/` |
| 256 | +**`/next`** — Run at session start or when asking "what's next": |
| 257 | +- Scans git log, TODO files, Linear issues, and memory for actionable items |
| 258 | +- Prioritizes: unfinished work > flagged issues > queued tasks > continuations |
| 259 | +- Trigger: session start, "what's next", "whats next", between tasks |
55 | 260 |
|
56 | | -## Key Patterns |
| 261 | +**`/learn`** — Run at session end to capture learnings: |
| 262 | +- Reviews session work, then audits memory, CLAUDE.md, skills, scripts, and wiki |
| 263 | +- Proposes creates/updates/deletes with confirmation before applying |
| 264 | +- Trigger: end of session, after significant work, "what should I update" |
57 | 265 |
|
58 | | -- Provenance tracking: every data point includes source, timestamp, lineage |
59 | | -- Multi-tenant container isolation |
60 | | -- DI route factories for testability |
61 | | -- Error handling: return undefined over throwing; log and continue over crashing |
62 | | -- Add `.js` extension to relative ESM imports |
| 266 | +**When to use which:** |
| 267 | +- Starting a session or between tasks → `/next` (pick what to work on) |
| 268 | +- Session producing wrong results → `/recover` (diagnose + fix now) |
| 269 | +- Routine maintenance, nothing broken → `/update-docs` (proactive gardening) |
| 270 | +- After publishing a new version → `/update-docs` (catch version/path drift) |
| 271 | +- After conductor failures → `/recover last` (learn from agent traces) |
| 272 | +- End of session → `/learn` (capture what changed, update artifacts) |
63 | 273 |
|
64 | | -## StackMemory Context Rule |
| 274 | +## Workflow |
65 | 275 |
|
66 | | -- When an agent fetches conversation context for active work, it must pass the exact current assignment or question as `task_query`. |
67 | | -- Prefer the MCP shape: |
68 | | - - `org_id` |
69 | | - - `conversation_id` |
70 | | - - `task_query` |
71 | | - - `recover_on_low_signal: true` |
72 | | -- Do not fetch raw `get_conversation` context for worker execution unless full transcript behavior is explicitly required. |
| 276 | +- Check .env for API keys before asking |
| 277 | +- Run npm run linear:sync after task completion |
| 278 | +- Use browser MCP for visual testing |
| 279 | +- Review recent commits and stackmemory.json on session start |
| 280 | +- Use subagents for multi-step tasks |
| 281 | +- Ask 1-3 clarifying questions for complex commands (one at a time) |