# Design Principles

## The Three-Layer Architecture

```
┌─────────────────────────────────────────┐
│ FAT SKILLS (intelligence)               │
│ Markdown procedures that encode         │
│ judgment, process, domain knowledge.    │
│ This is where 90% of the value lives.   │
├─────────────────────────────────────────┤
│ THIN HARNESS (routing)                  │
│ ~200 lines of code. JSON in, text out.  │
│ Read-only by default. State machine.    │
├─────────────────────────────────────────┤
│ DETERMINISTIC FOUNDATION (execution)    │
│ QueryDB, ReadDoc, Search, Timeline      │
│ — the tools that never fail ambiguously │
└─────────────────────────────────────────┘
```

### The Principle

**Push intelligence UP into skills. Push execution DOWN into deterministic tooling. Keep the harness THIN.**

When you do this:
- Every model improvement automatically improves every skill
- The deterministic layer stays perfectly reliable
- The harness never accumulates complexity
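
The layering above can be sketched in a few lines. This is an illustrative approximation, not StackMemory's actual harness — the tool names echo the diagram, but `TOOLS`, `dispatch`, and the placeholder tool bodies are assumptions:

```python
import json

# Deterministic foundation: dumb, reliable tools (placeholder bodies).
def query_db(args):
    return f"rows for {args['sql']}"      # stands in for a plain SELECT

def read_doc(args):
    return f"contents of {args['path']}"  # stands in for a file read

TOOLS = {"QueryDB": query_db, "ReadDoc": read_doc}  # read-only by default

def dispatch(request_json: str) -> str:
    """Thin harness: JSON in, text out. No business logic lives here —
    judgment stays in the skills, execution stays in the tools."""
    req = json.loads(request_json)
    tool = TOOLS.get(req["tool"])
    if tool is None:
        return f"error: unknown tool {req['tool']!r}"
    return tool(req.get("args", {}))

print(dispatch('{"tool": "ReadDoc", "args": {"path": "CLAUDE.md"}}'))
```

The harness never grows: adding capability means adding a skill (markdown) or a tool (deterministic function), never more routing code.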

### How This Maps to StackMemory

| Layer | StackMemory Component | Examples |
|-------|----------------------|----------|
| **Fat Skills** | `.claude/skills/`, CLAUDE.md, wiki articles | Context engineering, code conventions, deploy recipes |
| **Thin Harness** | MCP server, CLI, hooks, handoff script | `stackmemory restore`, `stackmemory snap`, frame lifecycle |
| **Deterministic Foundation** | SQLite, file system, git, embeddings | `contexts` table, `.stackmemory/` directory, decision log files |

### Anti-Patterns

- **Fat harness**: Logic in the MCP server that should be a skill. If you're writing `if/else` chains in the harness, move that logic into a skill.
- **Thin skills**: Skills that just call tools. If a skill has no judgment, it's a tool wrapper — push it down into the tooling layer.
- **Smart foundation**: Database queries that encode business logic. Keep the foundation dumb — SELECT/INSERT/UPDATE only.

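To make the "smart foundation" anti-pattern concrete: the query stays mechanical, and the skill decides what the rows mean. A minimal sketch using Python's built-in `sqlite3` — the `contexts` schema here is a hypothetical stand-in, not StackMemory's actual table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contexts (id INTEGER PRIMARY KEY, topic TEXT, body TEXT)")
conn.execute("INSERT INTO contexts (topic, body) VALUES ('deploy', 'use blue-green')")

def fetch_contexts(topic: str):
    """Dumb foundation: a plain parameterized SELECT, no business rules
    baked into the SQL. Ranking, filtering by relevance, and deciding
    what to do with a row all belong in the skill layer."""
    return conn.execute(
        "SELECT id, body FROM contexts WHERE topic = ?", (topic,)
    ).fetchall()

print(fetch_contexts("deploy"))
```

The moment a query starts encoding "which context wins" or "what counts as stale," that judgment has leaked down a layer and belongs in a skill instead.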
## Cross-Agent Memory Strategies

When multiple agents need shared state, choose the mechanism that matches the bottleneck:

| Need | Strategy | StackMemory Component |
|------|----------|----------------------|
| Survive session restart | **Persistent context** | `stackmemory restore` / handoff script |
| Share decisions across agents | **Decision log** | `.stackmemory/decisions/` files |
| Transfer orchestrator state to worker | **Text handoff** (current) | `-smd` wrapper, structured notes |
| Transfer latent state without text | **KV cache compaction** (research) | Not yet — requires runtime KV access |
| Find relevant prior context | **Semantic search** | Embeddings + vector index |
| Replicate exact prior state | **Snapshot** | `stackmemory snap save/restore` |

### Current Default: Text Handoff

The `-smd` wrapper (`stackmemory-auto-handoff.sh`) performs a text-level handoff:
1. Saves current session state before exit
2. Restores prior context on next session start
3. Injects structured notes (decisions, corrections, task state)

This is the **"structured notes" strategy** — human-readable, auditable, and portable across model families. It works with any API (Claude, Codex, local models).

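The three steps above can be sketched at the file level. This is a Python approximation for illustration, not the actual shell wrapper — the notes file location and JSON schema are assumptions:

```python
import json
from pathlib import Path

NOTES = Path("handoff-notes.json")  # hypothetical location

def save_session(decisions, corrections, task_state):
    """Step 1: persist structured notes before exit (human-readable JSON)."""
    NOTES.write_text(json.dumps({
        "decisions": decisions,
        "corrections": corrections,
        "task_state": task_state,
    }, indent=2))

def restore_session() -> str:
    """Steps 2-3: read the notes back and render them as plain text to
    inject into the next session's prompt — no model-specific state."""
    if not NOTES.exists():
        return ""
    notes = json.loads(NOTES.read_text())
    lines = [f"- decision: {d}" for d in notes["decisions"]]
    lines += [f"- correction: {c}" for c in notes["corrections"]]
    lines.append(f"- task: {notes['task_state']}")
    return "\n".join(lines)

save_session(["use SQLite"], ["fix table name"], "migrating schema")
print(restore_session())
```

Because the handoff is plain text, the receiving session can be a different model family entirely — that portability is the main reason this is the default.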
### Future: Latent Briefing (Research)

For systems that control the inference runtime (self-hosted models, custom Cloudflare workers), **Latent Briefing** offers a more efficient path:

- Compacts the orchestrator's KV cache using Attention Matching
- Retains only the positions relevant to the current worker, via task-guided scoring
- Eliminates text serialization overhead

**Status**: Research reference. Blocked by API access — the Claude API doesn't expose KV state. Viable for self-hosted models or custom inference runtimes.

**When to revisit**: When StackMemory supports self-hosted model backends, or when Substrate Cloud ships a custom inference runtime.

**Reference**: See skill doc `latent-briefing.skill.md` for the full technical treatment, decision framework, and gotchas.

## Compaction Hierarchy

When context is too large, apply these strategies in order:

1. **Observation masking** — Hide tool outputs that aren't relevant to the current task (cheapest)
2. **Prefix caching** — Reuse identical prompt prefixes across calls (free with API support)
3. **Structured notes** — Summarize prior sessions into decision/correction format (current default)
4. **Semantic retrieval** — Pull only the relevant chunks from prior context (needs embeddings)
5. **KV cache compaction** — Transfer latent state directly (requires runtime access)

Each successive level is more powerful but harder to implement. Start from the top and move down only when the level above is insufficient.
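
Level 1, observation masking, is simple enough to sketch directly: replace stale tool outputs with a short placeholder while keeping the conversational skeleton intact. The message format below is an assumption for illustration, not a specific API's schema:

```python
def mask_observations(messages, keep_last=2):
    """Elide tool outputs except the most recent `keep_last`, which the
    current task is most likely to depend on. Non-tool messages and the
    ordering of the transcript are preserved."""
    tool_idxs = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_idxs[:-keep_last]) if keep_last else set(tool_idxs)
    return [
        {**m, "content": "[output elided]"} if i in stale else m
        for i, m in enumerate(messages)
    ]

history = [
    {"role": "tool", "content": "500 rows..."},
    {"role": "assistant", "content": "ok"},
    {"role": "tool", "content": "big diff..."},
    {"role": "tool", "content": "test results"},
]
masked = mask_observations(history, keep_last=1)
print([m["content"] for m in masked])
```

Note that masking rewrites earlier positions in the prompt, so it invalidates any prefix cache (level 2) covering those positions — one reason the levels are worth applying in order rather than all at once.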