
Commit 6bf62e9

Author: StackMemory Bot (CLI)
Commit message: docs: add design principles architecture note
Parent: 10db093

2 files changed: 93 additions & 0 deletions

@@ -0,0 +1,91 @@
# Design Principles

## The Three-Layer Architecture

```
┌─────────────────────────────────────────┐
│ FAT SKILLS (intelligence)               │
│ Markdown procedures that encode         │
│ judgment, process, domain knowledge.    │
│ This is where 90% of the value lives.   │
├─────────────────────────────────────────┤
│ THIN HARNESS (routing)                  │
│ ~200 lines of code. JSON in, text out.  │
│ Read-only by default. State machine.    │
├─────────────────────────────────────────┤
│ DETERMINISTIC FOUNDATION (execution)    │
│ QueryDB, ReadDoc, Search, Timeline      │
│ — the tools that never fail ambiguously │
└─────────────────────────────────────────┘
```

### The Principle

**Push intelligence UP into skills. Push execution DOWN into deterministic tooling. Keep the harness THIN.**

When you do this:

- Every model improvement automatically improves every skill
- The deterministic layer stays perfectly reliable
- The harness never accumulates complexity

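The routing layer can be very small indeed. Here is a minimal sketch of a thin harness in the spirit described above — JSON in, text out, no business logic. The tool names and registry shape are illustrative assumptions, not StackMemory's actual API:

```python
import json

# Hypothetical thin harness: route a JSON request to a deterministic
# tool and return plain text. All judgment lives in skills; all real
# work lives in the foundation tools below.
TOOLS = {
    "query_db": lambda args: f"rows for {args['sql']}",       # deterministic foundation
    "read_doc": lambda args: f"contents of {args['path']}",   # deterministic foundation
}

def handle(request_json: str) -> str:
    """JSON in, text out. No if/else chains encoding domain logic."""
    request = json.loads(request_json)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        return f"unknown tool: {request['tool']}"
    return tool(request.get("args", {}))

print(handle('{"tool": "read_doc", "args": {"path": "README.md"}}'))
```

Note that the only branching here is routing; anything more belongs in a skill.
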
### How This Maps to StackMemory

| Layer | StackMemory Component | Examples |
|-------|----------------------|----------|
| **Fat Skills** | `.claude/skills/`, CLAUDE.md, wiki articles | Context engineering, code conventions, deploy recipes |
| **Thin Harness** | MCP server, CLI, hooks, handoff script | `stackmemory restore`, `stackmemory snap`, frame lifecycle |
| **Deterministic Foundation** | SQLite, file system, git, embeddings | `contexts` table, `.stackmemory/` directory, decision log files |

### Anti-Patterns

- **Fat harness**: Logic in the MCP server that should be a skill. If you're writing `if/else` chains in the harness, move it to a skill.
- **Thin skills**: Skills that just call tools. If a skill has no judgment, it's a tool wrapper — push it down.
- **Smart foundation**: Database queries that encode business logic. Keep the foundation dumb — SELECT/INSERT/UPDATE only.

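A "dumb" foundation tool, sketched against a hypothetical `contexts` schema, looks like a single parameterized statement with no branching — judgment about *which* contexts matter stays in a skill:

```python
import sqlite3

# Deterministic foundation tool: one parameterized SELECT, no business
# logic. The table schema here is illustrative, not StackMemory's real one.
def query_contexts(db_path: str, limit: int) -> list[tuple]:
    """Return the most recent context rows. Nothing else."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT id, name FROM contexts ORDER BY id DESC LIMIT ?",
            (limit,),
        ).fetchall()
    finally:
        conn.close()
```

If a condition like "only contexts from active projects" creeps in here, that is business logic and belongs a layer up.
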
## Cross-Agent Memory Strategies

When multiple agents need shared state, choose the mechanism that matches the bottleneck:

| Need | Strategy | StackMemory Component |
|------|----------|----------------------|
| Survive session restart | **Persistent context** | `stackmemory restore` / handoff script |
| Share decisions across agents | **Decision log** | `.stackmemory/decisions/` files |
| Transfer orchestrator state to worker | **Text handoff** (current) | `-smd` wrapper, structured notes |
| Transfer latent state without text | **KV cache compaction** (research) | Not yet — requires runtime KV access |
| Find relevant prior context | **Semantic search** | Embeddings + vector index |
| Replicate exact prior state | **Snapshot** | `stackmemory snap save/restore` |

### Current Default: Text Handoff

The `-smd` wrapper (`stackmemory-auto-handoff.sh`) does text-level handoff:

1. Saves current session state before exit
2. Restores prior context on next session start
3. Injects structured notes (decisions, corrections, task state)

This is the **"structured notes" strategy** — human-readable, auditable, portable across model families. It works with any API (Claude, Codex, local models).

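The save/restore cycle above can be sketched in a few lines. The file name and note fields here are assumptions for illustration, not the wrapper's real format:

```python
import json
from pathlib import Path

# Illustrative "structured notes" handoff: serialize decisions and task
# state to a file on exit, re-render them as text on the next start.
def save_notes(path: Path, decisions: list[str], task: str) -> None:
    """Persist session state as human-readable, auditable JSON."""
    path.write_text(json.dumps({"decisions": decisions, "task": task}))

def restore_notes(path: Path) -> str:
    """Render saved notes as text to inject into the next session."""
    notes = json.loads(path.read_text())
    lines = [f"Resuming task: {notes['task']}"]
    lines += [f"- decision: {d}" for d in notes["decisions"]]
    return "\n".join(lines)
```

Because the handoff is plain text, it survives a switch between model families with no runtime cooperation.
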
### Future: Latent Briefing (Research)

For systems that control the inference runtime (self-hosted models, custom Cloudflare workers), **Latent Briefing** offers a more efficient path:

- Compact orchestrator KV cache using Attention Matching
- Task-guided scoring retains only positions relevant to the current worker
- Eliminates text serialization overhead

**Status**: Research reference. Blocked by API access — Claude API doesn't expose KV state. Viable for self-hosted models or custom inference runtimes.

**When to revisit**: When StackMemory supports self-hosted model backends, or when Substrate Cloud ships a custom inference runtime.

**Reference**: See skill doc `latent-briefing.skill.md` for the full technical treatment, decision framework, and gotchas.

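The selection step of task-guided compaction can be illustrated as plain top-k over per-position scores. This is a toy: real attention matching operates on the model's KV tensors inside the runtime, and the scores below are made up:

```python
# Toy sketch of task-guided KV compaction: score each cached position
# by the attention the task tokens pay it, keep only the top-k, and
# preserve the original position ordering for the retained cache.
def compact_positions(attention_to_position: list[float], keep: int) -> list[int]:
    """Return indices of the `keep` highest-scoring cache positions."""
    ranked = sorted(range(len(attention_to_position)),
                    key=lambda i: attention_to_position[i],
                    reverse=True)
    return sorted(ranked[:keep])

# Keep 2 of 4 cached positions:
print(compact_positions([0.1, 0.7, 0.05, 0.4], keep=2))  # → [1, 3]
```

Everything else — computing the scores, rewriting the worker's cache — requires runtime KV access, which is exactly the blocker noted above.
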
## Compaction Hierarchy

When context is too large, apply these strategies in order:

1. **Observation masking** — Hide tool outputs that aren't relevant to the current task (cheapest)
2. **Prefix caching** — Reuse identical prompt prefixes across calls (free with API support)
3. **Structured notes** — Summarize prior sessions into decision/correction format (current default)
4. **Semantic retrieval** — Pull only relevant chunks from prior context (needs embeddings)
5. **KV cache compaction** — Transfer latent state directly (requires runtime access)

Each level is more powerful but harder to implement. Start from the top. Only move down when the level above is insufficient.
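Level 1 is cheap enough to sketch directly. Assuming the usual role/content message shape, observation masking just replaces stale tool outputs with a placeholder (the relevance test here — "keep the last N" — is a stand-in for a real one):

```python
# Sketch of observation masking: blank out all but the most recent
# tool outputs in the message history. Message shape is assumed to be
# a list of {"role": ..., "content": ...} dicts.
def mask_observations(messages: list[dict], keep_last: int) -> list[dict]:
    """Mask every tool output except the last `keep_last`."""
    tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    stale = set(tool_indices[:-keep_last] if keep_last else tool_indices)
    return [
        {**m, "content": "[masked tool output]"} if i in stale else m
        for i, m in enumerate(messages)
    ]
```

Because masking preserves message positions, it also composes with level 2: a stable prefix stays byte-identical and cacheable.
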

docs/architecture/TECHNICAL_ARCHITECTURE.md

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ The outer system that:
> **Harness = runtime. Frames = call stack. Tools = syscalls. Digests = return values.**

+ **Design principle**: Push intelligence UP into skills. Push execution DOWN into deterministic tooling. Keep the harness THIN. See `DESIGN_PRINCIPLES.md` for the full three-layer architecture and cross-agent memory strategy hierarchy.

---

## Database Design
