feat(permanent): chunking for values > 600 bytes (book/page envelope) #8
`fledge-plugin-memory` previously rejected any permanent save whose encrypted envelope exceeded ~882 bytes (the @corvidlabs/ts-algochat hard cap), and any tx note > 1024 bytes (Algorand's per-tx note limit). For larger content the user got "Permanent value too large for tx note" or "EncryptionError: Message too large".

This change adds transparent multi-tx chunking on the permanent tier:

- New `src/chunking.ts`: splits values into ≤600-byte UTF-8-safe chunks. UTF-8 multi-byte codepoints are never cut mid-character; the splitter walks back to the prior leading byte when a cut lands on a continuation byte (`0b10xxxxxx`).
- `permanentSave` now emits N transactions for an N-chunk value. Each carries the same key + ISO-8601 `created` timestamp plus envelope fields `book` (= key today), `page` (1..N), `total` (N). Single-chunk saves stay on the legacy envelope shape so existing readers and indexers see no change.
- `permanentRecall` / `permanentList` group by (key, created) inside a new `reassemble` step, require all `total` pages to be present (a save with missing pages is silently dropped, not partially reassembled), and join in page order. Tombstones cover all pages for a key without needing per-page tombstones because the latest-round-wins rule still picks the tombstone over the older multi-page write.
- `permanentDelete` is unchanged: a single tombstone tx covers any number of chunks under that key.

19 unit tests:

- `test/chunking.test.ts`: boundary cases (exactly N bytes, N+1 bytes), round-trip preservation including emoji + CJK + RTL, needsChunking heuristic.
- `test/permanent-reassemble.test.ts`: legacy single-chunk pass-through, multi-page join, missing-page drop, two-save dedup, contiguous-page enforcement, mixed single+multi handling.

Mutable (ARC-69 ASA) tier chunking is a follow-up; a single PR is enough surface to review at once.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
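The grouping rules described above (group by key and `created`, require every page, join in page order) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the plugin's `reassemble`: the `Envelope`, `Reassembled`, and `PageGroup` types and the `reassembleSketch` function are hypothetical names, values are treated as already decrypted, and tombstones and the latest-round-wins rule are left out.

```typescript
// Hypothetical record shape. Field names follow the envelope described above;
// `value` is assumed to be already-decrypted chunk text.
interface Envelope {
  key: string;
  value: string;
  created: string; // ISO-8601, shared by every page of one save
  page?: number;   // 1..total; absent on legacy single-chunk saves
  total?: number;
}

interface Reassembled {
  key: string;
  created: string;
  value: string;
}

interface PageGroup {
  key: string;
  created: string;
  total: number;
  pages: Map<number, string>;
}

function reassembleSketch(envelopes: Envelope[]): Reassembled[] {
  const out: Reassembled[] = [];
  const groups = new Map<string, PageGroup>();

  for (const e of envelopes) {
    // Legacy single-chunk envelopes pass through untouched.
    if (e.page === undefined || e.total === undefined) {
      out.push({ key: e.key, created: e.created, value: e.value });
      continue;
    }
    // Group pages by (key, created); JSON keeps the composite key unambiguous.
    const id = JSON.stringify([e.key, e.created]);
    let group = groups.get(id);
    if (group === undefined) {
      group = { key: e.key, created: e.created, total: e.total, pages: new Map<number, string>() };
      groups.set(id, group);
    }
    // Keep the first copy of a page if it shows up twice.
    if (!group.pages.has(e.page)) group.pages.set(e.page, e.value);
  }

  groups.forEach((group) => {
    const parts: string[] = [];
    for (let page = 1; page <= group.total; page++) {
      const part = group.pages.get(page);
      if (part === undefined) return; // missing page: drop the whole save
      parts.push(part);
    }
    out.push({ key: group.key, created: group.created, value: parts.join("") });
  });
  return out;
}

// Usage: pages arrive out of order; save "c" is missing pages 2 and 3.
const recalled = reassembleSketch([
  { key: "a", value: "solo", created: "2026-05-18T22:00:00Z" },
  { key: "b", value: "lo", created: "2026-05-18T22:01:00Z", page: 2, total: 2 },
  { key: "b", value: "hel", created: "2026-05-18T22:01:00Z", page: 1, total: 2 },
  { key: "c", value: "partial", created: "2026-05-18T22:02:00Z", page: 1, total: 3 },
]);
console.log(recalled.map((r) => `${r.key}=${r.value}`)); // a=solo and b=hello; c is dropped
```

Dropping an incomplete save outright, instead of returning the pages that did arrive, mirrors the "never partially reassembled" rule stated in the commit message.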
Code Review
This pull request implements a chunking system for Algorand transaction notes, which are limited to 1024 bytes. It introduces logic to split strings into UTF-8-safe segments and reassemble them, along with comprehensive unit tests for various character types and boundary conditions. A potential infinite loop was identified in the chunkValue function if the maximum chunk size is configured to be smaller than a single multi-byte character, and a fix was suggested to ensure the loop always progresses.
while (offset < bytes.length) {
  let end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
  // Walk back if we landed in the middle of a UTF-8 continuation byte
  // (0b10xxxxxx, i.e. (byte & 0xC0) === 0x80). We stop walking once
  // we hit a leading byte; this caps regression at 3 bytes.
  while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
  chunks.push(bytes.slice(offset, end).toString("utf-8"));
  offset = end;
}
The chunkValue function is susceptible to an infinite loop if MAX_CLEARTEXT_PER_CHUNK is set to a value smaller than the byte length of a single UTF-8 character (e.g., if it were reduced to 2 for testing or future changes). If end walks back all the way to offset, the offset will never advance. While 600 bytes is plenty for any valid UTF-8 character (max 4 bytes), adding a guard ensures robustness against configuration changes or malformed input.
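To make the stall condition concrete, here is a small sketch of the walk-back step on its own. `cutIndex` and `isContinuation` are hypothetical helpers written for this illustration (they are not part of `src/chunking.ts`), and the sketch uses `TextEncoder` and `Uint8Array` in place of the `Buffer` used in the diff.

```typescript
// True when `b` is a UTF-8 continuation byte (0b10xxxxxx).
function isContinuation(b: number | undefined): boolean {
  return b !== undefined && (b & 0xc0) === 0x80;
}

// Hypothetical helper: the cut index the walk-back settles on for a chunk
// that starts at `offset` and may hold at most `maxChunk` bytes.
function cutIndex(bytes: Uint8Array, offset: number, maxChunk: number): number {
  let end = Math.min(offset + maxChunk, bytes.length);
  while (end > offset && end < bytes.length && isContinuation(bytes[end])) end--;
  return end;
}

const emoji = new TextEncoder().encode("😀"); // 4 bytes: F0 9F 98 80

// With a 2-byte limit the walk-back retreats to the chunk start, so a loop
// that then sets `offset = end` makes no progress.
console.log(cutIndex(emoji, 0, 2) === 0); // true

// With the 600-byte limit the whole character fits and the cut is at 4.
console.log(cutIndex(emoji, 0, 600) === 4); // true
```

A cut index equal to `offset` is exactly the case the suggested `if (end === offset)` guard handles by forcing the chunk forward.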
while (offset < bytes.length) {
  let end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
  // Walk back if we landed in the middle of a UTF-8 continuation byte
  // (0b10xxxxxx, i.e. (byte & 0xC0) === 0x80). We stop walking once
  // we hit a leading byte; this caps regression at 3 bytes.
  while (end > offset && end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
  // If the chunk size is too small to fit even one full character,
  // force advance to avoid an infinite loop.
  if (end === offset) end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
  chunks.push(bytes.slice(offset, end).toString("utf-8"));
  offset = end;
}

PR #8 set MAX_CLEARTEXT_PER_CHUNK = 600 based on a too-loose estimate of envelope overhead (~150 bytes). The actual envelope is:

    {"type":"permanent-memory","key":"K","value":"<base64>","user":"<58>","created":"<24>","book":"K","page":N,"total":M}

Breaking it down:

- JSON syntax + field names: ~100 bytes
- key (variable): up to 256 chars per validateKey
- value (base64-encrypted): (plaintext + 40 bytes of nonce+MAC) * 4/3 chars
- user (Algorand address): 58 chars
- created (ISO-8601): 24 chars
- book: duplicates key, so the key length is counted twice
- page + total integers: up to 12 chars

For a 30-char key, the value blob has 770 base64 chars of room (577 binary, 537 plaintext). For a 100-char key it shrinks to 432 plaintext.

The empirical failure: re-importing the 393 dropped corvid-agent memories at 600 plaintext bytes per chunk produced envelopes of 1235 bytes, over Algorand's 1024-byte note cap. `permanentSave`'s post-chunking assertion correctly fired:

    Internal: permanent envelope exceeded 1024 bytes (1235) after chunking. Raise MAX_CLEARTEXT_PER_CHUNK headroom

(That assertion was added in #8 precisely so this kind of regression becomes loud instead of silent.)

## What this changes

- `MAX_CLEARTEXT_PER_CHUNK`: 600 → **400**. Safe for keys up to ~120 chars; longer keys (rare; observed max in corvid-agent's 1,000+ keyspace is ~60) may still trip the assertion but won't silently corrupt.
- Module docstring rewritten with the explicit byte budget so the next reader can re-derive a sound value when the envelope shape changes.
- Loosened the "3000 bytes → 5 chunks" test to count chunks via `Math.ceil(total / MAX)` so it tracks the constant.

## New regression tests

`envelope-fits invariant` (3 tests): simulate the on-chain envelope size for 30 / 60 / 100 char keys with a chunk at `MAX_CLEARTEXT_PER_CHUNK` and assert each lands under 1024 bytes. The 100-char test would have caught this bug at MAX=600, and it did catch the intermediate MAX=480 proposal during this fix's own iteration.
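The byte budget and the envelope-fits invariant can be re-derived with a rough simulation like the one below. It is a sketch under stated assumptions, not the repository's test: `plaintextRoom`, `envelopeBytes`, and `base64Length` are hypothetical helpers, the 58-char `user` and the `created` timestamp are placeholders, and the encrypted value is modelled only by its length (plaintext plus 40 bytes of nonce and MAC, base64-encoded with padding).

```typescript
const NOTE_LIMIT = 1024;    // Algorand per-tx note cap
const CRYPTO_OVERHEAD = 40; // nonce + MAC, per the breakdown above

// Arithmetic from the breakdown: ~100 bytes of JSON syntax, the key counted
// twice (key + book), 58-char user, 24-char created, 12 chars for page + total.
function plaintextRoom(keyLength: number): number {
  const base64Room = NOTE_LIMIT - 100 - 2 * keyLength - 58 - 24 - 12;
  const binaryRoom = Math.floor((base64Room * 3) / 4);
  return binaryRoom - CRYPTO_OVERHEAD;
}

// Length of the padded base64 encoding of `n` bytes.
function base64Length(n: number): number {
  return Math.ceil(n / 3) * 4;
}

// Size in bytes of a simulated multi-chunk envelope (placeholder field values).
function envelopeBytes(keyLength: number, plaintextBytes: number): number {
  const key = "k".repeat(keyLength);
  const envelope = {
    type: "permanent-memory",
    key,
    value: "A".repeat(base64Length(plaintextBytes + CRYPTO_OVERHEAD)),
    user: "A".repeat(58),
    created: "2026-05-18T22:00:00.000Z", // 24 chars
    book: key,
    page: 999,
    total: 999,
  };
  return new TextEncoder().encode(JSON.stringify(envelope)).length;
}

console.log(plaintextRoom(30));  // 537
console.log(plaintextRoom(100)); // 432

for (const keyLength of [30, 60, 100]) {
  console.log(keyLength, envelopeBytes(keyLength, 400) < NOTE_LIMIT); // true for all three
}
console.log(envelopeBytes(100, 600) > NOTE_LIMIT); // true: the original bug
console.log(envelopeBytes(100, 480) > NOTE_LIMIT); // true: the rejected intermediate value
```

Because the simulation builds the envelope with `JSON.stringify`, it tracks changes to the envelope shape without a hand-maintained overhead constant, which is the property the regression tests rely on.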
## Verified end-to-end against live localnet

Hot-patched the installed plugin and re-tried two real failures:

    corvid-agent-build-queue-2025 (1133 B) → 4GQY4G3VNQCM... ✓
    corvid-agent-council-2026-02-04 (2072 B) → JEWX62ESYD3T... ✓

Both wrote successfully as multi-chunk permanent ARC-69 / tx-note records. 29/29 unit tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

`fledge-plugin-memory` previously rejected any permanent save whose encrypted envelope exceeded ~882 bytes (the `@corvidlabs/ts-algochat` hard cap), or whose tx note exceeded 1024 bytes (Algorand's per-tx limit). Real-world team-knowledge memories regularly hit these limits: a bulk import of 1,034 corvid-agent memories saw ~37% dropped on this basis.

This adds transparent multi-tx chunking on the permanent tier. Mutable (ARC-69 ASA) chunking is intentionally deferred; a single PR keeps the review small.
## How it works

- `src/chunking.ts` splits values into ≤600-byte UTF-8-safe chunks. Multi-byte codepoints (emoji, CJK, RTL) are never cut mid-character: the splitter walks back to the prior leading byte on a continuation cut.
- `permanentSave` emits N transactions for an N-chunk value. Each carries the same key + ISO-8601 `created` timestamp plus envelope fields `book` (= key today), `page` (1..N), `total` (N). Single-chunk saves stay on the legacy envelope shape so existing readers and indexers see no change.
- `permanentRecall` / `permanentList` group by `(key, created)` inside a new `reassemble` step, require all `total` pages to be present (saves with missing pages are silently dropped, never partially reconstructed), and join in page order.
- Tombstones (`permanentDelete`) cover all pages for a key without needing per-page tombstones: the latest-round-wins rule already picks the tombstone over the older multi-page write.

## Envelope shape (single chunk, unchanged)
    {
      "type": "permanent-memory",
      "key": "team-humans",
      "value": "<encrypted ≤882 bytes>",
      "user": "XHG33...",
      "created": "2026-05-18T22:00:00Z"
    }

## Envelope shape (multi-chunk, new)

    {
      "type": "permanent-memory",
      "key": "team-interaction-guide",
      "value": "<encrypted chunk N>",
      "user": "XHG33...",
      "created": "2026-05-18T22:00:00Z",
      "book": "team-interaction-guide",
      "page": 2,
      "total": 4
    }

## Tests (19, all pass)
- `test/chunking.test.ts`: boundary cases (exactly `MAX_CLEARTEXT_PER_CHUNK` bytes, +1 byte), round-trip preservation including emoji + CJK + RTL, `needsChunking` heuristic.
- `test/permanent-reassemble.test.ts`: legacy single-chunk pass-through, multi-page join in order, missing-page drop, two-save dedup with different `created` timestamps, contiguous-page enforcement, mixed single+multi handling.

## Test plan

- `bun test` (26/26 across 3 files)
- Open question: whether a `total` cap is needed (e.g. reject saves > 100 chunks to bound recall cost)

🤖 Generated with Claude Code