feat(permanent): chunking for values > 600 bytes (book/page envelope) by corvid-agent · Pull Request #8 · CorvidLabs/fledge-plugin-memory

corvid-agent · 2026-05-18T22:55:22Z

Summary

fledge-plugin-memory previously rejected any permanent save whose encrypted envelope exceeded ~882 bytes (the @corvidlabs/ts-algochat hard cap), or whose tx note exceeded 1024 bytes (Algorand's per-tx note limit). Real-world team-knowledge memories regularly hit these limits: in a bulk import of 1,034 corvid-agent memories, ~37% were dropped for this reason.

This adds transparent multi-tx chunking on the permanent tier. Mutable (ARC-69 ASA) chunking is intentionally deferred — single PR keeps the review small.

How it works

New src/chunking.ts splits values into ≤600-byte UTF-8-safe chunks. Multi-byte codepoints (emoji, CJK, RTL) are never cut mid-character — the splitter walks back to the prior leading byte on a continuation cut.
permanentSave emits N transactions for an N-chunk value. Each carries the same key + ISO-8601 created timestamp plus envelope fields book (= key today), page (1..N), total (N). Single-chunk saves stay on the legacy envelope shape so existing readers and indexers see no change.
permanentRecall / permanentList group by (key, created) inside a new reassemble step, require all total pages to be present (saves with missing pages are silently dropped, never partially reconstructed), and join in page order.
Tombstones (permanentDelete) cover all pages for a key without needing per-page tombstones — the latest-round-wins rule already picks the tombstone over the older multi-page write.
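
The grouping and completeness rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PageRecord`, `ReassembledSave`, and `reassembleSketch` are hypothetical names, the records are assumed to be already decrypted, and tombstone handling and the latest-round-wins rule are left out.

```typescript
// Hypothetical record shape; field names mirror the envelope fields in this PR.
interface PageRecord {
  key: string;
  created: string;
  value: string; // assumed to be the already-decrypted chunk text
  page?: number; // absent on legacy single-chunk envelopes
  total?: number; // absent on legacy single-chunk envelopes
}

interface ReassembledSave {
  key: string;
  created: string;
  value: string;
}

function reassembleSketch(records: PageRecord[]): ReassembledSave[] {
  // Group pages by (key, created). JSON.stringify avoids separator collisions.
  const groups = new Map<string, PageRecord[]>();
  for (const r of records) {
    const id = JSON.stringify([r.key, r.created]);
    const group = groups.get(id) ?? [];
    group.push(r);
    groups.set(id, group);
  }

  const out: ReassembledSave[] = [];
  groups.forEach((pages) => {
    // Legacy records carry no page/total, so treat them as page 1 of 1.
    const total = pages[0].total ?? 1;
    const byPage = new Map<number, string>();
    for (const p of pages) {
      if ((p.total ?? 1) !== total) return; // inconsistent totals: drop the save
      byPage.set(p.page ?? 1, p.value);
    }
    // Require every page 1..total; a gap drops the whole save.
    const parts: string[] = [];
    for (let n = 1; n <= total; n++) {
      const part = byPage.get(n);
      if (part === undefined) return;
      parts.push(part);
    }
    out.push({ key: pages[0].key, created: pages[0].created, value: parts.join("") });
  });
  return out;
}
```

Keying the groups on both `key` and `created` is what keeps two saves of the same key from being interleaved, which matches the two-save dedup case listed in the tests.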

Envelope shape (single chunk — unchanged)

{
  "type": "permanent-memory",
  "key": "team-humans",
  "value": "<encrypted ≤882 bytes>",
  "user": "XHG33...",
  "created": "2026-05-18T22:00:00Z"
}

Envelope shape (multi-chunk — new)

{
  "type": "permanent-memory",
  "key": "team-interaction-guide",
  "value": "<encrypted chunk N>",
  "user": "XHG33...",
  "created": "2026-05-18T22:00:00Z",
  "book": "team-interaction-guide",
  "page": 2,
  "total": 4
}
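
For readers who prefer types to JSON samples, the two shapes above could be modelled as one interface with optional paging fields. This is only a sketch: `PermanentEnvelope` and `isMultiChunk` are hypothetical names and are not taken from the plugin's source.

```typescript
// Hypothetical type covering both envelope shapes shown above.
interface PermanentEnvelope {
  type: "permanent-memory";
  key: string;
  value: string; // encrypted payload
  user: string; // Algorand address
  created: string; // ISO-8601 timestamp
  book?: string; // multi-chunk only (equal to key today)
  page?: number; // multi-chunk only, 1..total
  total?: number; // multi-chunk only
}

type MultiChunkEnvelope = PermanentEnvelope & {
  book: string;
  page: number;
  total: number;
};

// Returns true only when all three paging fields are present and consistent.
function isMultiChunk(e: PermanentEnvelope): e is MultiChunkEnvelope {
  const { book, page, total } = e;
  return (
    typeof book === "string" &&
    typeof page === "number" &&
    typeof total === "number" &&
    Number.isInteger(page) &&
    Number.isInteger(total) &&
    page >= 1 &&
    page <= total
  );
}
```

Keeping the paging fields optional reflects the stated goal that single-chunk saves stay on the legacy shape, so a reader that ignores `book`, `page`, and `total` still parses them.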

Tests (19, all pass)

test/chunking.test.ts — boundary cases (exactly MAX_CLEARTEXT_PER_CHUNK bytes, +1 byte), round-trip preservation including emoji + CJK + RTL, needsChunking heuristic.
test/permanent-reassemble.test.ts — legacy single-chunk pass-through, multi-page join in order, missing-page drop, two-save dedup with different created timestamps, contiguous-page enforcement, mixed single+multi handling.

Test plan

bun test (26/26 across 3 files)
E2E save+recall of a 3030-byte payload against live localnet (running in parallel with this PR — will update if it surfaces anything)
Round-trip preserves UTF-8 with emoji and CJK
Single-chunk legacy envelope shape unchanged
Reviewer to consider whether a cap on total is needed (e.g. reject saves > 100 chunks to bound recall cost)

🤖 Generated with Claude Code

`fledge-plugin-memory` previously rejected any permanent save whose encrypted envelope exceeded ~882 bytes (the @corvidlabs/ts-algochat hard cap), and any tx note > 1024 bytes (Algorand's per-tx note limit). For larger content the user got "Permanent value too large for tx note" or "EncryptionError: Message too large".

This change adds transparent multi-tx chunking on the permanent tier:

- New `src/chunking.ts`: splits values into ≤600-byte UTF-8-safe chunks. UTF-8 multi-byte codepoints are never cut mid-character; the splitter walks back to the prior leading byte when a cut lands on a continuation byte (`0b10xxxxxx`).
- `permanentSave` now emits N transactions for an N-chunk value. Each carries the same key + ISO-8601 `created` timestamp plus envelope fields `book` (= key today), `page` (1..N), `total` (N). Single-chunk saves stay on the legacy envelope shape so existing readers and indexers see no change.
- `permanentRecall` / `permanentList` group by (key, created) inside a new `reassemble` step, require all `total` pages to be present (a save with missing pages is silently dropped, not partially reassembled), and join in page order. Tombstones cover all pages for a key without needing per-page tombstones because the latest-round-wins rule still picks the tombstone over the older multi-page write.
- `permanentDelete` is unchanged: a single tombstone tx covers any number of chunks under that key.

19 unit tests:

- `test/chunking.test.ts`: boundary cases (exactly N bytes, N+1 bytes), round-trip preservation including emoji + CJK + RTL, needsChunking heuristic.
- `test/permanent-reassemble.test.ts`: legacy single-chunk pass-through, multi-page join, missing-page drop, two-save dedup, contiguous-page enforcement, mixed single+multi handling.

Mutable (ARC-69 ASA) tier chunking is a follow-up; a single PR is enough surface to review at once.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request implements a chunking system for Algorand transaction notes, which are limited to 1024 bytes. It introduces logic to split strings into UTF-8-safe segments and reassemble them, along with comprehensive unit tests for various character types and boundary conditions. A potential infinite loop was identified in the chunkValue function if the maximum chunk size is configured to be smaller than a single multi-byte character, and a fix was suggested to ensure the loop always progresses.

gemini-code-assist · 2026-05-18T22:57:11Z

+  while (offset < bytes.length) {
+    let end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
+    // Walk back if we landed in the middle of a UTF-8 continuation byte
+    // (0b10xxxxxx, i.e. (byte & 0xC0) === 0x80). We stop walking once
+    // we hit a leading byte; this caps regression at 3 bytes.
+    while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
+    chunks.push(bytes.slice(offset, end).toString("utf-8"));
+    offset = end;
+  }


The chunkValue function is susceptible to an infinite loop if MAX_CLEARTEXT_PER_CHUNK is set to a value smaller than the byte length of a single UTF-8 character (e.g., if it were reduced to 2 for testing or future changes). If end walks back all the way to offset, the offset will never advance. While 600 bytes is plenty for any valid UTF-8 character (max 4 bytes), adding a guard ensures robustness against configuration changes or malformed input.

while (offset < bytes.length) {
  let end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
  // Walk back if we landed in the middle of a UTF-8 continuation byte
  // (0b10xxxxxx, i.e. (byte & 0xC0) === 0x80). We stop walking once
  // we hit a leading byte; this caps regression at 3 bytes.
  while (end > offset && end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
  // If the chunk size is too small to fit even one full character,
  // force advance to avoid an infinite loop.
  if (end === offset) end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
  chunks.push(bytes.slice(offset, end).toString("utf-8"));
  offset = end;
}
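
A self-contained variation on the same idea is sketched below for experimentation. It is not the merged code and not the exact suggestion above: `chunkUtf8Sketch` is a hypothetical name, it uses `TextEncoder`/`TextDecoder` rather than `Buffer`, and when the limit is smaller than one character it advances forward to the next character boundary (letting that chunk exceed the limit) instead of cutting at the limit.

```typescript
// Hypothetical helper: split a string into chunks of at most maxBytes UTF-8
// bytes without ever cutting a codepoint in half.
function chunkUtf8Sketch(value: string, maxBytes: number): string[] {
  if (!Number.isInteger(maxBytes) || maxBytes < 1) {
    throw new RangeError("maxBytes must be a positive integer");
  }
  const bytes = new TextEncoder().encode(value);
  // ignoreBOM: true keeps a leading U+FEFF in the output so chunks round-trip.
  const decoder = new TextDecoder("utf-8", { ignoreBOM: true });
  const chunks: string[] = [];
  let offset = 0;

  while (offset < bytes.length) {
    let end = Math.min(offset + maxBytes, bytes.length);
    // Walk back while the cut lands on a continuation byte (0b10xxxxxx),
    // but never past the start of the current chunk.
    while (end > offset && end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
    if (end === offset) {
      // maxBytes is smaller than this character: take the whole character
      // anyway so the loop always makes progress. This chunk exceeds maxBytes.
      end = offset + 1;
      while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) end++;
    }
    chunks.push(decoder.decode(bytes.subarray(offset, end)));
    offset = end;
  }
  return chunks;
}
```

The trade-off is deliberate: an oversized chunk would still be caught by a downstream size check, whereas a chunk cut inside a character decodes to replacement characters and corrupts the value silently.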

PR #8 set MAX_CLEARTEXT_PER_CHUNK = 600 based on a too-loose estimate of envelope overhead (~150 bytes). The actual envelope is:

{"type":"permanent-memory","key":"K","value":"<base64>","user":"<58>","created":"<24>","book":"K","page":N,"total":M}

Breaking it down:

- JSON syntax + field names: ~100 bytes
- key (variable): up to 256 chars per validateKey
- value (base64-encrypted): (plaintext + 40 bytes of nonce+MAC) * 4/3 chars
- user (Algorand address): 58 chars
- created (ISO-8601): 24 chars
- book: duplicates key, so the key length is counted twice
- page + total integers: up to 12 chars

For a 30-char key, the value blob has 770 base64 chars of room (577 binary, 537 plaintext). For a 100-char key it shrinks to 432 plaintext.

The empirical failure: re-importing the 393 dropped corvid-agent memories at 600 plaintext per chunk produced envelopes of 1235 bytes, over Algorand's 1024-byte note cap. `permanentSave`'s post-chunking assertion correctly fired:

Internal: permanent envelope exceeded 1024 bytes (1235) after chunking. Raise MAX_CLEARTEXT_PER_CHUNK headroom

(That assertion was added in #8 precisely so this kind of regression becomes loud instead of silent.)

## What this changes

- `MAX_CLEARTEXT_PER_CHUNK`: 600 → **400**. Safe for keys up to ~120 chars; longer keys (rare; observed max in corvid-agent's 1,000+ keyspace is ~60) may still trip the assertion but won't silently corrupt.
- Module docstring rewritten with the explicit byte budget so the next reader can re-derive a sound value when the envelope shape changes.
- Loosened the "3000 bytes → 5 chunks" test to count chunks via `Math.ceil(total / MAX)` so it tracks the constant.

## New regression tests

`envelope-fits invariant` (3 tests): simulate the on-chain envelope size for 30 / 60 / 100 char keys with a chunk at `MAX_CLEARTEXT_PER_CHUNK` and assert each lands under 1024 bytes. The 100-char test would have caught this bug at MAX=600, and it did catch the intermediate MAX=480 proposal during this fix's own iteration.

## Verified end-to-end against live localnet

Hot-patched the installed plugin and re-tried two real failures:

corvid-agent-build-queue-2025 (1133 B) → 4GQY4G3VNQCM... ✓
corvid-agent-council-2026-02-04 (2072 B) → JEWX62ESYD3T... ✓

Both wrote successfully as multi-chunk permanent ARC-69 / tx-note records. 29/29 unit tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
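
The byte budget described above can be re-derived with a few lines of arithmetic. The sketch below only restates the approximate figures quoted in the text (~100 bytes of JSON syntax, 58-char address, 24-char timestamp, up to 12 chars of page/total digits, 40 bytes of nonce+MAC); `maxPlaintextForKey` and the constant names are hypothetical, and real base64 padding may shave a byte or two off these estimates.

```typescript
// All constants are the approximate figures from the description above.
const NOTE_CAP_BYTES = 1024; // Algorand per-tx note limit
const FIXED_OVERHEAD = 100 + 58 + 24 + 12; // JSON syntax + user + created + page/total
const CRYPTO_OVERHEAD = 40; // nonce + MAC, added before base64 encoding

// Hypothetical helper: estimated plaintext bytes that fit in one chunk for a
// key of the given length. The key appears twice ("key" and "book").
function maxPlaintextForKey(keyLength: number): number {
  const base64Room = NOTE_CAP_BYTES - FIXED_OVERHEAD - 2 * keyLength;
  const binaryRoom = Math.floor((base64Room * 3) / 4); // base64 expands 3 bytes to 4 chars
  return Math.max(0, binaryRoom - CRYPTO_OVERHEAD);
}

for (const keyLength of [30, 60, 100, 120]) {
  console.log(`key length ${keyLength}: ~${maxPlaintextForKey(keyLength)} plaintext bytes per chunk`);
}
```

Under these assumptions a 30-char key leaves 537 plaintext bytes and a 100-char key leaves 432, matching the figures quoted above, and a 120-char key still leaves just over 400.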

gemini-code-assist Bot reviewed May 18, 2026


0xLeif merged commit c9226d5 into main May 18, 2026
5 checks passed

corvid-agent mentioned this pull request May 19, 2026

fix(chunking): drop MAX_CLEARTEXT_PER_CHUNK 600 → 400 (envelope cap) #9

Merged

3 tasks

0xLeif merged 1 commit into main from feat/book-page-chunking
