feat(permanent): chunking for values > 600 bytes (book/page envelope) #8

Merged: 0xLeif merged 1 commit into main from feat/book-page-chunking on May 18, 2026
@corvid-agent
Collaborator

Summary

fledge-plugin-memory previously rejected any permanent save whose encrypted envelope exceeded ~882 bytes (the @corvidlabs/ts-algochat hard cap), or whose tx note exceeded 1024 bytes (Algorand's per-tx limit). Real-world team-knowledge memories regularly hit these limits: in a bulk import of 1,034 corvid-agent memories, roughly 37% of saves were dropped for this reason.

This adds transparent multi-tx chunking on the permanent tier. Mutable (ARC-69 ASA) chunking is intentionally deferred to a follow-up so that this PR stays small enough to review in one pass.

How it works

  • New src/chunking.ts splits values into ≤600-byte UTF-8-safe chunks. Multi-byte codepoints (emoji, CJK, RTL) are never cut mid-character — the splitter walks back to the prior leading byte on a continuation cut.
  • permanentSave emits N transactions for an N-chunk value. Each carries the same key + ISO-8601 created timestamp plus envelope fields book (= key today), page (1..N), total (N). Single-chunk saves stay on the legacy envelope shape so existing readers and indexers see no change.
  • permanentRecall / permanentList group by (key, created) inside a new reassemble step, require all total pages to be present (saves with missing pages are silently dropped, never partially reconstructed), and join in page order.
  • Tombstones (permanentDelete) cover all pages for a key without needing per-page tombstones — the latest-round-wins rule already picks the tombstone over the older multi-page write.
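
For reviewers who want to picture the reassemble step, here is a minimal sketch of the grouping and completeness rules listed above. It is not the plugin's actual code: the `PageRecord` and `Memory` types and this `reassemble` signature are assumptions made for illustration, and it works on already-decrypted chunk text.

```typescript
// Hypothetical shapes for this sketch; the real plugin types may differ.
interface PageRecord {
  key: string;
  created: string; // ISO-8601 timestamp shared by every page of one save
  value: string;   // decrypted chunk text
  page?: number;   // 1..total; absent on legacy single-chunk envelopes
  total?: number;  // page count; absent on legacy single-chunk envelopes
}

interface Memory {
  key: string;
  created: string;
  value: string;
}

// Group by (key, created), keep only complete saves, join in page order.
function reassemble(records: PageRecord[]): Memory[] {
  const groups = new Map<string, PageRecord[]>();
  for (const r of records) {
    const id = JSON.stringify([r.key, r.created]); // unambiguous composite key
    const group = groups.get(id);
    if (group) group.push(r);
    else groups.set(id, [r]);
  }

  const out: Memory[] = [];
  for (const group of groups.values()) {
    const first = group[0];
    if (!first) continue;

    // Legacy single-chunk envelope: no page/total fields, pass through.
    if (group.length === 1 && first.page === undefined && first.total === undefined) {
      out.push({ key: first.key, created: first.created, value: first.value });
      continue;
    }

    const total = first.total;
    if (total === undefined || total < 1 || group.some((r) => r.total !== total)) continue;

    // Index chunks by page number, then require pages 1..total with no gaps.
    const byPage = new Map<number, string>();
    for (const r of group) {
      if (r.page !== undefined) byPage.set(r.page, r.value);
    }
    const parts: string[] = [];
    for (let p = 1; p <= total; p++) {
      const part = byPage.get(p);
      if (part === undefined) break; // missing page
      parts.push(part);
    }
    if (parts.length !== total) continue; // incomplete save: drop, never partially rebuild

    out.push({ key: first.key, created: first.created, value: parts.join("") });
  }
  return out;
}
```

Grouping on `(key, created)` rather than `key` alone is what lets two saves of the same key coexist without their pages being interleaved.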

Envelope shape (single chunk — unchanged)

{
  "type": "permanent-memory",
  "key": "team-humans",
  "value": "<encrypted ≤882 bytes>",
  "user": "XHG33...",
  "created": "2026-05-18T22:00:00Z"
}

Envelope shape (multi-chunk — new)

{
  "type": "permanent-memory",
  "key": "team-interaction-guide",
  "value": "<encrypted chunk N>",
  "user": "XHG33...",
  "created": "2026-05-18T22:00:00Z",
  "book": "team-interaction-guide",
  "page": 2,
  "total": 4
}
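
The two shapes above can be described with a pair of TypeScript types. This is only a sketch: the type names, the `isPaged` guard, and the `buildEnvelopes` helper are hypothetical and do not come from the plugin source. It shows how a writer could keep single-chunk saves on the legacy shape while adding `book`, `page`, and `total` only when there is more than one chunk.

```typescript
// Hypothetical types mirroring the two JSON shapes shown above.
interface LegacyEnvelope {
  type: "permanent-memory";
  key: string;
  value: string; // encrypted chunk
  user: string;
  created: string; // ISO-8601
}

interface PagedEnvelope extends LegacyEnvelope {
  book: string;  // equals key today
  page: number;  // 1..total
  total: number; // number of pages in this save
}

type Envelope = LegacyEnvelope | PagedEnvelope;

// Readers can branch on the presence of the paging fields.
function isPaged(e: Envelope): e is PagedEnvelope {
  return "page" in e && "total" in e;
}

// One envelope per encrypted chunk; a single chunk keeps the legacy shape.
function buildEnvelopes(
  key: string,
  user: string,
  created: string,
  encryptedChunks: string[],
): Envelope[] {
  const total = encryptedChunks.length;
  return encryptedChunks.map((value, i): Envelope => {
    if (total === 1) {
      return { type: "permanent-memory", key, value, user, created };
    }
    return { type: "permanent-memory", key, value, user, created, book: key, page: i + 1, total };
  });
}
```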

Tests (19, all pass)

  • test/chunking.test.ts — boundary cases (exactly MAX_CLEARTEXT_PER_CHUNK bytes, +1 byte), round-trip preservation including emoji + CJK + RTL, needsChunking heuristic.
  • test/permanent-reassemble.test.ts — legacy single-chunk pass-through, multi-page join in order, missing-page drop, two-save dedup with different created timestamps, contiguous-page enforcement, mixed single+multi handling.

Test plan

  • bun test (26/26 across 3 files)
  • E2E save+recall of a 3030-byte payload against live localnet (running in parallel with this PR — will update if it surfaces anything)
  • Round-trip preserves UTF-8 with emoji and CJK
  • Single-chunk legacy envelope shape unchanged
  • Reviewer to consider whether total cap is needed (e.g. reject saves > 100 chunks to bound recall cost)

🤖 Generated with Claude Code

`fledge-plugin-memory` previously rejected any permanent save whose
encrypted envelope exceeded ~882 bytes (the @corvidlabs/ts-algochat
hard cap), and any tx note > 1024 bytes (Algorand's per-tx note limit).
For larger content the user got "Permanent value too large for tx note"
or "EncryptionError: Message too large".

This change adds transparent multi-tx chunking on the permanent tier:

- New `src/chunking.ts` — splits values into ≤600-byte UTF-8-safe
  chunks. UTF-8 multi-byte codepoints are never cut mid-character;
  the splitter walks back to the prior leading byte when a cut lands
  on a continuation byte (`0b10xxxxxx`).

- `permanentSave` now emits N transactions for an N-chunk value. Each
  carries the same key + ISO-8601 `created` timestamp plus envelope
  fields `book` (= key today), `page` (1..N), `total` (N). Single-
  chunk saves stay on the legacy envelope shape so existing readers
  and indexers see no change.

- `permanentRecall` / `permanentList` group by (key, created) inside
  a new `reassemble` step, require all `total` pages to be present
  (a save with missing pages is silently dropped, not partially
  reassembled), and join in page order. Tombstones cover all pages
  for a key without needing per-page tombstones because the
  latest-round-wins rule still picks the tombstone over the older
  multi-page write.

- `permanentDelete` is unchanged — a single tombstone tx covers any
  number of chunks under that key.

19 unit tests:
- `test/chunking.test.ts` — boundary cases (exactly N bytes, N+1
  bytes), round-trip preservation including emoji + CJK + RTL,
  needsChunking heuristic.
- `test/permanent-reassemble.test.ts` — legacy single-chunk pass-
  through, multi-page join, missing-page drop, two-save dedup,
  contiguous-page enforcement, mixed single+multi handling.

Mutable (ARC-69 ASA) tier chunking is a follow-up — single PR is
enough surface to review at once.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request implements a chunking system for Algorand transaction notes, which are limited to 1024 bytes. It introduces logic to split strings into UTF-8-safe segments and reassemble them, along with comprehensive unit tests for various character types and boundary conditions. A potential infinite loop was identified in the chunkValue function if the maximum chunk size is configured to be smaller than a single multi-byte character, and a fix was suggested to ensure the loop always progresses.

Comment thread src/chunking.ts
Comment on lines +37 to +45
  while (offset < bytes.length) {
    let end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
    // Walk back if we landed in the middle of a UTF-8 continuation byte
    // (0b10xxxxxx, i.e. (byte & 0xC0) === 0x80). We stop walking once
    // we hit a leading byte; this caps regression at 3 bytes.
    while (end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
    chunks.push(bytes.slice(offset, end).toString("utf-8"));
    offset = end;
  }

high

The chunkValue function is susceptible to an infinite loop if MAX_CLEARTEXT_PER_CHUNK is set to a value smaller than the byte length of a single UTF-8 character (e.g., if it were reduced to 2 for testing or future changes). If end walks back all the way to offset, the offset will never advance. While 600 bytes is plenty for any valid UTF-8 character (max 4 bytes), adding a guard ensures robustness against configuration changes or malformed input.

  while (offset < bytes.length) {
    let end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);
    // Walk back if we landed in the middle of a UTF-8 continuation byte
    // (0b10xxxxxx, i.e. (byte & 0xC0) === 0x80). We stop walking once
    // we hit a leading byte; this caps regression at 3 bytes.
    while (end > offset && end < bytes.length && (bytes[end] & 0xc0) === 0x80) end--;
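
    // If the chunk size is too small to fit even one full character,
    // force advance to avoid an infinite loop. Note: this cut lands inside
    // a code point, so toString("utf-8") will emit U+FFFD replacement
    // characters for the split character.
    if (end === offset) end = Math.min(offset + MAX_CLEARTEXT_PER_CHUNK, bytes.length);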

    chunks.push(bytes.slice(offset, end).toString("utf-8"));
    offset = end;
  }
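
An alternative way to guarantee progress, sketched here under stated assumptions, is to reject chunk sizes below 4 bytes up front, so the walk-back can never reach `offset` and no code point is ever split. `chunkUtf8` and its `maxBytes` parameter are hypothetical names for this sketch, and it uses the standard `TextEncoder` / `TextDecoder` globals instead of `Buffer`; it is not the code that was merged.

```typescript
// The longest UTF-8 encoding of a single code point is 4 bytes.
const MAX_UTF8_BYTES_PER_CODE_POINT = 4;

// Hypothetical splitter: every chunk is valid UTF-8 and at most maxBytes long.
function chunkUtf8(value: string, maxBytes: number): string[] {
  if (!Number.isInteger(maxBytes) || maxBytes < MAX_UTF8_BYTES_PER_CODE_POINT) {
    throw new RangeError(`maxBytes must be an integer >= 4, got ${maxBytes}`);
  }
  const bytes = new TextEncoder().encode(value);
  // fatal: throw on a bad cut instead of silently emitting U+FFFD.
  // ignoreBOM: keep a leading U+FEFF in a chunk instead of stripping it.
  const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
  const chunks: string[] = [];
  let offset = 0;
  while (offset < bytes.length) {
    let end = Math.min(offset + maxBytes, bytes.length);
    // If the byte at the cut is a continuation byte (0b10xxxxxx), the cut
    // would split a code point; walk back to that code point's leading byte.
    // With maxBytes >= 4 this moves at most 3 bytes, so end stays > offset.
    while (end > offset && end < bytes.length && (bytes[end]! & 0xc0) === 0x80) end--;
    chunks.push(decoder.decode(bytes.subarray(offset, end)));
    offset = end;
  }
  return chunks;
}
```

The trade-off is that a misconfigured chunk size fails loudly at call time rather than producing chunks that decode with replacement characters.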

@0xLeif 0xLeif merged commit c9226d5 into main May 18, 2026
5 checks passed
corvid-agent added a commit that referenced this pull request on May 19, 2026

PR #8 set MAX_CLEARTEXT_PER_CHUNK = 600 based on a too-loose estimate
of envelope overhead (~150 bytes). The actual envelope is:

  {"type":"permanent-memory","key":"K","value":"<base64>","user":"<58>","created":"<24>","book":"K","page":N,"total":M}

Breaking it down:
- JSON syntax + field names:  ~100 bytes
- key (variable):              up to 256 chars per validateKey
- value (base64-encrypted):    (plaintext + 40 nonce+MAC bytes) * 4/3 chars
- user (Algorand address):     58 chars
- created (ISO-8601):          24 chars
- book (duplicates key):       key length counted twice
- page + total integers:       up to 12 chars

For a 30-char key, the value blob has 770 base64 chars of room
(577 binary, 537 plaintext). For a 100-char key it shrinks to 432
plaintext.
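
The budget arithmetic above can be written out as a short sketch. The constants are taken from the breakdown in this message (100 + 58 + 24 + 12 bytes of fixed overhead, the key counted twice, 40 bytes of nonce+MAC); the function names are hypothetical, keys are assumed to be ASCII (one byte per char), and base64 padding is ignored in the same way the figures above ignore it.

```typescript
const NOTE_LIMIT = 1024;                    // Algorand per-tx note cap, in bytes
const FIXED_OVERHEAD = 100 + 58 + 24 + 12;  // JSON syntax + user + created + page/total
const CRYPTO_OVERHEAD = 40;                 // nonce + MAC, per the breakdown above

// Base64 chars left for the value once the key (stored as key and book) is paid for.
function base64Room(keyLen: number): number {
  return NOTE_LIMIT - FIXED_OVERHEAD - 2 * keyLen;
}

// Largest plaintext chunk (in bytes) that still fits for a given key length.
function maxPlaintext(keyLen: number): number {
  const binary = Math.floor((base64Room(keyLen) * 3) / 4); // base64 -> raw bytes
  return Math.max(0, binary - CRYPTO_OVERHEAD);
}

console.log(base64Room(30), maxPlaintext(30)); // 770 537
console.log(maxPlaintext(100));                // 432
```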

The empirical failure: re-importing the 393 dropped corvid-agent
memories at 600 plaintext per chunk produced envelopes of 1235
bytes — over Algorand's 1024-byte note cap. `permanentSave`'s
post-chunking assertion correctly fired:

  Internal: permanent envelope exceeded 1024 bytes (1235)
  after chunking. Raise MAX_CLEARTEXT_PER_CHUNK headroom

(That assertion was added in #8 precisely so this kind of regression
becomes loud instead of silent.)

## What this changes

- `MAX_CLEARTEXT_PER_CHUNK`: 600 → **400**. Safe for keys up to
  ~120 chars; longer keys (rare; observed max in corvid-agent's
  1,000+ keyspace is ~60) may still trip the assertion but won't
  silently corrupt.
- Module docstring rewritten with the explicit byte budget so the
  next reader can re-derive a sound value when the envelope shape
  changes.
- Loosened the "3000 bytes → 5 chunks" test to count chunks via
  `Math.ceil(total / MAX)` so it tracks the constant.

## New regression tests

`envelope-fits invariant` (3 tests): simulate the on-chain envelope
size for 30 / 60 / 100 char keys with a chunk at `MAX_CLEARTEXT_PER_CHUNK`
and assert each lands under 1024 bytes. The 100-char test would have
caught this bug at MAX=600 — and did catch the intermediate MAX=480
proposal during this fix's own iteration.
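
A rough sketch of what such an envelope-fits check could look like follows. It is not the actual test code: `simulatedEnvelopeBytes` is a hypothetical helper, the value is a placeholder string of padded-base64 length for `plaintext + 40` bytes, and the `user` and `created` fields are dummy strings of 58 and 24 chars.

```typescript
// Build a stand-in envelope and measure its serialized size in bytes.
function simulatedEnvelopeBytes(
  keyLen: number,
  plaintextBytes: number,
  page = 1,
  total = 1,
): number {
  const cipherBytes = plaintextBytes + 40;          // assumed nonce + MAC overhead
  const base64Len = 4 * Math.ceil(cipherBytes / 3); // standard padded base64 length
  const key = "k".repeat(keyLen);
  const envelope = {
    type: "permanent-memory",
    key,
    value: "A".repeat(base64Len), // placeholder with the same length as real ciphertext
    user: "A".repeat(58),         // Algorand addresses are 58 chars
    created: "2026-05-18T22:00:00.000Z",
    book: key,
    page,
    total,
  };
  return new TextEncoder().encode(JSON.stringify(envelope)).length;
}

const NOTE_CAP = 1024;
for (const keyLen of [30, 60, 100]) {
  const size = simulatedEnvelopeBytes(keyLen, 400, 99, 99);
  console.log(`key=${keyLen} chunk=400 -> ${size} bytes, fits=${size <= NOTE_CAP}`);
}
```

Under these assumptions a 100-char key fits at a 400-byte chunk but not at 480 or 600, which matches the behaviour described above.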

## Verified end-to-end against live localnet

Hot-patched the installed plugin and re-tried two real failures:

  corvid-agent-build-queue-2025      (1133 B) → 4GQY4G3VNQCM... ✓
  corvid-agent-council-2026-02-04    (2072 B) → JEWX62ESYD3T... ✓

Both wrote successfully as multi-chunk permanent ARC-69 / tx-note
records.

29/29 unit tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>