fix(chunking): drop MAX_CLEARTEXT_PER_CHUNK 600 → 400 (envelope cap)#9

Merged
corvid-agent merged 1 commit into main from fix/chunking-headroom on May 19, 2026

@corvid-agent
Collaborator

Follow-up to #8 — closes a real bug surfaced during the re-import of corvid-agent memories that triggered the chunking path.

The bug

PR #8 set `MAX_CLEARTEXT_PER_CHUNK = 600` based on an underestimate of envelope overhead. The actual envelope is:

```json
{"type":"permanent-memory","key":"K","value":"<base64>","user":"<58>","created":"<24>","book":"K","page":N,"total":M}
```

For a 30-char key, the value blob has 770 base64 chars of room (≈ 537 bytes of plaintext after the 40-byte crypto envelope). The previous 600 plaintext bytes per chunk produced envelopes of 1235 bytes, 211 over Algorand's 1024-byte note cap. `permanentSave`'s post-chunking assertion correctly fired (that assertion was added in #8 precisely so this kind of regression becomes loud, not silent):

```
Internal: permanent envelope exceeded 1024 bytes (1235) after chunking.
Raise MAX_CLEARTEXT_PER_CHUNK headroom in chunking.ts.
```
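The byte budget above can be re-derived with a short sketch. The constant `FIXED_ENVELOPE_BYTES = 194` and the function name `maxPlaintextForKey` are assumptions made for illustration: the 194 is back-computed from the "30-char key leaves 770 base64 chars" figure quoted here, not read from `chunking.ts`.

```typescript
// Assumed constants, inferred from the figures quoted in this PR.
const NOTE_CAP_BYTES = 1024;      // Algorand note field limit
const CRYPTO_OVERHEAD_BYTES = 40; // nonce + MAC, per the PR
const FIXED_ENVELOPE_BYTES = 194; // hypothetical: 1024 - 770 - 2 * 30

/** Largest plaintext (bytes) whose encrypted, base64-encoded value still fits. */
function maxPlaintextForKey(keyLength: number): number {
  // The key appears twice in the envelope: once as "key", once as "book".
  const base64Room = NOTE_CAP_BYTES - FIXED_ENVELOPE_BYTES - 2 * keyLength;
  // Mirrors the PR's arithmetic; base64 padding granularity is ignored here.
  const binaryRoom = Math.floor((base64Room * 3) / 4);
  return binaryRoom - CRYPTO_OVERHEAD_BYTES;
}

console.log(maxPlaintextForKey(30));  // 537
console.log(maxPlaintextForKey(100)); // 432
```

Under these assumptions, 600 plaintext bytes exceeds the budget even for a 30-char key, which is consistent with the assertion firing.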

What this changes

  • `MAX_CLEARTEXT_PER_CHUNK`: 600 → 400. Safe for keys up to ~120 chars; longer keys (rare — observed max in corvid-agent's 1,000+ memory keyspace is ~60) may still trip the assertion but won't silently corrupt.
  • Module docstring rewritten with the explicit byte budget so the next reader can re-derive a sound value when the envelope shape changes.
  • Loosened the "3000 bytes → 5 chunks" test to count chunks via `Math.ceil(total / MAX)` so it tracks the constant.
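As a rough illustration of why the chunk-count test now follows the constant, here is a minimal sketch of a fixed-size splitter. `splitIntoChunks` is a hypothetical stand-in, not the function in `chunking.ts`; only `MAX_CLEARTEXT_PER_CHUNK = 400` comes from this PR.

```typescript
const MAX_CLEARTEXT_PER_CHUNK = 400; // value set by this PR

/** Hypothetical splitter: slices a payload into chunks of at most `max` bytes. */
function splitIntoChunks(
  payload: Uint8Array,
  max: number = MAX_CLEARTEXT_PER_CHUNK,
): Uint8Array[] {
  const pieces: Uint8Array[] = [];
  for (let offset = 0; offset < payload.length; offset += max) {
    // subarray clamps the end index, so the last chunk may be shorter.
    pieces.push(payload.subarray(offset, offset + max));
  }
  return pieces;
}

const payloadBytes = 3000;
const pieces = splitIntoChunks(new Uint8Array(payloadBytes));
const expectedCount = Math.ceil(payloadBytes / MAX_CLEARTEXT_PER_CHUNK);
console.log(pieces.length === expectedCount); // true
```

Deriving the expected count from the constant means the test keeps passing if the constant is tuned again.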

New regression tests (3)

`envelope-fits invariant`: simulate the on-chain envelope size for 30 / 60 / 100 char keys with a chunk at `MAX_CLEARTEXT_PER_CHUNK` and assert each lands under 1024 bytes. The 100-char test would have caught this bug at MAX=600 — and did catch an intermediate MAX=480 proposal during this fix's iteration.
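A minimal sketch of what such an invariant check might look like follows. It is not the test added in this PR: `simulateEnvelopeBytes` and `base64Length` are hypothetical helpers, the field values are placeholders of the stated lengths, and the 3-digit `page` / `total` counters are an assumption.

```typescript
const MAX_CLEARTEXT_PER_CHUNK = 400; // value set by this PR
const NOTE_CAP_BYTES = 1024;         // Algorand note field limit
const CRYPTO_OVERHEAD_BYTES = 40;    // nonce + MAC, per the PR

/** Padded base64 length for `n` binary bytes. */
function base64Length(n: number): number {
  return Math.ceil(n / 3) * 4;
}

/** Hypothetical simulation of the serialized envelope size in bytes. */
function simulateEnvelopeBytes(keyLength: number, plaintextBytes: number): number {
  const key = "k".repeat(keyLength);
  const envelope = {
    type: "permanent-memory",
    key,
    value: "A".repeat(base64Length(plaintextBytes + CRYPTO_OVERHEAD_BYTES)),
    user: "A".repeat(58),               // Algorand address length
    created: new Date(0).toISOString(), // 24-char ISO-8601 timestamp
    book: key,                          // duplicates the key
    page: 999,                          // assumed 3-digit counters
    total: 999,
  };
  return new TextEncoder().encode(JSON.stringify(envelope)).length;
}

for (const keyLength of [30, 60, 100]) {
  const envelopeBytes = simulateEnvelopeBytes(keyLength, MAX_CLEARTEXT_PER_CHUNK);
  console.log(keyLength, envelopeBytes < NOTE_CAP_BYTES);
}
```

Measuring the serialized JSON directly, rather than summing estimated field sizes, keeps the check honest if a field is added to the envelope later.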

Verified end-to-end against live localnet

Hot-patched the installed plugin and re-tried two real failures from the corvid-agent import:

| Key | Size | Result |
|---|---|---|
| `corvid-agent-build-queue-2025` | 1133 B | `4GQY4G3VNQCM...` ✓ |
| `corvid-agent-council-2026-02-04` | 2072 B | `JEWX62ESYD3T...` ✓ |

Both wrote successfully as multi-chunk permanent ARC-69 / tx-note records.

Test plan

  • `bun test` → 29/29 pass
  • E2E hot-patch test of two real previously-failing entries
  • After merge: re-run the 393 dropped imports — should now land permanent

🤖 Generated with Claude Code

PR #8 set MAX_CLEARTEXT_PER_CHUNK = 600 based on a too-loose estimate
of envelope overhead (~150 bytes). The actual envelope is:

  {"type":"permanent-memory","key":"K","value":"<base64>","user":"<58>","created":"<24>","book":"K","page":N,"total":M}

Breaking it down:
- JSON syntax + field names:  ~100 bytes
- key (variable):              up to 256 chars per validateKey
- value (base64-encrypted):    (plaintext + 40 nonce+MAC) * 4/3 bytes
- user (Algorand address):     58 chars
- created (ISO-8601):          24 chars
- book — duplicates key:       counted twice
- page + total integers:       up to 12 chars

For a 30-char key, the value blob has 770 base64 chars of room
(577 binary, 537 plaintext). For a 100-char key it shrinks to 432
plaintext.
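The same breakdown can be inverted to ask how long a key may be for a given chunk size. This is a sketch under stated assumptions: `FIXED_ENVELOPE_BYTES = 194` is back-derived from the 30-char figure above (1024 - 770 - 2 * 30), and `maxKeyLengthForChunk` is a hypothetical helper, not part of `chunking.ts`.

```typescript
const NOTE_CAP_BYTES = 1024;      // Algorand note field limit
const CRYPTO_OVERHEAD_BYTES = 40; // nonce + MAC
const FIXED_ENVELOPE_BYTES = 194; // hypothetical, back-derived from the 30-char case

/** Longest key for which a full chunk of `chunkBytes` plaintext fits in one note. */
function maxKeyLengthForChunk(chunkBytes: number): number {
  // Encrypt first (adds 40 bytes), then base64-encode with padding.
  const base64Chars = Math.ceil((chunkBytes + CRYPTO_OVERHEAD_BYTES) / 3) * 4;
  const roomForKeys = NOTE_CAP_BYTES - FIXED_ENVELOPE_BYTES - base64Chars;
  return Math.floor(roomForKeys / 2); // key is stored twice: "key" and "book"
}

console.log(maxKeyLengthForChunk(400)); // 121
console.log(maxKeyLengthForChunk(480)); // 67
console.log(maxKeyLengthForChunk(600)); // -13 (no key length fits)
```

With these numbers, 400 lines up with the "safe for keys up to ~120 chars" claim, and 480 would not cover a 100-char key.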

The empirical failure: re-importing the 393 dropped corvid-agent
memories at 600 plaintext per chunk produced envelopes of 1235
bytes — over Algorand's 1024-byte note cap. `permanentSave`'s
post-chunking assertion correctly fired:

  Internal: permanent envelope exceeded 1024 bytes (1235)
  after chunking. Raise MAX_CLEARTEXT_PER_CHUNK headroom

(That assertion was added in #8 precisely so this kind of regression
becomes loud instead of silent.)

## What this changes

- `MAX_CLEARTEXT_PER_CHUNK`: 600 → **400**. Safe for keys up to
  ~120 chars; longer keys (rare; observed max in corvid-agent's
  1,000+ keyspace is ~60) may still trip the assertion but won't
  silently corrupt.
- Module docstring rewritten with the explicit byte budget so the
  next reader can re-derive a sound value when the envelope shape
  changes.
- Loosened the "3000 bytes → 5 chunks" test to count chunks via
  `Math.ceil(total / MAX)` so it tracks the constant.

## New regression tests

`envelope-fits invariant` (3 tests): simulate the on-chain envelope
size for 30 / 60 / 100 char keys with a chunk at `MAX_CLEARTEXT_PER_CHUNK`
and assert each lands under 1024 bytes. The 100-char test would have
caught this bug at MAX=600 — and did catch the intermediate MAX=480
proposal during this fix's own iteration.

## Verified end-to-end against live localnet

Hot-patched the installed plugin and re-tried two real failures:

  corvid-agent-build-queue-2025      (1133 B) → 4GQY4G3VNQCM... ✓
  corvid-agent-council-2026-02-04    (2072 B) → JEWX62ESYD3T... ✓

Both wrote successfully as multi-chunk permanent ARC-69 / tx-note
records.

29/29 unit tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request reduces the MAX_CLEARTEXT_PER_CHUNK constant from 600 to 400 to ensure that encrypted data chunks, when wrapped in their JSON envelopes, consistently fit within Algorand's 1024-byte transaction note limit. The changes include a detailed derivation of the new limit in the module documentation and an updated test suite that simulates envelope sizes for various key lengths. A review comment correctly identifies a typo in the documentation where the constant was referred to as 480 instead of 400.

Comment thread src/chunking.ts
* max binary = 770 * 3/4 = 577 bytes
* max plaintext = 577 - 40 = 537 bytes
*
* `MAX_CLEARTEXT_PER_CHUNK = 480` is set conservatively below this

medium

The docstring mentions MAX_CLEARTEXT_PER_CHUNK = 480, but the constant is actually defined as 400 on line 51. This inconsistency should be corrected to match the implementation and the PR description.

Suggested change
- * `MAX_CLEARTEXT_PER_CHUNK = 480` is set conservatively below this
+ * `MAX_CLEARTEXT_PER_CHUNK = 400` is set conservatively below this

@corvid-agent corvid-agent merged commit 3d2a40b into main May 19, 2026
5 checks passed
@corvid-agent corvid-agent deleted the fix/chunking-headroom branch May 19, 2026 00:09