Project priority sequence per #28: complete encoder rewrite (#111 incl. #23) → speed/ratio optimizations (#178, #180) → params API (#27) → magicless (#26) → Phase 6 C-ABI / CLI drop-in (#126/#127/#128/#130/#131/#132) → THEN this track. lsm-tree bilateral coordination accepted 2026-05-18 — commitment preserved, but execution defers until drop-in parity ships. Pre-Phase-6 work on this issue will not be scheduled.
⚠️ Feature gate (mandatory): all Rust code added by this issue is compiled only when the lsm Cargo feature is enabled (#[cfg(feature = "lsm")] on every new public item: module, struct, enum variant, impl block, function). The feature is default off, opt-in for downstream consumers. Without lsm: build is byte-identical to today, no new public symbols, cdylib from Phase 6 stays strict drop-in for donor libzstd v1.5.7. C FFI surface is unaffected regardless of feature state.
Context
lsm-tree's LSM-T2 (encrypted wire-format encoder/decoder) needs post-AEAD-decrypt validation: after AEAD authenticates the ciphertext and the bytes are decrypted to a raw zstd frame, the wire-format spec mandates two cross-checks against the MetadataPayload fields baked into the AAD:
Dictionary_ID match. The inner zstd frame's dict_id (parsed from its Frame_Header_Descriptor) MUST equal MetadataPayload.DictID. Otherwise an attacker who somehow forges a valid AEAD blob with the wrong dictionary could trigger silent corruption — see §8 of the wire-format draft, "dict substitution" row.
Window_Descriptor match. Inner frame's window descriptor byte MUST equal MetadataPayload.WindowLog. Defeats decompression-bomb attacks where a swapped-in ciphertext claims a much larger window than the encrypted blob originally permitted.
Currently lsm-tree would have to parse the inner frame header itself, read FrameHeader::dictionary_id() and FrameHeader::window_size() exposed at zstd/src/decoding/frame.rs:142, 116, and compare against expectations. That's open-coded validation duplicated at every decryption site. Better: structured-zstd accepts the expectations up front and fails the decode with a typed error if they don't match.
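For illustration, the open-coded check that each decryption site would otherwise carry might look roughly like the sketch below. It reads only the leading header bytes per the RFC 8878 layout (magic number, Frame_Header_Descriptor, optional Window_Descriptor, optional Dictionary_ID). The names `InnerHeader` and `peek_inner_header` are hypothetical and exist in neither crate; this is a standalone sketch, not the structured-zstd parser.

```rust
/// Hypothetical helper type: the two header fields the cross-checks need.
#[derive(Debug, PartialEq, Eq)]
pub struct InnerHeader {
    /// Raw Window_Descriptor byte; `None` for single-segment frames.
    pub window_descriptor: Option<u8>,
    /// Dictionary_ID; `None` when Dictionary_ID_flag is 0.
    pub dict_id: Option<u32>,
}

/// zstd magic number 0xFD2FB528, stored little-endian.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];

/// Hypothetical open-coded peek at a raw zstd frame header.
/// Returns `None` on a bad magic number or a truncated header.
pub fn peek_inner_header(frame: &[u8]) -> Option<InnerHeader> {
    if !frame.starts_with(&ZSTD_MAGIC) {
        return None;
    }
    let fhd = *frame.get(4)?; // Frame_Header_Descriptor
    let single_segment = fhd & 0x20 != 0; // bit 5: Single_Segment_flag
    let dict_id_len = match fhd & 0x03 {
        // bits 1-0: Dictionary_ID_flag selects the field width
        0 => 0,
        1 => 1,
        2 => 2,
        _ => 4,
    };

    let mut pos = 5;
    // Window_Descriptor is present only when Single_Segment_flag is unset.
    let window_descriptor = if single_segment {
        None
    } else {
        let byte = *frame.get(pos)?;
        pos += 1;
        Some(byte)
    };

    // Dictionary_ID is a little-endian integer of 0, 1, 2 or 4 bytes.
    let dict_bytes = frame.get(pos..pos + dict_id_len)?;
    let dict_id = if dict_id_len == 0 {
        None
    } else {
        let mut value = 0u32;
        for (i, byte) in dict_bytes.iter().enumerate() {
            value |= u32::from(*byte) << (8 * i);
        }
        Some(value)
    };

    Some(InnerHeader { window_descriptor, dict_id })
}
```

A caller would compare the returned fields against MetadataPayload.DictID and MetadataPayload.WindowLog by hand, which is exactly the duplication the setters proposed here are meant to remove.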
Scope — Rust-side only, behind lsm feature
    #[cfg(feature = "lsm")]
    impl FrameDecoder {
        /// Pin the expected `Dictionary_ID` for the next frame. If set, decode
        /// fails fast (before any block work) when the parsed frame header's
        /// `dict_id` does not match. `Some(0)` is treated as "no dictionary
        /// expected": the frame's `dict_id` must be absent (or 0). `None`
        /// (default) disables the check.
        pub fn expect_dict_id(&mut self, expected: Option<u32>) {
            self.expect_dict_id = expected;
        }

        /// Pin the expected raw `Window_Descriptor` byte (RFC 8878 §3.1.1.1.2
        /// layout: `(exp << 3) | mantissa`) for the next frame. If set, decode
        /// fails fast when the parsed frame header's `window_descriptor` byte
        /// does not match. `None` (default) disables the check.
        ///
        /// Note: this is byte-exact match, NOT a ceiling. Donor's
        /// `ZSTD_d_windowLogMax` is a separate ceiling-style limit that lives
        /// in the Phase 6 C FFI surface (#127); it's a different semantic and
        /// stays available unconditionally there.
        pub fn expect_window_descriptor(&mut self, expected: Option<u8>) {
            self.expect_window_descriptor = expected;
        }
    }

    #[non_exhaustive]
    pub enum FrameDecoderError {
        // existing variants unchanged ...

        // The enum itself already exists, so only the new variants are gated.
        #[cfg(feature = "lsm")]
        UnexpectedDictId {
            expected: Option<u32>,
            found: Option<u32>,
        },
        #[cfg(feature = "lsm")]
        UnexpectedWindowDescriptor {
            expected: u8,
            found: u8,
        },
    }
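To make the `(exp << 3) | mantissa` layout concrete, here is a small standalone sketch of how a raw Window_Descriptor byte maps to a window size under RFC 8878 (windowLog = 10 + exponent, plus mantissa eighths of the base). The function names are hypothetical helpers for illustration and are not part of the proposed API.

```rust
/// windowLog encoded by a raw Window_Descriptor byte: 10 + exponent (top 5 bits).
pub fn window_log(descriptor: u8) -> u32 {
    10 + u32::from(descriptor >> 3)
}

/// Window size in bytes: base 2^windowLog plus `mantissa` eighths of the base.
pub fn window_size(descriptor: u8) -> u64 {
    let base = 1u64 << window_log(descriptor);
    let mantissa = u64::from(descriptor & 0x07); // low 3 bits
    base + (base / 8) * mantissa
}

/// Pack an exponent (0..=31) and mantissa (0..=7) into the raw byte.
/// Returns `None` when either field does not fit its bit width.
pub fn window_descriptor(exponent: u8, mantissa: u8) -> Option<u8> {
    if exponent > 31 || mantissa > 7 {
        None
    } else {
        Some((exponent << 3) | mantissa)
    }
}
```

Under this layout, 0x4A decodes to exponent 9 and mantissa 2, and 0x60 to exponent 12 and mantissa 0. Comparing the raw byte avoids decoding it at all, which is why the setter takes a `u8` rather than a size.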
Implementation
Hook in FrameDecoder::init / FrameDecoder::reset (zstd/src/decoding/frame_decoder.rs:103, 131) immediately after frame::read_frame_header returns successfully:
    let (frame_header, header_size) = frame::read_frame_header(source)?;

    #[cfg(feature = "lsm")]
    {
        if let Some(expected) = self.expect_dict_id {
            let found = frame_header.dictionary_id();
            // `Some(0)` ↔ `None` equivalence: an absent dict_id compares as 0.
            if expected != found.unwrap_or(0) {
                return Err(FrameDecoderError::UnexpectedDictId {
                    expected: Some(expected),
                    found,
                });
            }
        }
        if let Some(expected) = self.expect_window_descriptor {
            // Window_Descriptor is only meaningful when single_segment_flag is unset;
            // when set, donor zstd packs frame_content_size in its place.
            if !frame_header.descriptor.single_segment_flag() {
                let found_wd = frame_header.window_descriptor /* accessor needed */;
                if expected != found_wd {
                    return Err(FrameDecoderError::UnexpectedWindowDescriptor {
                        expected,
                        found: found_wd,
                    });
                }
            }
        }
    }
The window_descriptor field on FrameHeader (frame.rs:105) is currently pub(crate) — needs a pub fn window_descriptor(&self) -> Option<u8> accessor (returning None for single-segment frames). Small additive API exposure.
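The comparison rules can be exercised in isolation from the decoder. The sketch below mirrors them with stand-in types: `Expectations`, `HeaderCheckError` and `check_header` are hypothetical names, not the crate's API. It also assumes the "skip the check" option for single-segment frames, which this issue leaves as an open design decision.

```rust
/// Stand-in for the two proposed error variants (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderCheckError {
    UnexpectedDictId { expected: Option<u32>, found: Option<u32> },
    UnexpectedWindowDescriptor { expected: u8, found: u8 },
}

/// Stand-in for the two pinned expectations; `None` disables a check.
#[derive(Debug, Default)]
pub struct Expectations {
    pub dict_id: Option<u32>,
    pub window_descriptor: Option<u8>,
}

/// Compare parsed header fields against the pinned expectations.
/// `found_window_descriptor` is `None` for single-segment frames.
pub fn check_header(
    expectations: &Expectations,
    found_dict_id: Option<u32>,
    found_window_descriptor: Option<u8>,
) -> Result<(), HeaderCheckError> {
    if let Some(expected) = expectations.dict_id {
        // An absent dict_id compares as 0, so `Some(0)` accepts "no dictionary".
        if expected != found_dict_id.unwrap_or(0) {
            return Err(HeaderCheckError::UnexpectedDictId {
                expected: Some(expected),
                found: found_dict_id,
            });
        }
    }
    // Assumption: a single-segment frame (no descriptor byte) skips the check.
    if let (Some(expected), Some(found)) =
        (expectations.window_descriptor, found_window_descriptor)
    {
        if expected != found {
            return Err(HeaderCheckError::UnexpectedWindowDescriptor { expected, found });
        }
    }
    Ok(())
}
```

If the PR settles on failing explicitly for single-segment frames instead, the second branch would need a dedicated error path rather than the silent skip shown here.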
What this is NOT
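Not a replacement for AEAD. AEAD authentication still runs first; this gate is for the post-decrypt sanity check that the wire-format spec mandates.
Not a window-size ceiling (ZSTD_d_windowLogMax is a different thing: it lives in #127's C FFI surface, unconditionally available, with separate semantics). This is byte-exact equality.
Not a dict_id != 0 required gate. expected = Some(0) explicitly means "no dictionary", which is useful for blocks that don't use one.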
Why feature-gate behind lsm
Consistent with #171-#175: validates strict drop-in C FFI parity (default-build cdylib has zero new surface). The validation is bespoke to wire-format-with-AAD scenarios; non-lsm consumers don't need it.
Acceptance criteria
expect_dict_id(Some(42)), decode frame with dict_id = 42 → ok, decodes normally.
expect_dict_id(Some(42)), decode frame with dict_id = 43 → UnexpectedDictId { expected: Some(42), found: Some(43) }, no bytes decoded.
expect_dict_id(Some(42)), decode frame with no dict_id (flag 0) → UnexpectedDictId { expected: Some(42), found: None }.
expect_dict_id(Some(0)), decode frame with no dict_id → ok (Some(0) ↔ None equivalence).
expect_dict_id(None) (default), decode anything → no validation, no behavior change.
expect_window_descriptor(Some(0x4A)), decode frame with window_descriptor=0x4A → ok.
expect_window_descriptor(Some(0x4A)), decode frame with window_descriptor=0x60 → UnexpectedWindowDescriptor { expected: 0x4A, found: 0x60 }.
Single-segment frame + expect_window_descriptor(Some(_)) → behavior documented (skip check, or fail explicitly; needs design decision in PR).
Validation fires BEFORE any block decode work: no allocation, no XXH64 init, no partial output.
After validation failure, calling init/reset again clears the failed state cleanly.
Without lsm feature: setters absent, error variants absent, build byte-identical to today.
503/503 lib tests pass.
Estimated size
~80 LoC + tests. One small additive accessor (FrameHeader::window_descriptor()), two setters on FrameDecoder, two new error variants, validation block in init/reset.
Phasing
PR-F. Independent of #171/#172/#173/#174/#175. Lands any time before lsm-tree's LSM-T2 (wire-format encoder/decoder) goes live; LSM-T2 uses these setters in its post-AEAD-decrypt validation step. Can ship parallel to PR-A in Phase α of the bilateral phasing.
Related
ZSTD_d_windowLogMax C FFI knob (zstd.h:1290) is the ceiling-style limit, a separate concern that stays in #127 (feat(c-api): #28 Phase 6.2, advanced + streaming + dictionary C FFI surface).
Bilateral cross-reference
Consumer (blocked on this issue): lsm-tree #251 (coordinode-lsm-tree#251, "feat(encryption): wire-format encoder/decoder for AAD-bound block (impl of #250)"), the LSM-T2 wire-format encoder/decoder impl with AAD post-decrypt validation. The expect_dict_id / expect_window_descriptor setters are called immediately after AEAD-decrypt to enforce the wire-format spec's mandatory cross-checks (defeats dict substitution + decompression bomb swap attacks).