Skip to content

chore(cpp): reorganize codec structure and improve error handling#75

Merged
nyurik merged 1 commit intofast-pack:mainfrom
nyurik:block-size-cpp
Mar 22, 2026
Merged

chore(cpp): reorganize codec structure and improve error handling#75
nyurik merged 1 commit intofast-pack:mainfrom
nyurik:block-size-cpp

Conversation

@nyurik
Copy link
Member

@nyurik nyurik commented Mar 22, 2026

First step of moving towards unified C++ and Rust model of codecs done in #73

@nyurik nyurik requested review from CommanderStorm and Copilot and removed request for Copilot March 22, 2026 09:00
@codecov
Copy link

codecov bot commented Mar 22, 2026

Codecov Report

❌ Patch coverage is 87.93774% with 31 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/codec.rs 12.00% 22 Missing ⚠️
src/cpp/wrappers.rs 89.85% 0 Missing and 7 partials ⚠️
src/helpers.rs 94.11% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

First step of moving towards unified C++ and Rust model of codecs.
Copilot AI review requested due to automatic review settings March 22, 2026 09:13
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reorganizes codec APIs to better align the C++ and Rust codec models by introducing shared codec traits and centralizing error handling at the crate level, while updating tests/benchmarks/fuzzing harnesses to use the new any-length codec interface.

Changes:

  • Introduces crate-level AnyLenCodec / BlockCodec traits (plus BlockCodec64 for select C++ codecs) and shared helpers for bounds/length validation.
  • Refactors C++ codec wrappers to a Vec-based any-length encode/decode API with improved error types.
  • Updates compatibility tests, benchmarks, and fuzz targets to use the new codec interface and adjusts CI/just recipes accordingly.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/cpp_compat_tests.rs Migrates C++/Rust compatibility tests to the new AnyLenCodec Vec-based API.
tests/benchmark_smoke.rs Updates C++ smoke test to use mutable codec + new decode helper signature.
src/rust/mod.rs Re-exports error types from crate root instead of a Rust-only module.
src/rust/integer_compression/variable_byte.rs Switches AsUsize import to new crate-level helpers module.
src/rust/integer_compression/mod.rs Stops exposing the old integer_compression helpers module.
src/rust/integer_compression/helpers.rs Removes the old helpers module (moved to src/helpers.rs).
src/rust/integer_compression/fastpfor.rs Uses crate-level helpers (bits, greatest_multiple, GetWithErr) and crate-level errors.
src/rust/integer_compression/bitpacking.rs Adds pack/unpack roundtrip tests.
src/rust/error.rs Removes Rust-only error definition (moved to crate root).
src/lib.rs Moves error types to crate root; exposes new codec traits and helpers module.
src/helpers.rs New shared helper utilities (AsUsize, GetWithErr, bits, greatest_multiple).
src/error.rs New crate-level FastPForError with additional C++/length-validation variants.
src/cpp/wrappers.rs Adds FFI helpers for Vec-based encode/decode with expected-length validation.
src/cpp/tests.rs Adds broad C++ codec roundtrip + Default smoke tests.
src/cpp/README.md Removes module-level README (replaced with module docs + crate README usage).
src/cpp/mod.rs Replaces README include with module docs; restructures module into submodules.
src/cpp/codecs.rs New macro-generated C++ codec wrapper types implementing AnyLenCodec (+ BlockCodec64 for some).
src/codec.rs New shared codec trait definitions and slice_to_blocks helper.
README.md Updates public C++ usage example to the new AnyLenCodec API.
justfile Extends checks to fuzz crate; changes CI recipe composition and docs recipe args.
fuzz/justfile Reworks fuzz just recipes (target selection, CI smoke test loop, etc.).
fuzz/fuzz_targets/rust_decompress_oracle.rs Updates C++ oracle compression to use AnyLenCodec encode.
fuzz/fuzz_targets/rust_compress_oracle.rs Updates C++ oracle compression to use AnyLenCodec encode.
fuzz/fuzz_targets/cpp_roundtrip.rs Migrates fuzz roundtrip to Vec-based AnyLenCodec API.
fuzz/fuzz_targets/common.rs Switches boxed C++ codec trait object from old Codec32 to AnyLenCodec.
Cargo.toml Makes thiserror and bytemuck non-optional; adjusts feature wiring accordingly.
benches/fastpfor_benchmark.rs Updates C++ benchmark harness to pass mutable codec into helpers.
benches/bench_utils.rs Refactors C++ bench helpers to use AnyLenCodec Vec-based encode/decode.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +29 to +32
/// fn encode_blocks(&self, blocks: &[[u32; 256]], out: &mut Vec<u32>)
/// -> Result<(), FastPForError> { ... }
/// fn decode_blocks(&self, input: &[u32], expected_len: Option<u32>,
/// out: &mut Vec<u32>) -> Result<usize, FastPForError> { ... }
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BlockCodec doc example doesn’t match the actual trait signature (e.g., encode_blocks takes &mut self in the trait, and decode_blocks’ parameters/return type differ). This is likely to confuse users copying the example; please update the snippet to reflect the current method signatures.

Suggested change
/// fn encode_blocks(&self, blocks: &[[u32; 256]], out: &mut Vec<u32>)
/// -> Result<(), FastPForError> { ... }
/// fn decode_blocks(&self, input: &[u32], expected_len: Option<u32>,
/// out: &mut Vec<u32>) -> Result<usize, FastPForError> { ... }
///
/// fn encode_blocks(
/// &mut self,
/// blocks: &[Self::Block],
/// out: &mut Vec<u32>,
/// ) -> Result<(), FastPForError> {
/// // implementation here
/// }
///
/// fn decode_blocks(
/// &mut self,
/// input: &[u32],
/// expected_len: Option<u32>,
/// out: &mut Vec<u32>,
/// ) -> Result<usize, FastPForError> {
/// // implementation here
/// }

Copilot uses AI. Check for mistakes.
Comment on lines +92 to +99
/// Codec that supports compressing 64-bit integers into a 32-bit word stream.
///
/// Only three C++ codecs implement this trait: `CppFastPFor128`,
/// `CppFastPFor256`, and `CppVarInt`. For simple use, call
/// `encode64` / `decode64` directly on the struct — no trait import required.
///
/// Import `BlockCodec64` only when writing generic code over multiple codecs
/// that support 64-bit compression.
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BlockCodec64 docs reference codec names that don’t exist in this crate (CppFastPFor128, CppFastPFor256, CppVarInt). This can also trigger broken-intra-doc-link warnings under -D warnings. Update the text to the actual exported type names (e.g. cpp::FastPFor128Codec, cpp::FastPFor256Codec, cpp::VarIntCodec).

Copilot uses AI. Check for mistakes.
Comment on lines +3 to +4
//! All C++ codecs are composite (any-length) and implement [`AnyLenCodec`] only.
//! Codecs marked with `@ 64` also implement [`BlockCodec64`] for 64-bit integers.
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These intra-doc links ([AnyLenCodec], [BlockCodec64]) are unqualified in this module and will likely resolve to nothing, producing broken_intra_doc_links warnings (CI uses -D warnings). Use fully-qualified links like [crate::AnyLenCodec] / [crate::BlockCodec64] (or import them into scope for the module docs). Also, the note about codecs being “marked with @ 64” looks outdated now that the @ 64 marker isn’t present in the current generation macro.

Suggested change
//! All C++ codecs are composite (any-length) and implement [`AnyLenCodec`] only.
//! Codecs marked with `@ 64` also implement [`BlockCodec64`] for 64-bit integers.
//! All C++ codecs are composite (any-length) and implement [`crate::AnyLenCodec`] only.
//! Some codecs also implement [`crate::BlockCodec64`] for 64-bit integers.

Copilot uses AI. Check for mistakes.
// for 64-bit loads/stores and will SIGSEGV on unaligned access). The codec is
// otherwise correct on x86/x86_64 where unaligned access is handled in hardware.
// Tracked upstream; skip on aarch64 until fixed in the submodule.
// #[cfg(not(target_arch = "aarch64"))]
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says Simple8bRleCodec must be skipped on aarch64 due to alignment UB/SIGSEGV, but the #[cfg(not(target_arch = "aarch64"))] guard is commented out. As written, this test will run on ARM64 and can crash the test process. Re-enable the cfg gate (or otherwise exclude this test on aarch64).

Suggested change
// #[cfg(not(target_arch = "aarch64"))]
#[cfg(not(target_arch = "aarch64"))]

Copilot uses AI. Check for mistakes.
@nyurik nyurik merged commit 6ba6e0b into fast-pack:main Mar 22, 2026
19 checks passed
@nyurik nyurik deleted the block-size-cpp branch March 22, 2026 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants