perf+ratio(encoder): block splitter donor-parity check vs ZSTD_splitBlock #206

@polaz

Context

Donor ZSTD_splitBlock (lib/compress/zstd_compress_superblock.c) decides
whether to split a literals/sequences block at intermediate boundaries
to improve ratio. Our block_splitter::should_split uses a different
heuristic and may pick different boundaries than donor on the same input.

Goal

Build a Rust↔FFI boundary comparator that runs both block splitters on
the same input and reports:

  1. For each input scenario, the list of boundaries chosen by Rust vs donor
  2. Per-scenario divergence count and total compressed-size delta
  3. Every scenario where Rust splits differently AND the donor produces
    smaller output → candidate for porting the donor decision logic
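
The per-scenario boundary diff described above could be sketched roughly as follows. This is only an illustration: `BoundaryDiff`, `diff_boundaries` and `divergence_count` are hypothetical names, not existing items in the crate, and it assumes both splitters can report their chosen boundaries as byte offsets into the same input.

```rust
use std::collections::BTreeSet;

/// Result of comparing split boundaries for one input scenario.
/// Hypothetical type for illustration; not part of the crate.
#[derive(Debug, PartialEq, Eq)]
pub struct BoundaryDiff {
    /// Offsets chosen by both splitters.
    pub common: Vec<usize>,
    /// Offsets chosen only by the Rust splitter.
    pub rust_only: Vec<usize>,
    /// Offsets chosen only by the donor splitter.
    pub donor_only: Vec<usize>,
}

impl BoundaryDiff {
    /// Number of boundaries on which the two splitters disagree.
    pub fn divergence_count(&self) -> usize {
        self.rust_only.len() + self.donor_only.len()
    }
}

/// Compare two lists of split offsets (assumed to be byte positions
/// within the same input). Input order and duplicates do not matter;
/// all output vectors are sorted ascending.
pub fn diff_boundaries(rust: &[usize], donor: &[usize]) -> BoundaryDiff {
    let rust: BTreeSet<usize> = rust.iter().copied().collect();
    let donor: BTreeSet<usize> = donor.iter().copied().collect();
    BoundaryDiff {
        common: rust.intersection(&donor).copied().collect(),
        rust_only: rust.difference(&donor).copied().collect(),
        donor_only: donor.difference(&rust).copied().collect(),
    }
}
```

Treating the boundaries as sets keeps the diff independent of the order in which each splitter emits them. A real harness would likely also record the scenario name and both compressed sizes next to each `BoundaryDiff`.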

Acceptance criteria

  • Comparator harness in tests/ or benches/ reading decodecorpus_files/
  • Report shows boundary-by-boundary diff for each scenario
  • If divergences correlate with ratio loss → port the donor split logic
  • If divergences are neutral (same compressed size) → close as algorithmic
    freedom
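
The last two criteria amount to a small decision rule, which might be sketched as below. `Verdict`, `classify` and `total_delta` are hypothetical names. The `RustSmaller` case is an assumption added for completeness, since the criteria above do not say what to do when Rust already wins.

```rust
use std::cmp::Ordering;

/// Outcome for one scenario. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Both splitters chose the same boundaries.
    Identical,
    /// Boundaries differ but compressed sizes match: algorithmic freedom.
    Neutral,
    /// Boundaries differ and the donor output is smaller: port candidate.
    DonorSmaller { bytes: u64 },
    /// Boundaries differ and the Rust output is smaller (assumed case,
    /// not covered by the acceptance criteria).
    RustSmaller { bytes: u64 },
}

/// Classify one scenario from its divergence count and the two
/// compressed sizes in bytes.
pub fn classify(divergences: usize, rust_size: u64, donor_size: u64) -> Verdict {
    if divergences == 0 {
        // Any size difference here would come from something other
        // than the split decision, so it is not attributed to it.
        return Verdict::Identical;
    }
    match rust_size.cmp(&donor_size) {
        Ordering::Equal => Verdict::Neutral,
        Ordering::Greater => Verdict::DonorSmaller { bytes: rust_size - donor_size },
        Ordering::Less => Verdict::RustSmaller { bytes: donor_size - rust_size },
    }
}

/// Total compressed-size delta (Rust minus donor) over all scenarios,
/// given as (rust_size, donor_size) pairs. Positive means Rust is larger.
pub fn total_delta(sizes: &[(u64, u64)]) -> i128 {
    sizes.iter().map(|&(r, d)| r as i128 - d as i128).sum()
}
```

Under this sketch, the issue would close as algorithmic freedom only if no scenario yields `DonorSmaller`. Whether a few bytes of loss in one scenario justifies porting is a threshold the harness would still need to define.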

Files involved

  • zstd/src/encoding/block_splitter.rs (current heuristic)
  • New harness in zstd/tests/ or zstd/benches/

Labels: P2-medium, enhancement, performance
