
perf(encoder): Fast L1 +7.43% ratio gap vs donor on decodecorpus-z000033 #220

@polaz


Context

Follow-up to phase 3 of #198 (PR #219). Per-level Fast cParams are now threaded end-to-end and the donor-shaped 4-cursor kernel is in place. Even so, L1 output is still ~7.43% larger than donor C zstd's output on decodecorpus-z000033.

Current measurements (post-M8, b280b48)

Level     rust_bytes   ffi_bytes   Δ (rust vs ffi)
L1 Fast   612905       570507      +7.43%
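The Δ column is the relative size difference, rust_bytes / ffi_bytes - 1, expressed as a percentage. Below is a minimal Rust sketch of that arithmetic. The function name is hypothetical and does not come from the codebase.

```rust
/// Relative size gap of the Rust output vs the donor (FFI) output, in percent.
/// Positive means the Rust encoder produced more bytes (worse ratio).
fn ratio_gap_pct(rust_bytes: u64, ffi_bytes: u64) -> f64 {
    (rust_bytes as f64 / ffi_bytes as f64 - 1.0) * 100.0
}
```

For the row above, `format!("{:+.2}%", ratio_gap_pct(612_905, 570_507))` yields `+7.43%`. That corresponds to a 42398-byte difference.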

Sequence comparator (M7): 57.7% match rate vs donor. It reports 6736 ffi_only sequences (matches donor finds that we miss) and 2805 rust_only sequences (matches we emit that donor does not).
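This issue does not describe how the comparator works internally. As a hypothetical sketch, a comparator of this kind can treat each encoder's output as a multiset of sequences and count the overlap between the two sides. Two parts of the sketch are assumptions and may not match what the M7 tool actually computes:

- the `Seq` shape of (pos, offset, match_len);
- the match-rate definition, taken here as shared / donor sequence count.

```rust
use std::collections::HashMap;

/// One compressed sequence as seen by the comparator (hypothetical shape).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Seq {
    pos: u32,       // absolute position where the match starts
    offset: u32,    // match distance
    match_len: u32, // match length in bytes
}

/// Returns (shared, ffi_only, rust_only), treating each side as a multiset
/// so duplicate sequences are matched one-for-one.
fn compare_sequences(rust: &[Seq], ffi: &[Seq]) -> (usize, usize, usize) {
    // Count how many copies of each donor sequence are still unmatched.
    let mut pending: HashMap<Seq, usize> = HashMap::new();
    for s in ffi {
        *pending.entry(*s).or_insert(0) += 1;
    }
    let mut shared = 0;
    let mut rust_only = 0;
    for s in rust {
        let n = pending.entry(*s).or_insert(0);
        if *n > 0 {
            *n -= 1; // consume one donor copy
            shared += 1;
        } else {
            rust_only += 1; // no donor counterpart left
        }
    }
    // Every donor sequence not consumed above is one we missed.
    let ffi_only = ffi.len() - shared;
    (shared, ffi_only, rust_only)
}

/// Assumed match-rate definition: fraction of donor sequences we reproduce.
fn match_rate(shared: usize, ffi_total: usize) -> f64 {
    if ffi_total == 0 {
        1.0
    } else {
        shared as f64 / ffi_total as f64
    }
}
```

Keying on the full (pos, offset, match_len) tuple is strict. A sequence that starts at the same position but has a different length counts once as ffi_only and once as rust_only.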

What we tried

M7 added a hash insertion at current0+2 after each emitted match. Donor performs this insertion (zstd_fast.c:407) and we were missing it. The change raised the match rate from 43.1% to 57.7% but saved only ~3k bytes (615899 → 612905).
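The M7 change can be illustrated with a hypothetical sketch of the post-match table fill. For simplicity the sketch uses a 4-byte multiplicative hash with the constant 2654435761. That is an assumption standing in for the real mls-dependent hash, and none of these function names come from the Rust encoder.

```rust
/// Multiplier for the simplified 4-byte hash (assumed stand-in for the
/// encoder's real mls-dependent hash).
const PRIME_4_BYTES: u32 = 2_654_435_761;

/// Hashes the 4 bytes at `pos` into a table of size 1 << hash_log.
/// `hash_log` must be in 1..=31.
fn hash4(src: &[u8], pos: usize, hash_log: u32) -> usize {
    let word = u32::from_le_bytes([src[pos], src[pos + 1], src[pos + 2], src[pos + 3]]);
    (word.wrapping_mul(PRIME_4_BYTES) >> (32 - hash_log)) as usize
}

/// After a match that started at `current0` has been emitted, also index
/// position current0 + 2 so later searches can find it. The bounds check
/// replaces the donor's ilimit guard in this simplified model.
fn fill_after_match(table: &mut [u32], src: &[u8], current0: usize, hash_log: u32) {
    let pos = current0 + 2;
    if pos + 4 <= src.len() {
        table[hash4(src, pos, hash_log)] = pos as u32;
    }
}
```

Without this insert, positions inside an emitted match are never hashed. A later search therefore cannot find them as match candidates. That is consistent with the jump in match rate this insert produced.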

M8 removed the RESERVED_PREFIX_BYTES dummy, on the hypothesis that byte-range alignment was off. The output was identical to M7, so the hypothesis was wrong: positions are aligned with donor in both layouts.

Suspected residual causes

Things NOT investigated:

  • Step-increment cadence (step++ at ip2 >= next_step): timing differences
  • Bitwise (&) vs logical (&&) AND in the backward-extension bound
  • Priority between a repcode match at ip2 and an explicit match at ip0 when both hit
  • Differences in 32-bit hash truncation at very small mls values
  • min_literals_to_compress / min_gain interaction, specifically at L1
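The first item can be checked with a simplified, hypothetical model of the step-increment rule, which can be diffed against instrumented traces from either encoder. The model makes three assumptions:

- a miss-only run in which ip2 advances by `step` on every iteration;
- the `ip2 >= next_step` check runs after each advance;
- `initial_step` and `step_incr` are placeholders, not the donor's actual cParams.

This is not the full 4-cursor loop.

```rust
/// Hypothetical model of the step-increment rule: on every miss, ip2 moves
/// forward by `step`; once ip2 reaches `next_step`, `step` grows by one and
/// `next_step` moves forward by `step_incr`. Returns (ip2, new_step) for each
/// increment, so two traces can be diffed to find where the cadence diverges.
fn step_increment_points(
    start: usize,
    initial_step: usize,
    step_incr: usize,
    limit: usize,
) -> Vec<(usize, usize)> {
    let mut step = initial_step;
    let mut next_step = start + step_incr;
    let mut ip2 = start + step;
    let mut points = Vec::new();
    while ip2 < limit {
        ip2 += step; // advance on a miss
        if ip2 >= next_step {
            step += 1;
            next_step += step_incr;
            points.push((ip2, step));
        }
    }
    points
}
```

For example, `step_increment_points(0, 2, 8, 30)` returns `[(8, 3), (17, 4), (25, 5)]`. If the two encoders' traces disagree on the first pair, every later probe position shifts.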

Acceptance

Close the L1 ratio gap to within ±2% of donor on decodecorpus-z000033.
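This sketch reads the acceptance criterion literally, as a symmetric ±2% band around the donor's byte count. It computes both the pass/fail check and the largest passing output size. The function names are hypothetical.

```rust
/// True if the Rust output is within ±tolerance_pct of the donor's bytes.
fn within_acceptance(rust_bytes: u64, ffi_bytes: u64, tolerance_pct: f64) -> bool {
    let gap_pct = (rust_bytes as f64 / ffi_bytes as f64 - 1.0) * 100.0;
    gap_pct.abs() <= tolerance_pct
}

/// Largest rust_bytes that still passes on the upper side of the band.
fn max_passing_bytes(ffi_bytes: u64, tolerance_pct: f64) -> u64 {
    (ffi_bytes as f64 * (1.0 + tolerance_pct / 100.0)).floor() as u64
}
```

With ffi_bytes = 570507, the limit is 581917 bytes. At the current 612905 bytes, L1 needs to shed about 31 KB.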

Part of #198 phase 3 follow-up (PR #219 shipped M1-M8 without closing this last gap).

Metadata


Assignees

No one assigned

    Labels

    P1-high (High priority: core functionality), performance (Performance optimization)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
