perf: Improve Huffman leaf sorting with bucket sort #244

Merged
hellobertrand merged 3 commits into `main` from `perf/huf-bucket-sort` on May 27, 2026

@hellobertrand
Owner

Replaces the generic `qsort` with a specialized hybrid of bucket sort and insertion sort for `pm_leaf_t` arrays. The indirect comparator calls made by `qsort` were a performance bottleneck. The custom implementation exploits the natural clustering of frequency weights: buckets stay short, so the insertion sort that runs inside each bucket is branch-friendly and significantly faster.
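
For readers who want a concrete picture of the technique, here is a minimal sketch of a bucket sort plus insertion sort hybrid. It is not the code from this PR. The struct `demo_leaf_t`, its fields, the bit-length bucketing rule, the ascending order with symbol tie-break, and the function names are all assumptions made for illustration; the real `pm_leaf_t` layout and bucketing scheme may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pm_leaf_t; the real layout may differ. */
typedef struct {
    uint32_t weight; /* symbol frequency */
    uint16_t symbol; /* symbol value, used here as a tie-breaker */
} demo_leaf_t;

/* One bucket per bit length of the weight: 0 for weight 0, 32 for >= 2^31. */
enum { DEMO_BUCKETS = 33 };

/* Bucket index is monotonic in weight, so bucket order matches sort order. */
static unsigned demo_bucket_of(uint32_t w) {
    unsigned b = 0;
    while (w != 0) { b++; w >>= 1; }
    return b;
}

/* Direct comparison the compiler can inline, unlike a qsort callback. */
static int demo_leaf_less(const demo_leaf_t *x, const demo_leaf_t *y) {
    if (x->weight != y->weight) return x->weight < y->weight;
    return x->symbol < y->symbol;
}

/* Sorts a[0..n) ascending by (weight, symbol). tmp must hold n elements. */
static void demo_sort_leaves(demo_leaf_t *a, size_t n, demo_leaf_t *tmp) {
    size_t start[DEMO_BUCKETS + 1] = {0};
    size_t pos[DEMO_BUCKETS];
    size_t i;
    unsigned b;

    /* Pass 1: count the elements that fall into each bucket. */
    for (i = 0; i < n; i++) start[demo_bucket_of(a[i].weight) + 1]++;

    /* Prefix sum: start[b] is now the first slot of bucket b. */
    for (b = 0; b < DEMO_BUCKETS; b++) start[b + 1] += start[b];

    /* Pass 2: scatter every element into its bucket's range in tmp. */
    memcpy(pos, start, sizeof pos);
    for (i = 0; i < n; i++) tmp[pos[demo_bucket_of(a[i].weight)]++] = a[i];

    /* Pass 3: insertion sort inside each bucket. */
    for (b = 0; b < DEMO_BUCKETS; b++) {
        for (i = start[b] + 1; i < start[b + 1]; i++) {
            demo_leaf_t key = tmp[i];
            size_t j = i;
            while (j > start[b] && demo_leaf_less(&key, &tmp[j - 1])) {
                tmp[j] = tmp[j - 1];
                j--;
            }
            tmp[j] = key;
        }
    }

    memcpy(a, tmp, n * sizeof *a);
}
```

A caller would pass the leaf array, its length, and a scratch buffer of the same length, for example `demo_sort_leaves(leaves, count, scratch)`. The sketch trades one extra buffer for two linear passes, and the quadratic cost of insertion sort only applies to the length of each individual bucket.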

@codecov
codecov Bot commented May 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

hellobertrand force-pushed the perf/huf-bucket-sort branch from ce4d8dd to bbef4a7 on May 25, 2026 16:06
Comment thread src/lib/zxc_compress.c Fixed
hellobertrand force-pushed the perf/huf-bucket-sort branch 2 times, most recently from 6cfa267 to cd3c97a on May 25, 2026 16:53
Replaces the generic `qsort` with a specialized bucket sort and insertion sort hybrid for `pm_leaf_t` arrays. The previous `qsort`'s indirect comparator calls were a performance bottleneck. This custom implementation exploits the natural clustering of frequency weights, leading to short intra-bucket lists where insertion sort is branch-friendly and significantly faster.
The `ZXC_QSORT` macro, previously used to abstract the standard library `qsort`, is no longer required. It was superseded by a specialized bucket sort implementation for Huffman leaf arrays, making the generic `qsort` abstraction redundant.
hellobertrand force-pushed the perf/huf-bucket-sort branch from 96d1a02 to 1efaa9b on May 25, 2026 17:10
Enhances the NEON64 path within `zxc_lz77_find_best_match` to process 32 bytes per iteration. This is achieved by performing two 16-byte vector comparisons within the loop, reducing overhead and improving the speed of extending LZ77 matches. The NEON32 logic is also explicitly separated into its own block.
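
To illustrate the idea of extending a match 32 bytes at a time, here is a rough sketch rather than the code from this commit. The helper name `demo_match_length` and its signature are hypothetical, and the handling of a mismatch (falling back to a byte loop instead of locating the exact byte from the comparison mask) is a simplification. The NEON block is compiled only on AArch64; elsewhere the scalar loop does all the work.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: length of the common prefix of p and q, capped at
 * max_len. Both pointers must be readable for max_len bytes. */
static size_t demo_match_length(const uint8_t *p, const uint8_t *q,
                                size_t max_len) {
    size_t len = 0;

#if defined(__aarch64__)
    /* Two 16-byte comparisons per iteration cover 32 bytes at once. */
    while (len + 32 <= max_len) {
        uint8x16_t eq0 = vceqq_u8(vld1q_u8(p + len), vld1q_u8(q + len));
        uint8x16_t eq1 = vceqq_u8(vld1q_u8(p + len + 16),
                                  vld1q_u8(q + len + 16));
        /* Every lane is 0xFF only if all 32 bytes compared equal. */
        if (vminvq_u8(vandq_u8(eq0, eq1)) != 0xFF) break;
        len += 32;
    }
#endif

    /* Scalar tail: finishes short remainders and finds the exact
     * mismatch position inside a failing 32-byte block. */
    while (len < max_len && p[len] == q[len]) len++;
    return len;
}
```

Checking both vectors with a single branch halves the number of loop-control branches per byte compared with a 16-byte loop, which is the overhead reduction the commit message describes. A production version would more likely derive the mismatch offset directly from the comparison result instead of rescanning.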
hellobertrand force-pushed the perf/huf-bucket-sort branch from 1efaa9b to d83bd07 on May 27, 2026 07:25
hellobertrand merged commit 56e9e2f into main on May 27, 2026
79 checks passed
hellobertrand deleted the perf/huf-bucket-sort branch on May 27, 2026 07:34