Skip to content

First part of a HashSortedMap#107

Merged
aneubeck merged 24 commits intomainfrom
aneubeck/prefixmap
May 6, 2026
Merged

First part of a HashSortedMap#107
aneubeck merged 24 commits intomainfrom
aneubeck/prefixmap

Conversation

@aneubeck
Copy link
Copy Markdown
Collaborator

In a subsequent step, merging functions and sorted iterators will be added.

Copilot AI review requested due to automatic review settings April 29, 2026 13:25
@aneubeck aneubeck requested a review from a team as a code owner April 29, 2026 13:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new hash-sorted-map crate implementing the first part of an insertion-only HashSortedMap with SIMD-accelerated group scanning and overflow chaining, plus accompanying benchmarks and design documentation.

Changes:

  • Adds core HashSortedMap implementation (insert/get/entry API) with growth and overflow chaining.
  • Adds platform-specific group scanning primitives (SSE2/NEON/scalar) and a basic test suite.
  • Adds a Criterion benchmarks crate and initial README/optimization docs.
Show a summary per file
File Description
crates/hash-sorted-map/src/lib.rs Exposes group_ops and hash_sorted_map modules.
crates/hash-sorted-map/src/hash_sorted_map.rs Implements the HashSortedMap data structure, growth logic, and Entry API, plus tests.
crates/hash-sorted-map/src/group_ops.rs Adds SIMD/scalar helpers for ctrl-byte matching and mask iteration.
crates/hash-sorted-map/benchmarks/performance.rs Adds Criterion benchmarks comparing multiple map implementations.
crates/hash-sorted-map/benchmarks/lib.rs Benchmark utilities (trigram generation, identity hasher).
crates/hash-sorted-map/benchmarks/Cargo.toml Defines the benchmarks crate and its dependencies.
crates/hash-sorted-map/README.md Documents motivation/design and includes benchmark results + running instructions.
crates/hash-sorted-map/OPTIMIZATIONS.md Design/optimization analysis and rationale.
crates/hash-sorted-map/Cargo.toml Declares the new hash-sorted-map crate metadata.
Cargo.toml Adds the benchmarks crate to the workspace members.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (3)

crates/hash-sorted-map/src/hash_sorted_map.rs:560

  • insert_after_grow treats Insertion::NeedsOverflow as unreachable after a resize, but it can still happen for adversarial / extremely-colliding hash distributions (the table may still need overflow groups, even immediately after growing). Instead of unreachable!, handle NeedsOverflow by allocating/linking an overflow group (with the same capacity check as the normal insert path) or by retrying via the normal insert_hashed logic.
        // After grow, the new primary group for `key` cannot be full (see
        // function docs), and the key wasn't in the table before grow.
        FindResult::Vacant(Insertion::NeedsOverflow { .. }) | FindResult::Found(_) => {
            unreachable!("post-grow walk must hit an empty slot")
        }

crates/hash-sorted-map/README.md:90

  • The benchmark invocation doesn’t match the benchmark that’s defined in benchmarks/Cargo.toml ([[bench]] name = "performance"). cargo bench --bench hashmap_insert will fail; update the README to the correct command (likely cargo bench -p hash-sorted-map-benchmarks --bench performance, or run from the benchmarks/ crate).

```sh
cargo bench --bench hashmap_insert
**crates/hash-sorted-map/src/hash_sorted_map.rs:367**
* `insert_for_grow` allocates overflow groups without checking `self.num_groups` against `self.groups.len()`. Under heavy hash collisions during rehash, `new_gi` can reach `self.groups.len()` and the subsequent indexing will panic. Consider mirroring the capacity check used in `insert_hashed` (e.g., trigger another `grow()` / allocate more groups) so growth remains panic-free even for pathological hash distributions.
        } else {
            let new_gi = self.num_groups as usize;
            self.groups[gi].overflow = new_gi as u32;
            self.num_groups += 1;
            group = &mut self.groups[new_gi];
            break;
</details>


- **Files reviewed:** 10/10 changed files
- **Comments generated:** 3


Comment thread crates/hash-sorted-map/OPTIMIZATIONS.md
Comment thread crates/hash-sorted-map/benchmarks/Cargo.toml
Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated
Copy link
Copy Markdown
Contributor

@jorendorff jorendorff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool.

Can I see the merge/iteration operations you have in mind, before stamping this?

Comment thread crates/hash-sorted-map/src/lib.rs Outdated
Comment thread crates/hash-sorted-map/README.md
Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated
Co-authored-by: Jason Orendorff <jorendorff@github.com>
@aneubeck
Copy link
Copy Markdown
Collaborator Author

Here is the very simple sorting of each group:
https://github.com/github/blackbird/blob/aneubeck/reindex4/crates/index/src/indexer/prefix_map.rs#L288
and here the sorted iterator:
https://github.com/github/blackbird/blob/aneubeck/reindex4/crates/index/src/indexer/prefix_map.rs#L390

In the ported version, we will need to introduce a new type, since the hash map will be broken after the sorting.
But one might want to iterate multiple times.

Merging is implemented here:
https://github.com/github/blackbird/blob/aneubeck/reindex4/crates/index/src/indexer/memory_index.rs#L191-L204

We could provide a helper function in the rust crate for this.

Feel free to merge/make changes, since I will be OOO tomorrow.

Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated
Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated
Comment thread crates/hash-sorted-map/src/hash_sorted_map.rs Outdated
Comment thread crates/hash-sorted-map/OPTIMIZATIONS.md Outdated
Comment thread crates/hash-sorted-map/OPTIMIZATIONS.md
Comment thread crates/hash-sorted-map/OPTIMIZATIONS.md Outdated
aneubeck and others added 3 commits May 5, 2026 15:23
Co-authored-by: Jason Orendorff <jorendorff@github.com>
Co-authored-by: Jason Orendorff <jorendorff@github.com>
@aneubeck aneubeck enabled auto-merge May 6, 2026 09:57
@aneubeck aneubeck merged commit c276254 into main May 6, 2026
7 checks passed
@aneubeck aneubeck deleted the aneubeck/prefixmap branch May 6, 2026 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants