Conversation
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Introduces a new hash-sorted-map crate implementing the first part of an insertion-only HashSortedMap with SIMD-accelerated group scanning and overflow chaining, plus accompanying benchmarks and design documentation.
Changes:
- Adds core
HashSortedMapimplementation (insert/get/entry API) with growth and overflow chaining. - Adds platform-specific group scanning primitives (SSE2/NEON/scalar) and a basic test suite.
- Adds a Criterion benchmarks crate and initial README/optimization docs.
Show a summary per file
| File | Description |
|---|---|
| crates/hash-sorted-map/src/lib.rs | Exposes group_ops and hash_sorted_map modules. |
| crates/hash-sorted-map/src/hash_sorted_map.rs | Implements the HashSortedMap data structure, growth logic, and Entry API, plus tests. |
| crates/hash-sorted-map/src/group_ops.rs | Adds SIMD/scalar helpers for ctrl-byte matching and mask iteration. |
| crates/hash-sorted-map/benchmarks/performance.rs | Adds Criterion benchmarks comparing multiple map implementations. |
| crates/hash-sorted-map/benchmarks/lib.rs | Benchmark utilities (trigram generation, identity hasher). |
| crates/hash-sorted-map/benchmarks/Cargo.toml | Defines the benchmarks crate and its dependencies. |
| crates/hash-sorted-map/README.md | Documents motivation/design and includes benchmark results + running instructions. |
| crates/hash-sorted-map/OPTIMIZATIONS.md | Design/optimization analysis and rationale. |
| crates/hash-sorted-map/Cargo.toml | Declares the new hash-sorted-map crate metadata. |
| Cargo.toml | Adds the benchmarks crate to the workspace members. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (3)
crates/hash-sorted-map/src/hash_sorted_map.rs:560
insert_after_growtreatsInsertion::NeedsOverflowas unreachable after a resize, but it can still happen for adversarial / extremely-colliding hash distributions (the table may still need overflow groups, even immediately after growing). Instead ofunreachable!, handleNeedsOverflowby allocating/linking an overflow group (with the same capacity check as the normal insert path) or by retrying via the normalinsert_hashedlogic.
// After grow, the new primary group for `key` cannot be full (see
// function docs), and the key wasn't in the table before grow.
FindResult::Vacant(Insertion::NeedsOverflow { .. }) | FindResult::Found(_) => {
unreachable!("post-grow walk must hit an empty slot")
}
crates/hash-sorted-map/README.md:90
- The benchmark invocation doesn’t match the benchmark that’s defined in
benchmarks/Cargo.toml([[bench]] name = "performance").cargo bench --bench hashmap_insertwill fail; update the README to the correct command (likelycargo bench -p hash-sorted-map-benchmarks --bench performance, or run from thebenchmarks/crate).
```sh
cargo bench --bench hashmap_insert
**crates/hash-sorted-map/src/hash_sorted_map.rs:367**
* `insert_for_grow` allocates overflow groups without checking `self.num_groups` against `self.groups.len()`. Under heavy hash collisions during rehash, `new_gi` can reach `self.groups.len()` and the subsequent indexing will panic. Consider mirroring the capacity check used in `insert_hashed` (e.g., trigger another `grow()` / allocate more groups) so growth remains panic-free even for pathological hash distributions.
} else {
let new_gi = self.num_groups as usize;
self.groups[gi].overflow = new_gi as u32;
self.num_groups += 1;
group = &mut self.groups[new_gi];
break;
</details>
- **Files reviewed:** 10/10 changed files
- **Comments generated:** 3
jorendorff
left a comment
There was a problem hiding this comment.
This is cool.
Can I see the merge/iteration operations you have in mind, before stamping this?
Co-authored-by: Jason Orendorff <jorendorff@github.com>
|
Here is the very simple sorting of each group: In the ported version, we will need to introduce a new type, since the hash map will be broken after the sorting. Merging is implemented here: We could provide a helper function in the rust crate for this. Feel free to merge/make changes, since I will be OOO tomorrow. |
Co-authored-by: Jason Orendorff <jorendorff@github.com>
Co-authored-by: Jason Orendorff <jorendorff@github.com>
In a subsequent step, merging functions and sorted iterators will be added.