Skip to content

Support finding insertion positions from &HashTable#725

Open
eggyal wants to merge 1 commit into
rust-lang:mainfrom
eggyal:hashtable_insert_at_position
Open

Support finding insertion positions from &HashTable#725
eggyal wants to merge 1 commit into
rust-lang:mainfrom
eggyal:hashtable_insert_at_position

Conversation

@eggyal
Copy link
Copy Markdown

@eggyal eggyal commented Apr 29, 2026

Adds to HashTable the following APIs:

fn find_or_find_vacant_position(&self, hash, eq)
  -> Result<&T, Option<VacantPosition>>

unsafe fn insert_at_position(&mut self, position, value) -> &mut T

This enables an insertion position to be located through an immutable &HashTable reference (such as through a read-lock of a RwLock), in order that it might subsequently be used to perform an insertion when an exclusive &mut HashTable reference becomes available (such as through a write-lock of the same RwLock). In so doing, users can avoid repeated hash probing when it is known that the table has not been mutated in the interim.

Closes #723 cc @cuviper

Comment thread src/table.rs Outdated
Comment on lines +273 to +274
pub unsafe fn insert_at_position(&mut self, position: VacantPosition, value: T) -> &mut T {
debug_assert!(self.len() < self.capacity());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the sanity check we need, rather that index < num_buckets && !is_bucket_full(index).

We could also split function this into a safe version, perhaps -> Result<&mut T, T> giving the value back on error, and an unsafe unchecked version without any assertion. That safe version could have logic errors if the table was mutated in some way while keeping that index apparently valid -- but it would still be memory safe.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the safe version also needs some check to make sure we don't exceed the load factor, but IIRC it's possible to do that even with no apparent capacity left if the index happens to be a DELETED tombstone.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That safe version could have logic errors if the table was mutated in some way while keeping that index apparently valid -- but it would still be memory safe.

Is this correct? What would such a safe function call to perform the insertion? Currently both RawTable::insert_at_index and RawTable::insert_tagged_at_index stipulate that it's unsound to call them with an index that wasn't previously returned by find_or_find_insert_index or if any mutation of the table has occurred since.

Looking at their implementations, they ultimately make four unsafe calls: I think that three of those (RawTableInner::ctrl, RawTable::bucket and Bucket::write) might actually only be unsound if the index is out of range; the fourth unsafe call, to RawTableInner::record_item_insert_at, doesn't have any safety documentation—and I'm not sufficiently familiar with how the metadata is maintained to be certain there's no safety concern here. For example, it increments self.items; if there's an existing value in the bucket that's being overwritten, then surely that will result in an incorrect count?

Perhaps there's no soundness issues, and it's okay to relax the safety requirements of RawTable::insert_at_index and RawTable::insert_tagged_at_index to merely require that the index be in range (stipulating instead that passing incorrect indices may result in logic errors and/or panics)? They are only crate-visible, after all. Then it would indeed be pretty trivial to publicly expose a safe/checked version of this new HashTable::insert_at_position method.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, it increments self.items; if there's an existing value in the bucket that's being overwritten, then surely that will result in an incorrect count?

Checking is_bucket_full prevents that case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, I'd be interested if others have opinions before you change anything toward my suggestion.

@eggyal eggyal force-pushed the hashtable_insert_at_position branch 2 times, most recently from 3edcfbf to 6bf6e57 Compare April 29, 2026 22:19
Adds to `HashTable` the following APIs:

```rust
fn find_or_find_vacant_position(&self, hash, eq)
  -> Result<&T, Option<VacantPosition>>

unsafe fn insert_at_position(&mut self, position, value) -> &mut T
```

This enables an insertion position to be located through an immutable
`&HashTable` reference (such as through a read-lock of a `RwLock`), in
order that it might subsequently be used to perform an insertion when an
exclusive `&mut HashTable` reference becomes available (such as through
a write-lock of the same `RwLock`).  In so doing, users can avoid
repeated hash probing when it is known that the table has not been
mutated in the interim.
@eggyal eggyal force-pushed the hashtable_insert_at_position branch from 6bf6e57 to bc09d05 Compare April 30, 2026 05:18
@clarfonthey
Copy link
Copy Markdown
Contributor

I'm not necessarily against this API, but I did recently factor out the InsertIndex API and do wonder if we really want to be recreating a similar one. Since not mutating the table between reading is a pretty big invariant to uphold, I do wonder if it's worth wrapping the hash + index in an opaque struct instead of just passing the index directly and letting the user pass in the hash again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend HashTable's bucket API to support insertion ?

3 participants