Description
Per the discussion here #125939 (comment), FrozenDictionary/FrozenSet have a couple of points to investigate:
- Perf measurements at various sizes and key types found no measurable difference in lookup speed with or without the size-tuning loop (the code that tries different primes to reduce the collision rate).
Result: zero measurable difference for FrozenSet, FrozenDictionary<int,int>, and FrozenDictionary<string,int> at sizes 1K–500K. This is expected: the tuning loop tries to push the collision rate below 5%, but the birthday problem makes that mathematically impossible at medium and larger sizes. With MaxLargeBucketTableMultiplier = 3, the best achievable collision rate at 100K entries is ~14.5% (roughly N²/2M colliding pairs, with M = 3N buckets). The loop therefore exhausts its search range and falls back to the starting prime anyway.
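As a sanity check on the arithmetic above, here's the birthday-bound estimate as a small Python sketch. This is an approximation assuming a uniformly distributed hash; the exact prime the tuning loop picks (slightly above 3N) will shift the number a little, which may account for the gap between the ~16.7% this gives and the ~14.5% cited above:

```python
def expected_colliding_pairs(n, m):
    # Each of the C(n,2) key pairs lands in the same bucket with
    # probability 1/m under uniform hashing, so the expected number
    # of colliding pairs is n*(n-1)/(2m) ~= n^2 / (2m) for large n.
    return n * (n - 1) / (2 * m)

n = 100_000
m = 3 * n  # MaxLargeBucketTableMultiplier = 3
pairs = expected_colliding_pairs(n, m)
print(f"{pairs / n:.1%}")  # colliding pairs per entry: ~16.7%, nowhere near 5%
```

However the rate is measured, it stays well above the loop's 5% target at this size, so the search can never succeed.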
- Related, and in the same area of code: investigate intermediate load factors. We know a load factor of 1.0 is worse than the 0.5 it chooses (at least up to the 3.5M-hashcode threshold, above which it drops to 1.0).
We tested LF ~0.5 and LF ~1.0, but no values in between. An intermediate LF such as ~0.75 (~1.33N buckets) would save ~33% of bucket memory versus 0.5, while only 53% of buckets would be non-empty and need chain walks (versus 39% at 0.5). The lookup regression might be small enough to accept for the memory savings.
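The 39%/53% figures follow from a Poisson approximation: under uniform hashing at load factor α, the fraction of non-empty buckets is 1 − e^(−α). A quick sketch reproducing them:

```python
import math

def fraction_nonempty(load_factor):
    """Fraction of buckets holding at least one entry, assuming
    uniform hashing (Poisson approximation: 1 - e^(-lf))."""
    return 1 - math.exp(-load_factor)

for lf in (0.5, 0.75, 1.0):
    print(f"LF {lf}: {fraction_nonempty(lf):.0%} of buckets non-empty")
# LF 0.5:  39%
# LF 0.75: 53%
# LF 1.0:  63%
```

So moving from 2N buckets (LF 0.5) to ~1.33N buckets (LF 0.75) cuts bucket memory by a third while the share of occupied buckets rises from 39% to 53%.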
No urgency to investigate these; it works perfectly well as-is. Any investigation will have to benchmark the different strategies thoroughly enough to give confidence in whether to make any change: across key types, collision rates, sizes, load factors, and with and without the tuning loop. (Being frozen, at least there are only two metrics that matter: construction time to a small extent, and lookup time. There's no mutation to benchmark.)
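For the real investigation that would mean BenchmarkDotNet against FrozenDictionary itself, but the shape of the load-factor experiment can be sketched with a toy chained table (names like `ChainedTable` and `bench` are illustrative, not anything from the runtime):

```python
import random
import time

class ChainedTable:
    """Toy chained hash table with a bucket count fixed by load factor.
    A stand-in for the experiment, not a model of FrozenDictionary."""
    def __init__(self, items, load_factor):
        self.nbuckets = max(1, int(len(items) / load_factor))
        self.buckets = [[] for _ in range(self.nbuckets)]
        for k, v in items:
            self.buckets[hash(k) % self.nbuckets].append((k, v))

    def get(self, key):
        for k, v in self.buckets[hash(key) % self.nbuckets]:
            if k == key:
                return v
        raise KeyError(key)

def bench(load_factor, n=100_000, probes=200_000, seed=0):
    """Time `probes` successful lookups over n random 64-bit int keys."""
    rng = random.Random(seed)
    keys = [rng.getrandbits(64) for _ in range(n)]
    table = ChainedTable([(k, k) for k in keys], load_factor)
    lookups = [rng.choice(keys) for _ in range(probes)]
    start = time.perf_counter()
    for k in lookups:
        table.get(k)
    return time.perf_counter() - start

for lf in (0.5, 0.75, 1.0):
    print(f"LF {lf}: {bench(lf):.3f}s")
```

A serious version would add the other axes above (key types, sizes, adversarial collision rates, tuning on/off) and construction-time measurement alongside lookup time.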