HBASE-29889 Add XXH3 Hash Support to Bloom Filter #7740

jinhyukify · 2026-02-11T18:14:57Z

Jira https://issues.apache.org/jira/browse/HBASE-29889

This PR adds XXH3 as a new Bloom filter hash type. XXH3 is designed for modern CPU architectures and clearly outperforms the Jenkins, Murmur, and Murmur3 hashes currently available for Bloom filters.

Benchmark results and brief implementation notes can be found here:
Benchmark and Design Notes (Google Doc)

Benchmark test code here: jinhyukify/xxh3-benchmark


jinhyukify · 2026-02-11T18:17:57Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/XXH3.java

+    (byte) 0x8f, (byte) 0x95, (byte) 0x16, (byte) 0x04, (byte) 0x28, (byte) 0xaf, (byte) 0xd7,
+    (byte) 0xfb, (byte) 0xca, (byte) 0xbb, (byte) 0x4b, (byte) 0x40, (byte) 0x7e, };
+
+  // Pre-converted longs from DefaultSecret to avoid reconstruction at runtime


This matches the little-endian value we get from reading the default secret bytes as-is.
Pre-computing it as a long like this gave a small performance bump when I tested.

This loads an additional set of 37 long values as static fields.
There’s some overhead from keeping these statically initialized values around, but the performance gains make it worthwhile.
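To illustrate what "pre-converted" means here, the following is a minimal sketch, not the code in XXH3.java. The class name, the constant names, and the 16 placeholder bytes are all hypothetical (the real default secret is longer and has different values). It shows that a long literal can be written to match the little-endian read of the same bytes, so the hot path can use the constant without rebuilding it from the byte array.

```java
// Illustrative sketch only: names and byte values are hypothetical,
// and these bytes are NOT the real XXH3 default secret.
final class SecretLongsSketch {
  static final byte[] SECRET = {
    0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
    0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18 };

  // Pre-converted little-endian longs for offsets 0 and 8 of SECRET.
  static final long SECRET_00 = 0x0807060504030201L;
  static final long SECRET_08 = 0x1817161514131211L;

  // Reference little-endian read: the byte at `offset` is the least significant.
  static long readLongLE(byte[] b, int offset) {
    long v = 0;
    for (int i = 7; i >= 0; i--) {
      v = (v << 8) | (b[offset + i] & 0xFFL);
    }
    return v;
  }

  public static void main(String[] args) {
    // Both comparisons hold: the literals equal the runtime reads.
    System.out.println(SECRET_00 == readLongLE(SECRET, 0));
    System.out.println(SECRET_08 == readLongLE(SECRET, 8));
  }
}
```

The cost is one extra static field per 8-byte window of the secret, which is the overhead mentioned above.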

jinhyukify · 2026-02-11T18:24:47Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/RowColBloomHashKey.java

+    // Optimization: when the offset points to the last 8 bytes,
+    // we can return the precomputed trailing long value directly.
+    if (offset + Bytes.SIZEOF_LONG == totalLength) {
+      return LAST_8_BYTES;


This approach gave better performance in the 9–16 byte hash path.
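As a rough sketch of why this shortcut helps: for inputs of 9 to 16 bytes, XXH3 reads the first 8 bytes and the last 8 bytes of the input, so a key whose tail is constant can serve the second read from a precomputed long. The class below is hypothetical and much simpler than RowColBloomHashKey. It assumes a key made of a variable prefix followed by a fixed 8-byte suffix, and the suffix bytes are invented for illustration.

```java
// Illustrative sketch only: a hypothetical virtual key with a constant 8-byte tail.
final class TailKeySketch {
  // Made-up suffix bytes; the real key layout in HBase differs.
  private static final byte[] TAIL = {
    (byte) 0xA1, (byte) 0xA2, (byte) 0xA3, (byte) 0xA4,
    (byte) 0xA5, (byte) 0xA6, (byte) 0xA7, (byte) 0xA8 };

  // Little-endian value of TAIL, computed once at class initialization.
  private static final long LAST_8_BYTES = tailAsLongLE();

  private final byte[] prefix;
  private final int totalLength;

  TailKeySketch(byte[] prefix) {
    this.prefix = prefix;
    this.totalLength = prefix.length + TAIL.length;
  }

  private static long tailAsLongLE() {
    long v = 0;
    for (int i = 7; i >= 0; i--) {
      v = (v << 8) | (TAIL[i] & 0xFFL);
    }
    return v;
  }

  // Byte at a logical position in the virtual key (prefix followed by TAIL).
  byte get(int pos) {
    return pos < prefix.length ? prefix[pos] : TAIL[pos - prefix.length];
  }

  // General path: assemble the long one byte at a time.
  long slowLongLE(int offset) {
    long v = 0;
    for (int i = 7; i >= 0; i--) {
      v = (v << 8) | (get(offset + i) & 0xFFL);
    }
    return v;
  }

  // Fast path: a read of exactly the last 8 bytes returns the precomputed value.
  long getLongLE(int offset) {
    if (offset + Long.BYTES == totalLength) {
      return LAST_8_BYTES;
    }
    return slowLongLE(offset);
  }

  public static void main(String[] args) {
    TailKeySketch key = new TailKeySketch(new byte[] { 1, 2, 3 });
    // The fast path and the general path agree for the trailing read.
    System.out.println(key.getLongLE(3) == key.slowLongLE(3));
  }
}
```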

jinhyukify · 2026-02-11T18:26:38Z

hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CompoundBloomFilter.java

+    sb.append(BloomFilterUtil.STATS_RECORD_SEP + "Hash type: " + hashType)
+      .append(" (" + Optional.ofNullable(Hash.getInstance(hashType))
+        .map(i -> i.getClass().getSimpleName()).orElse("UNKNOWN") + ")");


I'd like to show the Bloom filter hash type in HFilePrettyPrinter.

jinhyukify · 2026-02-11T18:30:48Z

hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterUtil.java

+   * @param key  the hash key
+   * @return a pair of hash values (hash1, hash2)
+   */
+  public static Pair<Integer, Integer> getHashPair(Hash hash, HashKey<?> key) {


This part gave me a bit of trouble.
I put quite a lot of work into making the XXH3 implementation zero-heap, but ironically ended up adding a small object allocation in the hash calculation path.

The main reason was that the Bloom filter hash-location logic was split across two different classes, so I consolidated it into one place. The contains() path is pretty hot, so I hesitated a bit, but given the performance of recent GC algorithms, the extra allocation might be acceptable.

Still, I'd love to hear your thoughts on this trade-off.

jinhyukify · 2026-02-11T18:32:18Z

hbase-server/src/main/java/org/apache/hadoop/hbase/util/BloomFilterUtil.java

+      int hash1 = (int) hash64;
+      int hash2 = (int) (hash64 >>> 32);


A well-designed hash function should behave reliably even if we take either half of its 64-bit output as a 32-bit value. XXH3 is no exception.
I'm adding the XXH3 author’s comment here for reference.
Cyan4973/xxHash#453 (comment)

jinhyukify · 2026-02-11T18:37:57Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/Hash64.java

+   * @param seed    the 64-bit seed value
+   * @return the computed 64-bit hash value
+   */
+  <T> long hash64(HashKey<T> hashKey, long seed);


The goal here is to take a single 64-bit hash result and split it into two 32-bit hashes to compute the Bloom hash locations.

--------------- 64-bit hash output --------------
|                    64 bits                    |
-------------------------------------------------
| lower 32 bits (hash1) | upper 32 bits (hash2) |
-------------------------------------------------

Since XXH3 already performs much better than the existing hashes and we no longer need to run the hash function twice, this approach gives us an additional performance win on top of the baseline speedup.
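A minimal sketch of how one 64-bit hash can drive all Bloom hash locations, assuming the usual double-hashing scheme in which the i-th location is derived from hash1 + i * hash2. The class and method names are hypothetical, and Math.floorMod is used here for simplicity; it is not necessarily how BloomFilterUtil maps a hash to a bit position.

```java
// Illustrative sketch only: hypothetical names, not the BloomFilterUtil API.
final class BloomLocationsSketch {
  // Derives `hashCount` bit positions in [0, bitSize) from one 64-bit hash.
  static int[] locations(long hash64, int hashCount, int bitSize) {
    int hash1 = (int) hash64;          // lower 32 bits
    int hash2 = (int) (hash64 >>> 32); // upper 32 bits
    int[] out = new int[hashCount];
    int composite = hash1;
    for (int i = 0; i < hashCount; i++) {
      // floorMod keeps the index non-negative even when composite is negative.
      out[i] = Math.floorMod(composite, bitSize);
      composite += hash2; // int overflow wraps, which is fine for hashing
    }
    return out;
  }

  public static void main(String[] args) {
    // hash1 = 5, hash2 = 2, so the positions are 5, 7, 9.
    int[] locs = locations(0x0000000200000005L, 3, 64);
    System.out.println(locs[0] + " " + locs[1] + " " + locs[2]);
  }
}
```

The hash function runs once per key; every further location costs only an integer addition and a modulo.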

jinhyukify · 2026-02-12T01:48:31Z

hbase-common/src/main/java/org/apache/hadoop/hbase/util/XXH3.java

+ */
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class XXH3 extends Hash implements Hash64 {


I mostly followed the algorithm as described here:
xxh3-algorithm-overview

I also referenced the original implementation:
https://xxhash.com/doc/v0.8.2/xxhash_8h_source.html

jinhyukify added 2 commits February 12, 2026 03:06

HBASE-29889 Add LittleEndianBytes utility for fast LE primitive access

31733b1

HBASE-29889 Extend HashKey with bulk little-endian accessors for fast…

e6a302e

… hashing

jinhyukify force-pushed the HBASE-29889 branch from 3453eaa to b7411bd Compare February 11, 2026 18:24

jinhyukify added 3 commits February 12, 2026 03:41

HBASE-29889 Implement XXH3 64bit hashing

fbdc25c

HBASE-29889 Add 64bit Bloom filter hash support

9281600

HBASE-29889 Add XXH3 to Bloom filter hashing

db169a0

jinhyukify force-pushed the HBASE-29889 branch from b7411bd to db169a0 Compare February 11, 2026 18:41
