feat(import): add support for multiple hbase snapshot imports #4600

Open
tianlei2 wants to merge 12 commits into googleapis:main from tianlei2:dataflow-import

Conversation

@tianlei2 tianlei2 commented Apr 28, 2026

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

b/429250716

This is the first PR that incorporates changes from https://github.com/jhambleton/java-bigtable-hbase/commits/dataflow-v2-v2.15.6 and some fixes to make it pass the tests.

  • Fixed Test Isolation Issues
    SnapshotUtilsTest.testGetHbaseConfiguration was failing because the static configuration field SnapshotUtils.hbaseConfiguration cached state between test cases, leaking stale data into subsequent tests.
    • Solution: Added a @Before setup method to reset the static field to null via reflection before every test run.
  • Fixed Timestamp Formatting Tests
    SnapshotUtilsTest.testAppendCurrentTimestamp was throwing a NumberFormatException because the return value contained a UUID suffix (timestamp-UUID), but the test attempted to parse the entire string directly as a Long.
    • Solution: Updated the test to split the string using the "-" character to extract and correctly parse just the timestamp prefix.
  • Resolved Classpath and SPI Conflicts (dnsjava)
    Integration tests failed on Java 8 and 11 in Kokoro because of unshaded transitive dependency conflicts (com.google.protobuf.LiteralByteString NoClassDefFoundError).
    • Solution: Reverted to the shaded hbase-shaded-mapreduce dependency, ensuring compatibility across all Java versions.
  • Uncommented and Fixed Tests in ImportJobFromHbaseSnapshotTest
    Several useful unit tests were commented out in ImportJobFromHbaseSnapshotTest because mockito-core lacked the ability to mock static methods.
    • Solution:
      Switched from mockito-core to mockito-inline in the pom.xml to allow static mocking.
      Uncommented the code and restored the original formatting to prevent any lint errors, enabling JUnit to verify correct configuration parsing.
  • ComputeAndValidateHashFromBigtableDoFnTest.java was accidentally deleted; added it back
  • Cleaned up unused comments
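
The reflection-based reset described above can be sketched with plain JDK code. `SnapshotUtilsStub` and its field are hypothetical stand-ins for the real `SnapshotUtils.hbaseConfiguration`; in the PR this would live in a JUnit `@Before` method:

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for SnapshotUtils and its cached static configuration.
class SnapshotUtilsStub {
  static String hbaseConfiguration = "stale-from-previous-test";
}

public class ResetStaticFieldExample {
  // Mirrors the @Before setup: reset the cached static field via reflection
  // so every test starts from a clean state.
  static void resetStaticField() throws Exception {
    Field field = SnapshotUtilsStub.class.getDeclaredField("hbaseConfiguration");
    field.setAccessible(true);
    field.set(null, null); // target is null because the field is static
  }

  public static void main(String[] args) throws Exception {
    resetStaticField();
    System.out.println(SnapshotUtilsStub.hbaseConfiguration); // prints "null"
  }
}
```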
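The timestamp-parsing fix can likewise be illustrated standalone; the `timestamp-UUID` shape is taken from the description above, and the split-and-parse is the gist of the test change:

```java
import java.util.UUID;

public class TimestampParseExample {
  public static void main(String[] args) {
    // A value shaped like the one described above: "<timestamp>-<UUID>".
    String value = System.currentTimeMillis() + "-" + UUID.randomUUID();

    // Parsing the whole string would throw NumberFormatException because of the
    // UUID suffix; splitting on "-" and taking the first token recovers the
    // timestamp prefix.
    long timestamp = Long.parseLong(value.split("-")[0]);
    System.out.println(timestamp > 0); // prints "true"
  }
}
```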

@tianlei2 tianlei2 requested a review from a team as a code owner April 28, 2026 18:41
@product-auto-label product-auto-label Bot added size: xl Pull request size is extra large. api: bigtable Issues related to the googleapis/java-bigtable-hbase API. labels Apr 28, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch 3 times, most recently from f5324bd to f8b5932 Compare April 28, 2026 19:22
@tianlei2 tianlei2 changed the title Dataflow import feat(import): add support for multiple hbase snapshot imports Apr 28, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch 3 times, most recently from 13f65dc to a3a6c7f Compare April 28, 2026 20:17
@tianlei2 tianlei2 marked this pull request as draft April 28, 2026 20:42
@tianlei2 tianlei2 marked this pull request as ready for review April 28, 2026 22:41
@tianlei2 tianlei2 marked this pull request as draft April 28, 2026 22:42
@tianlei2 tianlei2 added the kokoro:run Add this label to force Kokoro to re-run the tests. label Apr 29, 2026
@tianlei2 tianlei2 self-assigned this Apr 29, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:run Add this label to force Kokoro to re-run the tests. label Apr 29, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch 4 times, most recently from 12f1c11 to d511a61 Compare April 29, 2026 19:46
@tianlei2 tianlei2 requested a review from vermas2012 April 29, 2026 20:27
@googleapis googleapis deleted a comment from google-cla Bot Apr 29, 2026
@tianlei2 tianlei2 force-pushed the dataflow-import branch from 86d5318 to f1125d8 Compare May 6, 2026 16:46
@tianlei2 tianlei2 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 6, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 6, 2026
@tianlei2 tianlei2 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 6, 2026
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label May 6, 2026
Member

@vermas2012 vermas2012 left a comment

Thanks for including tests for the config and utility classes! However, before we can merge this, we need unit test coverage for the core pipeline transformations—specifically ReadRegions, ListRegions, and HbaseRegionSplitTracker. Because ReadRegions handles complex dynamic splitting and large-cell filtering, we need to ensure that logic is verified. We should also add a standard coder test for RegionConfigCoder to prevent serialization bugs.

Member

Is this needed?

Author

The package is not empty, since it contains RegionConfigCoder. Maybe we can remove it if we move RegionConfigCoder somewhere else?

try {
  cleanupSnapshot(snapshotConfig);
} catch (Exception ex) {
  LOG.error(
Member

What is the implication of swallowing an exception here? Is it possible to retry the cleanup and make sure we don't leave a restored snapshot in GCS?
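
A minimal, dependency-free sketch of the retry idea this comment raises; all names here are hypothetical illustrations, not the PR's actual code:

```java
public class RetryCleanupExample {
  // Hypothetical sketch: retry a cleanup action a few times before giving up,
  // so a transient failure doesn't silently leave a restored snapshot behind.
  static boolean cleanupWithRetries(Runnable cleanup, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        cleanup.run();
        return true; // cleanup succeeded
      } catch (RuntimeException ex) {
        System.out.println("cleanup attempt " + attempt + " failed: " + ex.getMessage());
      }
    }
    return false; // caller can surface this, e.g. via a metric or a final log
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Fails twice, then succeeds — simulating a transient storage error.
    boolean ok = cleanupWithRetries(() -> {
      if (++calls[0] < 3) throw new RuntimeException("transient");
    }, 5);
    System.out.println(ok); // prints "true"
  }
}
```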

Member

This file seems too big: it has 3-4 objects, each doing a specific task. See if we can split it into 3-4 files, each with its own unit tests. I will leave another comment about testing in the PR.

pipeline
    .apply("CreateInput", Create.of(SnapshotTestHelper.newSnapshotConfig("invalid_path")))
    .apply("DeleteSnapshot", ParDo.of(new CleanupRestoredSnapshots()));
pipeline.run();
Member

What are we asserting here?

Author

Added a comment here. The pipeline should run successfully without throwing an exception (for CleanupRestoredSnapshots).

.withTableId(opts.getBigtableTableId())
.withConfiguration(BigtableOptionsFactory.CUSTOM_USER_AGENT_KEY, customUserAgent);
.withConfiguration(BigtableOptionsFactory.CUSTOM_USER_AGENT_KEY, customUserAgent)
.withConfiguration(BigtableOptionsFactory.MAX_INFLIGHT_RPCS_KEY, "100")
Member

This 100 and the 30 ms later are hardcoded with no way to change them. Are we sure this works for all imports? Otherwise, we should expose them as options that can be set from the job config.

}

public static Configuration getHBaseConfiguration(Map<String, String> configurations) {
  if (hbaseConfiguration == null) {
Member

This double-checked locking requires the hbaseConfiguration field to be volatile to work correctly.

Note on the volatile keyword:
Adding volatile here is strictly required for thread safety. Without it, the Java compiler is allowed to reorder the object initialization and the reference assignment. This means another thread could bypass the null check and receive a partially constructed object.

This was a well-documented flaw in early Java that was fixed in JSR-133 by enforcing a strict happens-before guarantee on volatile reads/writes.
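
The idiom this comment describes can be shown with a self-contained example (a generic singleton, not the actual SnapshotUtils code):

```java
public class DoubleCheckedLockingExample {
  // volatile is required: it forbids reordering the object's construction with
  // the reference assignment, so no thread can observe a partially constructed
  // instance through the unlocked first check.
  private static volatile DoubleCheckedLockingExample instance;

  private DoubleCheckedLockingExample() {}

  public static DoubleCheckedLockingExample getInstance() {
    if (instance == null) {                            // first check, no lock
      synchronized (DoubleCheckedLockingExample.class) {
        if (instance == null) {                        // second check, under lock
          instance = new DoubleCheckedLockingExample();
        }
      }
    }
    return instance;
  }

  public static void main(String[] args) {
    System.out.println(getInstance() == getInstance()); // prints "true"
  }
}
```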

Author

done

this.enableDynamicSplitting = enableDynamicSplitting;
}

public ByteKeyRange currentRestriction() {
Member

Missing @Override; same for other methods like tryClaim.

Author

done

Member

What is the size of the snapshot being imported? How many shards are we setting? I don't see the shard configs, so maybe we are not setting the shards. We need to set them to make sure we are importing the whole snapshot via the multiple jobs.

Author

Added a sharded test. The snapshot file size is 78 KB.

@tianlei2 tianlei2 force-pushed the dataflow-import branch 6 times, most recently from 738ee25 to 682753c Compare May 7, 2026 15:26
@tianlei2 tianlei2 force-pushed the dataflow-import branch 2 times, most recently from 4bd17e1 to b49307c Compare May 7, 2026 16:04
@tianlei2 tianlei2 force-pushed the dataflow-import branch 2 times, most recently from f84da91 to caace0b Compare May 7, 2026 16:58
@tianlei2 tianlei2 force-pushed the dataflow-import branch from 4e91349 to 5134e4b Compare May 7, 2026 17:30
@tianlei2 tianlei2 force-pushed the dataflow-import branch from 5134e4b to 808dbac Compare May 7, 2026 18:44
