Skip to content

HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocations(startKey, limit) API for bulk meta-cache warmup#8236

Open
sanjeet006py wants to merge 2 commits into
apache:branch-2from
sanjeet006py:HBASE-30161
Open

HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocations(startKey, limit) API for bulk meta-cache warmup#8236
sanjeet006py wants to merge 2 commits into
apache:branch-2from
sanjeet006py:HBASE-30161

Conversation

@sanjeet006py
Copy link
Copy Markdown
Contributor

JIRA: HBASE-30161

Generated-by: Claude Opus 4.7

Sanjeet Malhotra added 2 commits May 14, 2026 13:49
public List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws IOException {
// No need to page as region locations are already in-memory.
return getAllRegionLocations();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually breaking the contract established by the javadoc by always returning all regions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was also wondering same. Thanks for bringing this up. Instead of implementing paging here, shall I throw an exception saying use getAllRegionLocations as I don't see advantage of paging in here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that works too, as long as you call out the fact that all implementation may not support it and that they should fallback to getAllRegionLocations.

* more regions exist
* @throws IOException if a remote or network exception occurs
*/
List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws IOException;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is changing a public interface on a stable branch, this will break external code implementing the interface, why not have a default implementation perhaps throw unsupported or return a blank list?

* Ordering: regions are returned in ascending region start-key order (the natural order of
* {@code hbase:meta} rows for a single table). Within each region, replicas are returned in
* ascending replica-id order (replica 0, then 1, then 2, ...). Split parents and offline regions
* are filtered out, which may cause a page to contain fewer than {@code limit} regions but never
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this filtering happening? I didn't see any test coverage either. The existing methods don't do any such filtering correct, so is this even needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering is already implemented and happens in inside MetaTableAccessor.DefaultVisitorBase#visit(). Call chain: HRegionLocator#getRegionLocations() -> MetaTableAccessor.TableVisitorBase#visit() -> MetaTableAccessor.DefaultVisitorBase#visit().

Copy link
Copy Markdown
Contributor Author

@sanjeet006py sanjeet006py May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am reusing the filtering logic so, no test coverage needed.

* more regions exist
* @throws IOException if a remote or network exception occurs
*/
List<HRegionLocation> getRegionLocations(byte[] startKey, int limit) throws IOException;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is overloading an existing methods that takes a data row to find its locations, which is a totally different semantic and so reusing that name for this can be very confusing. Perhaps just call this getLocations or may be you can come up with a better name.

sanjeet006py pushed a commit to sanjeet006py/hbase that referenced this pull request May 14, 2026
… and add default-throws

Review comments on apache#8236 by haridsv:
- Adding an abstract method to RegionLocator / AsyncTableRegionLocator breaks
  external implementers. Convert the new method to a default that throws
  UnsupportedOperationException, with javadoc instructing callers to fall
  back to getAllRegionLocations().
- Reusing the getRegionLocations name overloads a method with completely
  different semantic (row -> all replicas of containing region). Rename to
  getRegionLocationsPage(byte[] startKey, int limit) to make the
  pagination/range intent explicit.
- SnapshotRegionLocator no longer needs an override; it inherits the
  default-throws and callers fall back to getAllRegionLocations().

Tests renamed to call getRegionLocationsPage; semantic and assertions
unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants