Enable clients to scan offline tables using ScanServers #6156
dlmarion wants to merge 7 commits into apache:2.1
Conversation
During a normal client scan the TabletLocator resolves tablets (key extent and location) for a given search range. The location is necessary for the client to be able to create a connection with a tablet server to perform the scan, but the location is not needed when the client is using scan servers. The TabletLocator does not resolve tablets for offline tables. This change introduces the OfflineTabletLocatorImpl that performs this resolution (range -> key extents) and does not provide any location information. This change also modifies the client to allow scans on offline tables when using scan servers and uses the new OfflineTabletLocatorImpl in that code path.
This is marked as draft as the more complex test in the new IT class is having an issue with running out of memory. These changes are functional in the smaller-scale test, so I likely have a scaling issue and maybe some bugs to work out.
    }

    private final TableId tid;
    private final TreeSet<KeyExtent> extents = new TreeSet<>();
May be able to replace this with LoadPlan.SplitResolver created using SplitResolver.from(SortedSet<Text>). That would work on rows instead of KeyExtents, so fewer objects and probably less overall memory. It would also avoid duplicating rows: a key extent's prev row is stored in memory again as another extent's end row. So overall there would be fewer objects and less duplication, both saving memory.
Could also pass an ImmutableSortedSet from Guava to SplitResolver.from(SortedSet<Text>) instead of a TreeSet. The Guava class is implemented using a sorted array, so it probably uses less memory than a TreeSet because it does not have all the internal tree node objects.
I reworked the cache in dd9f6c8 such that it's configurable as to the number of extents it keeps in memory and for how long.
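The row-based resolution suggested above can be sketched with just the standard library: store each split row once in a sorted set and derive the covering range on demand. This is a minimal sketch, not Accumulo's actual SplitResolver API; the class and method names are hypothetical, and String stands in for org.apache.hadoop.io.Text.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: resolve a row to its covering (prevRow, endRow] tablet
// range using only split rows, so each row is stored once (unlike KeyExtents,
// where one tablet's end row is duplicated as the next tablet's prev row).
public class SplitRowResolver {
  private final NavigableSet<String> splits;

  public SplitRowResolver(TreeSet<String> splits) {
    this.splits = splits;
  }

  // Returns {prevRow, endRow} for the tablet containing the given row;
  // null stands in for -inf/+inf, mirroring a KeyExtent's null prev/end rows.
  public String[] resolve(String row) {
    String prev = splits.lower(row);    // greatest split strictly before row
    String end = splits.ceiling(row);   // least split >= row; null = last tablet
    return new String[] {prev, end};
  }

  public static void main(String[] args) {
    TreeSet<String> splits = new TreeSet<>();
    splits.add("b");
    splits.add("d");
    SplitRowResolver r = new SplitRowResolver(splits);
    System.out.println(java.util.Arrays.toString(r.resolve("cat"))); // [b, d]
  }
}
```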
    if (extents.size() > 0) {
      return;
    }
    try (TabletsMetadata tm = context.getAmple().readTablets().forTable(tid)
This is doing a good bit of work. Could time this operation and log how long it took and how many rows it loaded into memory. Then if it's being excessive in terms of time, there would be a log.
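That instrumentation could look roughly like the sketch below. The loadTablets signature here is hypothetical; in the real code this would wrap the context.getAmple().readTablets() iteration.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical sketch: time the metadata load and log elapsed time and row
// count, so an excessively slow or large load shows up in the logs.
public class TimedLoad {
  private static final Logger LOG = Logger.getLogger(TimedLoad.class.getName());

  public static long loadTablets(Iterable<String> tabletRows, Consumer<String> cacheRow) {
    long start = System.nanoTime();
    long rows = 0;
    for (String row : tabletRows) {
      cacheRow.accept(row);
      rows++;
    }
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    final long loaded = rows;
    LOG.info(() -> "Loaded " + loaded + " tablet metadata rows in " + elapsedMs + " ms");
    return rows;
  }

  public static void main(String[] args) {
    loadTablets(List.of("a;row1", "a;row2", "a;row3"), r -> {});
  }
}
```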
      tm.forEach(t -> {
        KeyExtent ke = t.getExtent();
        Location loc = t.getLocation();
        if (loc != null && loc.getType() != LocationType.LAST) {
Don't think you will ever get a last location from getLocation(); think you would only see a LAST location type when calling TabletMetadata.getLast().
    TabletLocation tl = this.locateTablet(context, startRow, false, false);
    if (tl == null) {
      failures.add(r);
If we have coverage of the entire table, this should never fail to find anything.
As of dd9f6c8, the cache no longer keeps all of the extents in memory.
    if (getConsistencyLevel() == ConsistencyLevel.IMMEDIATE) {
      try {
        String tableName = context.getTableName(tableId);
        context.requireNotOffline(tableId, tableName);
If this can do a ZK operation per creation of a scanner iterator, it could cause problems. Not sure what the impl of this method does.
It's using ZooCache, not hitting ZK directly.
Took this out of draft as I have the IT working (I fixed the known issues) and I reworked the cache to hopefully provide better memory management at larger scales.
    prefetch = Integer
        .parseInt(ClientProperty.OFFLINE_LOCATOR_CACHE_PREFETCH.getValue(clientProperties));
    cache = Caffeine.newBuilder().expireAfterAccess(cacheDuration).initialCapacity(maxCacheSize)
        .maximumSize(maxCacheSize).evictionListener(this).removalListener(this)
Why have an eviction and removal listener?
To mark it for removal from the TreeSet.
Seems like everything removed from the cache will be passed to the removalListener, so it seems redundant to also set the evictionListener. Also, the evictionListener seems to run in a lock and the removal listener does not; it does not seem like the locking is needed here.
    try {
      KeyExtent match = extents.ceiling(start);
      if (match != null && match.contains(start)) {
        LOG.trace("Extent {} found in cache for start row {}", match, start);
Could access the extent in the Caffeine cache here to update its access time.
      }
    }

    private KeyExtent findOrLoadExtent(KeyExtent start) {
Could simplify the locking by making extents use a ConcurrentSkipListSet. Then the read lock would not be needed. Could have a single lock only for the case of doing updates, the code that currently takes a write lock. Also, the eviction handler could directly remove from the extents set, without any locking, if it were a ConcurrentSkipListSet.
I attempted this locally and it doesn't work the way that we want it to. When the cache becomes full, Caffeine may start evicting newly inserted entries, which ends up causing the IT to fail.
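For reference, the lock-free structure suggested above would look roughly like the sketch below; per the reply, the interaction with Caffeine's eviction of newly inserted entries is why it was not adopted. The class and method names are illustrative, and extents are modeled as plain end-row strings.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical sketch: back the extent index with a ConcurrentSkipListSet so
// lookups need no read lock and a cache removal listener can delete entries
// directly without any explicit locking.
public class LockFreeExtentIndex {
  // sorted and thread-safe; no read-write lock required
  private final ConcurrentSkipListSet<String> endRows = new ConcurrentSkipListSet<>();

  public void add(String endRow) {
    endRows.add(endRow);
  }

  // Lock-free lookup of the candidate tablet: smallest end row >= search row.
  public String ceiling(String searchRow) {
    return endRows.ceiling(searchRow);
  }

  // Could be called directly from a cache eviction/removal listener.
  public void onRemoval(String endRow) {
    endRows.remove(endRow);
  }
}
```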
      }
      LOG.trace("Caching extent: {}", ke);
      cache.put(ke, ke);
      extents.add(ke);
Would be safest to remove any overlapping extents when adding new extents. If the table is offline for the entire duration of this cache's lifetime, that should not happen. However, this cache could be long-lived, and the table could be brought online and taken offline during its lifetime, and the cache may not see that event.
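That overlap-eviction idea could be sketched as below. Extents are modeled as (prevEndRow, endRow] string pairs with null prevEndRow meaning -inf; endRow is assumed non-null here only to keep the TreeMap example simple. All names are hypothetical.

```java
import java.util.TreeMap;

// Hypothetical sketch: before caching a new extent, evict any cached extents
// whose row ranges overlap it (e.g. stale extents left over from a split or
// merge that happened while the table was briefly online).
public class OverlapSafeExtentCache {
  // endRow -> prevEndRow
  private final TreeMap<String, String> byEndRow = new TreeMap<>();

  public void add(String prevEndRow, String endRow) {
    // evict every cached extent whose range intersects (prevEndRow, endRow]
    byEndRow.entrySet().removeIf(e -> overlaps(prevEndRow, endRow, e.getValue(), e.getKey()));
    byEndRow.put(endRow, prevEndRow);
  }

  // (p1, e1] and (p2, e2] intersect iff each starts before the other ends
  private static boolean overlaps(String p1, String e1, String p2, String e2) {
    boolean aStartsBeforeBEnds = p1 == null || p1.compareTo(e2) < 0;
    boolean bStartsBeforeAEnds = p2 == null || p2.compareTo(e1) < 0;
    return aStartsBeforeBEnds && bStartsBeforeAEnds;
  }

  public int size() {
    return byEndRow.size();
  }
}
```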
    Integer.parseInt(ClientProperty.OFFLINE_LOCATOR_CACHE_SIZE.getValue(clientProperties));
    prefetch = Integer
        .parseInt(ClientProperty.OFFLINE_LOCATOR_CACHE_PREFETCH.getValue(clientProperties));
    cache = Caffeine.newBuilder().expireAfterAccess(cacheDuration).initialCapacity(maxCacheSize)
The cache could have a weigher. That could be useful for the case where tablet splits vary widely in size.
    lock.readLock().lock();
    try {
      KeyExtent match = extents.ceiling(searchKey);
      if (match != null && match.contains(searchKey)) {
Not sure this call to contains is working correctly. The way the search key is created, it goes from -inf to row, so the match may not contain it. Wrote the following test program and it prints false then true.

    KeyExtent lookupExtent = KeyExtent.fromMetaRow(new Text("1;cat"));
    KeyExtent match = new KeyExtent(TableId.of("1"), new Text("d"), new Text("b"));
    System.out.println(match.contains(lookupExtent));
    System.out.println(match.contains(new Text("cat")));
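The pitfall in that test program can be modeled with the standard library: the lookup key spans (-inf, "cat"], so extent-contains-extent is the wrong check even though the candidate extent contains the row itself. Extents are modeled as (prevEndRow, endRow] over strings with null meaning -inf/+inf; this is an illustrative model, not Accumulo's KeyExtent implementation.

```java
// Hypothetical model contrasting "contains this extent" with "contains this row"
// for half-open (prevEndRow, endRow] ranges.
public class ContainsPitfall {

  // Does (prev, end] fully contain (oPrev, oEnd]?
  public static boolean containsExtent(String prev, String end, String oPrev, String oEnd) {
    boolean startOk = prev == null || (oPrev != null && prev.compareTo(oPrev) <= 0);
    boolean endOk = end == null || (oEnd != null && end.compareTo(oEnd) >= 0);
    return startOk && endOk;
  }

  // Does (prev, end] contain the single row?
  public static boolean containsRow(String prev, String end, String row) {
    return (prev == null || prev.compareTo(row) < 0)
        && (end == null || end.compareTo(row) >= 0);
  }

  public static void main(String[] args) {
    // lookup key built like KeyExtent.fromMetaRow(new Text("1;cat")): (-inf, "cat"]
    // candidate extent: ("b", "d"]
    System.out.println(containsExtent("b", "d", null, "cat")); // false
    System.out.println(containsRow("b", "d", "cat"));          // true
  }
}
```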