WIP - Preserve duplicate keys for scan resumptions #5957
base: 2.1
Conversation
There may be two scenarios where this will not work as expected.
First, when reading duplicate keys from multiple files, they may not always come back in the same order. So if 100 duplicate keys are spread across 3 files and 50 are read, then when coming back and skipping 50, those may not be the same 50 values. The keys from multiple files are placed in a priority queue, which has no guaranteed order for items with the same priority. If files are added or removed between reading scan batches, this would probably greatly increase the chance of duplicate keys having a different scan order.
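As a rough, self-contained illustration of that first scenario (this uses a plain java.util.PriorityQueue, not Accumulo's actual multi-iterator heap), entries that compare as equal have no guaranteed drain order:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class EqualPriorityOrder {
  record Entry(String key, String sourceFile) {}

  public static void main(String[] args) {
    // All three entries have the same key, so the comparator sees them as equal.
    PriorityQueue<Entry> heap = new PriorityQueue<>(Comparator.comparing(Entry::key));
    heap.add(new Entry("row1 cf:cq", "fileA"));
    heap.add(new Entry("row1 cf:cq", "fileB"));
    heap.add(new Entry("row1 cf:cq", "fileC"));

    // The order these come back in is unspecified for equal elements; it depends on
    // insertion order and heap internals, so "skip the first N of them" is not a
    // stable way to resume a scan in the middle of a duplicate-key run.
    while (!heap.isEmpty()) {
      System.out.println(heap.poll());
    }
  }
}
```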
Second, this change stores state on the server side. If a scan reads 50 of 100 duplicate keys from tserver A and then tserver A fails, the scan will skip the other 50 when it resumes on another tserver.
One possible solution to this is to change server-side scan batching such that it never ends a batch on a duplicate key, or the scan fails if it would have to do that. So maybe we let the batch grow up to 2x or 3x the batch size (this could be configurable via a new setting) when duplicate keys are present, and if it still fills up then the scan fails with an error. This way the scan either gets all the duplicate keys or it fails, instead of sometimes getting incorrect data for this case.
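A minimal sketch of that idea, assuming a simple list of keys and made-up names (readBatch, maxBatchSize, maxDuplicateMultiplier); this is not the actual tserver batching code:

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateAwareBatching {

  // Fill a batch, but refuse to end it between two entries that share a key.
  // The batch may grow up to maxBatchSize * maxDuplicateMultiplier; if the
  // duplicate run is still not finished at that point, fail instead of
  // returning a partial run that a resumed scan could silently skip.
  static List<String> readBatch(List<String> keys, int start, int maxBatchSize,
      int maxDuplicateMultiplier) {
    int hardLimit = maxBatchSize * maxDuplicateMultiplier;
    List<String> batch = new ArrayList<>();
    int i = start;
    while (i < keys.size()) {
      batch.add(keys.get(i));
      i++;
      boolean batchFull = batch.size() >= maxBatchSize;
      boolean nextIsDuplicate = i < keys.size() && keys.get(i).equals(keys.get(i - 1));
      if (batchFull && !nextIsDuplicate) {
        break; // safe boundary: the next key differs, so resuming after it is unambiguous
      }
      if (batchFull && batch.size() >= hardLimit) {
        throw new IllegalStateException(
            "Duplicate run for key " + keys.get(i - 1) + " exceeded " + hardLimit + " entries");
      }
    }
    return batch;
  }

  public static void main(String[] args) {
    List<String> keys = List.of("a", "b", "b", "b", "c");
    // With maxBatchSize = 2 the batch grows to 4 so it does not end mid-run of "b".
    System.out.println(readBatch(keys, 0, 2, 3)); // [a, b, b, b]
  }
}
```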
Another possible solution is to sort data on the value when everything in the keys is equal. Then scans could continue based on key+value instead of just key. However, persisted data is not currently sorted this way. This would be a really big change to code, data, and user creation of rfiles. It offers the advantage of avoiding having to buffer all duplicate keys on the server side.
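To illustrate what resuming on key+value would mean, a small sketch using plain string pairs rather than Accumulo's Key/Value classes, so equal keys still have a total order:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class KeyThenValueOrder {
  // Compare by key first, then by value, so two entries with identical keys are
  // still distinguishable and a resumed scan could seek to "just after (key, value)".
  static int compareKeyThenValue(Map.Entry<String, String> a, Map.Entry<String, String> b) {
    int c = a.getKey().compareTo(b.getKey());
    return c != 0 ? c : a.getValue().compareTo(b.getValue());
  }

  public static void main(String[] args) {
    var a = new SimpleImmutableEntry<>("row1 cf:cq", "valueA");
    var b = new SimpleImmutableEntry<>("row1 cf:cq", "valueB");
    // Key-only ordering would call these equal; key+value ordering puts a before b.
    System.out.println(compareKeyThenValue(a, b) < 0); // true
  }
}
```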
Another thing to consider is that not all keys on the server side are MemKeys. Persisted data are not MemKeys. Only recently written data stored in in-memory maps are MemKeys. The purpose of the count on MemKeys is to avoid reading data written after a scan started.
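For readers unfamiliar with MemKey, here is a heavily simplified sketch of the idea (not the real class; the field names and the comparison direction are assumptions for illustration only):

```java
// Sketch: entries in the in-memory map carry an insertion counter (kvCount).
// A scan records the counter value when it starts and ignores entries written
// later, so data inserted after the scan began is not returned. Entries read
// from persisted rfiles have no such counter.
public class MemKeySketch {
  final String key;
  final int kvCount; // insertion order; also tells otherwise-identical keys apart

  MemKeySketch(String key, int kvCount) {
    this.key = key;
    this.kvCount = kvCount;
  }

  boolean visibleToScanStartedAt(int scanStartKvCount) {
    return kvCount <= scanStartKvCount;
  }
}
```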
I took this approach in 9a7a523
throw new IllegalStateException("Duplicate key run exceeded scan batch timeout for "
    + cutKey + ". Increase " + Property.TABLE_SCAN_BATCH_DUPLICATE_MAX_MULTIPLIER.getKey()
    + " or batch timeout, or reduce duplicates for this key.");
}
This if stmt used to have a break in it; seems like that will still be needed. When we are in here we have finished the batch and need to break out of the loop.
continueKey = new Key(key);
skipContinueKey = true;
break;
if (!cutPending) {
These changes add a lot of new work per key/value read. Normally that work will be wasted effort and never needed. Instead of doing that, we could do a check when closing off a batch to see if we are in the situation where the keys are the same. Maybe something like the following; not sure how this would change the rest of the code, but hopefully this strategy could avoid adding any new work for each key/value.
if (resultSize >= maxResultsSize || results.size() >= scanParams.getMaxEntries() || timesUp) {
  // closing off a batch, need to see if the last key in the batch and the next unread key are the same
  if (iter.hasTop()) {
    iter.next();
    if (iter.hasTop() && results.get(results.size() - 1).getKey().equals(iter.getTopKey())) {
      // the last key in the batch and the next key match, so can not terminate the batch yet...
      // not sure what this should do... it could start looping here or set a boolean that
      // modifies the outer loop behavior... could also read the config here to determine what to
      // do... in any case we do not need to do extra work until now.
      // The best thing to do for efficiency is probably to completely handle the case of duplicates here,
      // maybe call a function that reads the duplicate related config and loops. This would mean
      // the normal loop can spend zero computation on this edge case. However, not sure if that
      // could be done cleanly w/o introducing duplicate code. Depends on whether the main loop and this
      // new loop could share code.
    }
  }
}
Fixes #5951
When a scan contained multiple duplicate keys and the scan broke or cut before returning all of those duplicates, the remainder of those duplicate keys were dropped and not returned when the scan resumed. This happened because when the scan cut, we did not store the MemKey, which contains an additional field, kvCount, that when used distinguishes otherwise duplicate keys. When we resumed the scan without these MemKey details, the scan picked back up after the logical key we left off on, which incorrectly skipped the remaining duplicate keys. We now store the full MemKey, and when the scan resumes on the same key, it skips only the one entry we already returned, so the other duplicate keys are correctly returned now.

The new test case fails without the rest of the changes and passes with them.
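A simplified sketch of the resume behavior described above, using a made-up MemKey stand-in and an assumed ascending kvCount order rather than the actual tserver seek logic:

```java
import java.util.List;

public class ResumeSketch {
  // Stand-in for a MemKey: the logical key plus the kvCount that distinguishes
  // otherwise-identical keys in the in-memory map.
  record MemKey(String key, int kvCount) {}

  static int compare(MemKey a, MemKey b) {
    int c = a.key().compareTo(b.key());
    return c != 0 ? c : Integer.compare(a.kvCount(), b.kvCount());
  }

  // Resume strictly after the last entry that was already returned. Comparing on
  // (key, kvCount) skips only that one entry; comparing on the key alone would
  // skip the entire run of duplicates, which is the bug this PR fixes.
  static List<MemKey> resumeAfter(List<MemKey> sorted, MemKey lastReturned) {
    return sorted.stream().filter(mk -> compare(mk, lastReturned) > 0).toList();
  }

  public static void main(String[] args) {
    List<MemKey> entries = List.of(
        new MemKey("row1", 1), new MemKey("row1", 2), new MemKey("row1", 3));
    System.out.println(resumeAfter(entries, new MemKey("row1", 1)));
    // -> [MemKey[key=row1, kvCount=2], MemKey[key=row1, kvCount=3]]
  }
}
```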