Skip to content

fix(iotv2): order delete materialization and tsfile snapshot#17335

Open
Pengzna wants to merge 9 commits intoapache:masterfrom
Pengzna:codex/iotv2-delete-mod-fix
Open

fix(iotv2): order delete materialization and tsfile snapshot#17335
Pengzna wants to merge 9 commits intoapache:masterfrom
Pengzna:codex/iotv2-delete-mod-fix

Conversation

@Pengzna
Copy link
Collaborator

@Pengzna Pengzna commented Mar 22, 2026

Summary

This PR fixes the IoTConsensusV2 replica inconsistency observed when realtime TsFile replication races with local CLI DELETE.

The actual bug is not simply that delete and TsFile run concurrently. The missing ordering is between:

  • when a local delete enters the sealed-file materialization path and determines which sealed TsFiles it will affect
  • when a realtime PipeTsFileInsertionEvent freezes the TsFile/mod snapshot that will be transferred

Without that ordering, a TsFile snapshot may:

  • miss deletes that entered earlier, or
  • absorb deletes that entered later

Root Cause

On the leader, sealed-file delete materialization and realtime TsFile snapshot freezing were not serialized against each other.

That left a window where:

  • a delete had already entered the local sealed-file path, but had not finished resolving/materializing .mods
  • the realtime TsFile event still froze its transfer snapshot

Followers still apply events in replicateIndex order, but they can only be correct if the leader-side TsFile snapshot already reflects the right delete visibility.

Changes

1. Ordered barrier for delete vs TsFile snapshot

Add a region-ordered, path-aware barrier in PipeTsFileDeletionBarrier:

  • beginDeletion(regionId) allocates a monotonic deleteSeq
  • resolveDeletionTargets(regionId, deleteSeq, tsFilePaths) registers the final sealed TsFile targets
  • beginSnapshot(regionId, tsFilePath) captures snapshotUpperBound
  • awaitDeletionResolutionUpTo(...) and awaitPendingDeletionsUpTo(...) ensure the snapshot sees all earlier deletes
  • awaitSnapshotsBeforeMaterialization(...) blocks later deletes from materializing into a TsFile whose earlier snapshot is still freezing

This guarantees:

  • deletes that entered before assignReplicateIndex are included in the TsFile snapshot
  • deletes that entered after assignReplicateIndex are not mixed into that snapshot

2. Delay mod snapshot for IoTV2 realtime TsFile copies

For IoTConsensusV2 realtime TsFile events:

  • the assigner-side first retain now pins only the TsFile
  • per-pipe copies start the snapshot barrier only after setReplicateIndexForIoTV2(...)
  • the per-pipe first retain waits for earlier deletes, then freezes the mod snapshot under resource.readLock() + exclusiveModFile.writeLock()

3. Separate live mod visibility from pinned mod snapshot

PipeTsFileInsertionEvent now cleanly separates:

  • live mod visibility on TsFileResource
  • the already pinned/frozen mod snapshot

This prevents:

  • overwriting a pinned mod path during later refreshes
  • decreasing the wrong mod file reference on release
  • inheriting stale or future mod state through shallowCopy

4. Unify DataRegion delete paths

deleteByDevice, deleteByTable, and deleteDataDirectly now share the same local helper flow:

  • allocate deleteSeq
  • handle unsealed files
  • resolve final sealed targets
  • wait for earlier snapshots before sealed-file materialization
  • finish the barrier in one place

5. Regression tests

Added/updated tests to cover:

  • snapshot waits for earlier delete target resolution
  • snapshot waits for pending earlier deletions on the same TsFile
  • later delete waits for earlier snapshot freeze
  • IoTV2 assigner-side first retain pins only TsFile
  • per-pipe copy freezes earlier deletes but not later deletes
  • pinned mod paths stay stable after refresh/release

Verification

Executed successfully:

export JAVA_HOME=$(/usr/libexec/java_home -v 17)
export PATH="$JAVA_HOME/bin:$PATH"
mvn -pl iotdb-core/datanode -DskipTests spotless:apply
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
export PATH="$JAVA_HOME/bin:$PATH"
mvn -pl iotdb-core/datanode -am -DskipTests -Dspotless.skip=true test-compile -rf :iotdb-server
CLASSPATH="$(cat /tmp/iotdb-datanode-test-classpath.txt):iotdb-core/datanode/target/classes:iotdb-core/datanode/target/test-classes:iotdb-core/node-commons/target/classes"

javac --release 8 -cp "$CLASSPATH" -d iotdb-core/datanode/target/classes \
  iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/consensus/deletion/PipeTsFileDeletionBarrier.java \
  iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/event/common/tsfile/PipeTsFileInsertionEvent.java \
  iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/source/dataregion/realtime/assigner/PipeDataRegionAssigner.java \
  iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/DataRegion.java

javac --release 8 -cp "$CLASSPATH" -d iotdb-core/datanode/target/test-classes \
  iotdb-core/datanode/src/test/java/org/apache/iotdb/db/pipe/consensus/deletion/PipeTsFileDeletionBarrierTest.java \
  iotdb-core/datanode/src/test/java/org/apache/iotdb/db/pipe/event/PipeTsFileInsertionEventTest.java

java -cp "$CLASSPATH" org.junit.runner.JUnitCore \
  org.apache.iotdb.db.pipe.consensus.deletion.PipeTsFileDeletionBarrierTest \
  org.apache.iotdb.db.pipe.event.PipeTsFileInsertionEventTest

Result: OK (11 tests) and BUILD SUCCESS.

Generated by @codex

Copilot AI review requested due to automatic review settings March 22, 2026 14:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes an IoTConsensusV2 tsfile replication inconsistency where a .mods file can be created after a PipeTsFileInsertionEvent is constructed, causing late-created deletions to be missed during transfer.

Changes:

  • Make PipeTsFileInsertionEvent re-evaluate mod-file existence dynamically and preserve the “should transfer mod” intent across shallow copies.
  • Update IoTConsensusV2 / thrift sink handlers to decide “transfer mod” based on a single getModFile() snapshot.
  • Add a regression test for late-created mod visibility after shallow copy.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/event/common/tsfile/PipeTsFileInsertionEvent.java Introduces dynamic mod-file refresh logic and preserves “should transfer mod” across shallow copies.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/sink/protocol/thrift/async/IoTDBDataRegionAsyncSink.java Uses a local modFile snapshot to configure tsfile transfer handler.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/sink/protocol/iotconsensusv2/handler/IoTConsensusV2TsFileInsertionEventHandler.java Aligns transfer decision to modFile != null and simplifies reader selection.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/sink/protocol/iotconsensusv2/IoTConsensusV2SyncSink.java Switches to modFile != null for mod transfer decision.
iotdb-core/datanode/src/test/java/org/apache/iotdb/db/pipe/event/PipeTsFileInsertionEventTest.java Adds regression test for late-created mod observation after shallow copy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +872 to +876
return;
}

isWithMod = resource.anyModFileExists();
modFile = isWithMod ? resource.getExclusiveModFile().getFile() : null;
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refreshModFileState() always resets modFile from resource.getExclusiveModFile().getFile(). After internallyIncreaseResourceReferenceCount() runs, modFile may already have been replaced with the hardlinked/copied file returned by PipeTsFileResourceManager.increaseFileReference(...); subsequent calls to getModFile()/isWithMod() will refresh again and can overwrite that pinned path. This can break reference tracking (leaking the pinned mod copy and/or decreasing the wrong path) and can make the sink transfer a different file than the one whose reference count was increased. Consider preventing refresh from overwriting modFile once the event is pinned (e.g., when referenceCount > 0), or track original-vs-pinned mod file paths separately.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1,Better always pin the copied mod file, and copy the file on checked

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — this is a real issue.

PipeTsFileResourceManager.increaseFileReference(...) may replace modFile with a copied file under the pipe dir. If we keep refreshing from resource.getExclusiveModFile().getFile(), we can overwrite that pinned path, which breaks ref tracking (leaks the copied mod / decreases the wrong path) and can make the sink transfer an unpinned file.

I updated refreshModFileState() to not refresh once the event is pinned (referenceCount > 0), so modFile remains the pinned (copied) file after increaseReferenceCount(). Also added a regression test testPinnedModFilePathIsStableAfterIncreaseReferenceCount to ensure repeated getModFile()/isWithMod() calls don\x27t revert to the original mod path.

Patch is pushed to this PR branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Agreed. Followed this direction: once the event is pinned we keep modFile as the copied/pinned file (no overwrite back to resource.getExclusiveModFile()), and if the mod is created after pinning we now lazily pin/copy it and then keep that pinned path stable. Added a regression test to ensure pinned mod path stability after increaseReferenceCount().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I adjusted the implementation slightly. Instead of completely skipping refresh after pinning, we now avoid overwriting an already-pinned modFile path, but still support the case where the mod file is created after pinning by lazily pinning/copying it (and then keeping that pinned path stable). This keeps ref tracking correct while covering the late-mod edge case.

Comment on lines +409 to +411
(supportMod && pipeTsFileInsertionEvent.isWithMod())
? pipeTsFileInsertionEvent.getModFile()
: null;
Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new modFile snapshot still gates getModFile() behind pipeTsFileInsertionEvent.isWithMod(). With the new dynamic refresh semantics, this can miss a late-created mod file that appears after isWithMod() returns false (because getModFile() is never called). To reliably capture late-created mods and avoid double-refresh/TOCTOU, take a single snapshot via getModFile() (and then decide transfer based on modFile != null), while still honoring supportMod.

Suggested change
(supportMod && pipeTsFileInsertionEvent.isWithMod())
? pipeTsFileInsertionEvent.getModFile()
: null;
supportMod ? pipeTsFileInsertionEvent.getModFile() : null;

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — updated this sink to take a single snapshot via getModFile() (when supportMod is true) and then decide by modFile != null.

This removes the isWithMod() gate/TOCTOU window and ensures late-created mods can still be observed. Also, PipeTsFileInsertionEvent.refreshModFileState() now keeps pinned paths stable while still lazily pinning a mod file created after the event is already pinned.

Assert.assertEquals(modFile, originalEvent.getModFile());
Assert.assertTrue(copiedEvent.isWithMod());
Assert.assertEquals(modFile, copiedEvent.getModFile());

Copy link

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added test validates late-created mod visibility on the in-memory event objects, but it doesn't cover the more typical transfer path where increaseReferenceCount() is called and the mod is copied into the pipe hardlink/copy directory. Given the new dynamic refresh logic, adding assertions around increaseReferenceCount() + getModFile() would help catch regressions where the returned modFile path changes after pinning (and reference counts are decreased for a different file).

Suggested change
// Verify that after increasing the reference count (typical transfer path),
// the mod file observed by the copied event remains stable and valid.
copiedEvent.increaseReferenceCount();
final File pinnedModFile = copiedEvent.getModFile();
Assert.assertNotNull(pinnedModFile);
Assert.assertTrue(pinnedModFile.exists());
Assert.assertEquals(pinnedModFile, copiedEvent.getModFile());

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I added a dedicated regression test testPinnedModFilePathIsStableAfterIncreaseReferenceCount to cover the typical transfer path:

  • create the mod after event construction
  • call increaseReferenceCount() to pin/copy it into the pipe dir
  • assert the returned modFile path is stable across repeated getModFile()/isWithMod() calls (and differs from the original path)

This should catch regressions where the pinned mod path gets overwritten or ref counting becomes asymmetric.

@Pengzna Pengzna changed the title fix: preserve late-created deletion mods for IoTConsensusV2 tsfile replication fix(iotv2): order delete materialization and tsfile snapshot Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants