fix(iotv2): order delete materialization and tsfile snapshot#17335
fix(iotv2): order delete materialization and tsfile snapshot#17335Pengzna wants to merge 9 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes an IoTConsensusV2 tsfile replication inconsistency where a .mods file can be created after a PipeTsFileInsertionEvent is constructed, causing late-created deletions to be missed during transfer.
Changes:
- Make
PipeTsFileInsertionEventre-evaluate mod-file existence dynamically and preserve the “should transfer mod” intent across shallow copies. - Update IoTConsensusV2 / thrift sink handlers to decide “transfer mod” based on a single
getModFile()snapshot. - Add a regression test for late-created mod visibility after shallow copy.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/event/common/tsfile/PipeTsFileInsertionEvent.java | Introduces dynamic mod-file refresh logic and preserves “should transfer mod” across shallow copies. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/sink/protocol/thrift/async/IoTDBDataRegionAsyncSink.java | Uses a local modFile snapshot to configure tsfile transfer handler. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/sink/protocol/iotconsensusv2/handler/IoTConsensusV2TsFileInsertionEventHandler.java | Aligns transfer decision to modFile != null and simplifies reader selection. |
| iotdb-core/datanode/src/main/java/org/apache/iotdb/db/pipe/sink/protocol/iotconsensusv2/IoTConsensusV2SyncSink.java | Switches to modFile != null for mod transfer decision. |
| iotdb-core/datanode/src/test/java/org/apache/iotdb/db/pipe/event/PipeTsFileInsertionEventTest.java | Adds regression test for late-created mod observation after shallow copy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| return; | ||
| } | ||
|
|
||
| isWithMod = resource.anyModFileExists(); | ||
| modFile = isWithMod ? resource.getExclusiveModFile().getFile() : null; |
There was a problem hiding this comment.
refreshModFileState() always resets modFile from resource.getExclusiveModFile().getFile(). After internallyIncreaseResourceReferenceCount() runs, modFile may already have been replaced with the hardlinked/copied file returned by PipeTsFileResourceManager.increaseFileReference(...); subsequent calls to getModFile()/isWithMod() will refresh again and can overwrite that pinned path. This can break reference tracking (leaking the pinned mod copy and/or decreasing the wrong path) and can make the sink transfer a different file than the one whose reference count was increased. Consider preventing refresh from overwriting modFile once the event is pinned (e.g., when referenceCount > 0), or track original-vs-pinned mod file paths separately.
There was a problem hiding this comment.
+1,Better always pin the copied mod file, and copy the file on checked
There was a problem hiding this comment.
Good catch — this is a real issue.
PipeTsFileResourceManager.increaseFileReference(...) may replace modFile with a copied file under the pipe dir. If we keep refreshing from resource.getExclusiveModFile().getFile(), we can overwrite that pinned path, which breaks ref tracking (leaks the copied mod / decreases the wrong path) and can make the sink transfer an unpinned file.
I updated refreshModFileState() to not refresh once the event is pinned (referenceCount > 0), so modFile remains the pinned (copied) file after increaseReferenceCount(). Also added a regression test testPinnedModFilePathIsStableAfterIncreaseReferenceCount to ensure repeated getModFile()/isWithMod() calls don\x27t revert to the original mod path.
Patch is pushed to this PR branch.
There was a problem hiding this comment.
Thanks! Agreed. Followed this direction: once the event is pinned we keep modFile as the copied/pinned file (no overwrite back to resource.getExclusiveModFile()), and if the mod is created after pinning we now lazily pin/copy it and then keep that pinned path stable. Added a regression test to ensure pinned mod path stability after increaseReferenceCount().
There was a problem hiding this comment.
Update: I adjusted the implementation slightly. Instead of completely skipping refresh after pinning, we now avoid overwriting an already-pinned modFile path, but still support the case where the mod file is created after pinning by lazily pinning/copying it (and then keeping that pinned path stable). This keeps ref tracking correct while covering the late-mod edge case.
| (supportMod && pipeTsFileInsertionEvent.isWithMod()) | ||
| ? pipeTsFileInsertionEvent.getModFile() | ||
| : null; |
There was a problem hiding this comment.
This new modFile snapshot still gates getModFile() behind pipeTsFileInsertionEvent.isWithMod(). With the new dynamic refresh semantics, this can miss a late-created mod file that appears after isWithMod() returns false (because getModFile() is never called). To reliably capture late-created mods and avoid double-refresh/TOCTOU, take a single snapshot via getModFile() (and then decide transfer based on modFile != null), while still honoring supportMod.
| (supportMod && pipeTsFileInsertionEvent.isWithMod()) | |
| ? pipeTsFileInsertionEvent.getModFile() | |
| : null; | |
| supportMod ? pipeTsFileInsertionEvent.getModFile() : null; |
There was a problem hiding this comment.
Agreed — updated this sink to take a single snapshot via getModFile() (when supportMod is true) and then decide by modFile != null.
This removes the isWithMod() gate/TOCTOU window and ensures late-created mods can still be observed. Also, PipeTsFileInsertionEvent.refreshModFileState() now keeps pinned paths stable while still lazily pinning a mod file created after the event is already pinned.
| Assert.assertEquals(modFile, originalEvent.getModFile()); | ||
| Assert.assertTrue(copiedEvent.isWithMod()); | ||
| Assert.assertEquals(modFile, copiedEvent.getModFile()); | ||
|
|
There was a problem hiding this comment.
The added test validates late-created mod visibility on the in-memory event objects, but it doesn't cover the more typical transfer path where increaseReferenceCount() is called and the mod is copied into the pipe hardlink/copy directory. Given the new dynamic refresh logic, adding assertions around increaseReferenceCount() + getModFile() would help catch regressions where the returned modFile path changes after pinning (and reference counts are decreased for a different file).
| // Verify that after increasing the reference count (typical transfer path), | |
| // the mod file observed by the copied event remains stable and valid. | |
| copiedEvent.increaseReferenceCount(); | |
| final File pinnedModFile = copiedEvent.getModFile(); | |
| Assert.assertNotNull(pinnedModFile); | |
| Assert.assertTrue(pinnedModFile.exists()); | |
| Assert.assertEquals(pinnedModFile, copiedEvent.getModFile()); |
There was a problem hiding this comment.
Good point. I added a dedicated regression test testPinnedModFilePathIsStableAfterIncreaseReferenceCount to cover the typical transfer path:
- create the mod after event construction
- call
increaseReferenceCount()to pin/copy it into the pipe dir - assert the returned
modFilepath is stable across repeatedgetModFile()/isWithMod()calls (and differs from the original path)
This should catch regressions where the pinned mod path gets overwritten or ref counting becomes asymmetric.
Summary
This PR fixes the IoTConsensusV2 replica inconsistency observed when realtime TsFile replication races with local CLI
DELETE.The actual bug is not simply that delete and TsFile run concurrently. The missing ordering is between:
PipeTsFileInsertionEventfreezes the TsFile/mod snapshot that will be transferredWithout that ordering, a TsFile snapshot may:
Root Cause
On the leader, sealed-file delete materialization and realtime TsFile snapshot freezing were not serialized against each other.
That left a window where:
.modsFollowers still apply events in replicateIndex order, but they can only be correct if the leader-side TsFile snapshot already reflects the right delete visibility.
Changes
1. Ordered barrier for delete vs TsFile snapshot
Add a region-ordered, path-aware barrier in
PipeTsFileDeletionBarrier:beginDeletion(regionId)allocates a monotonicdeleteSeqresolveDeletionTargets(regionId, deleteSeq, tsFilePaths)registers the final sealed TsFile targetsbeginSnapshot(regionId, tsFilePath)capturessnapshotUpperBoundawaitDeletionResolutionUpTo(...)andawaitPendingDeletionsUpTo(...)ensure the snapshot sees all earlier deletesawaitSnapshotsBeforeMaterialization(...)blocks later deletes from materializing into a TsFile whose earlier snapshot is still freezingThis guarantees:
assignReplicateIndexare included in the TsFile snapshotassignReplicateIndexare not mixed into that snapshot2. Delay mod snapshot for IoTV2 realtime TsFile copies
For IoTConsensusV2 realtime TsFile events:
setReplicateIndexForIoTV2(...)resource.readLock()+exclusiveModFile.writeLock()3. Separate live mod visibility from pinned mod snapshot
PipeTsFileInsertionEventnow cleanly separates:TsFileResourceThis prevents:
shallowCopy4. Unify DataRegion delete paths
deleteByDevice,deleteByTable, anddeleteDataDirectlynow share the same local helper flow:deleteSeq5. Regression tests
Added/updated tests to cover:
Verification
Executed successfully:
Result:
OK (11 tests)andBUILD SUCCESS.Generated by @codex