Fix: add drain_barrier_mask for full-stop barrier in sync_start drain protocol#501
Open
yanghaoran29 wants to merge 1 commit intohw-native-sys:mainfrom
Open
Fix: add drain_barrier_mask for full-stop barrier in sync_start drain protocol#501yanghaoran29 wants to merge 1 commit intohw-native-sys:mainfrom
yanghaoran29 wants to merge 1 commit intohw-native-sys:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a four-phase drain protocol in aicpu_executor.cpp by adding a full-stop barrier to ensure all threads have finished tracker writes before the election phase. A critical logic error was identified in the implementation of the new barrier: the spin-wait loop checks drain_worker_elected to detect a reset, but since this variable is only set after the loop, the condition is always true, leading to premature returns and potential livelocks. It is recommended to use drain_ack_mask to monitor for resets instead.
src/a2a3/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
Outdated
Show resolved
Hide resolved
src/a5/runtime/tensormap_and_ringbuffer/aicpu/aicpu_executor.cpp
Outdated
Show resolved
Hide resolved
… protocol Add a second barrier (drain_barrier_mask) between the ack barrier and election in the sync_start drain protocol. This ensures all threads have fully stopped tracker writes before any thread proceeds to election, preventing the race where a non-elected thread may still write to core_trackers_ while the drain worker has exclusive access. While spinning on drain_barrier_mask, detect reset using drain_ack_mask (acquire loads; release store when clearing on insufficient resources), not drain_worker_elected, which remains zero until after the barrier completes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add a second barrier (drain_barrier_mask) between the ack barrier and election in the sync_start drain protocol. This ensures all threads have fully stopped tracker writes before any thread proceeds to election, preventing the race where a non-elected thread may still write to core_trackers_ while the drain worker has exclusive access.