DAOS-18633 rebuild: abort orphaned reclaim rpt after PS leader switch…#17766

Open
wangshilong wants to merge 1 commit into release/2.6 from shilongw/DAOS-18633-2.6

@wangshilong
Contributor

… (#17652)

After a PS leader switch, ds_rebuild_regenerate_task() only regenerates rebuild tasks for DOWN/DRAIN/UP targets. RECLAIM tasks are not regenerated because the reintegrated targets are already UPIN. This leaves an orphaned rpt with a stale leader term on every target, and its IV updates are silently dropped by the new leader because there is no matching rgt. As a result, sp_rebuilding stays > 0 permanently, which blocks EC aggregation and causes system-wide performance degradation.

Fix: detect stale leader term in rebuild_tgt_status_check_ult() and abort the orphaned rpt.
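
A minimal sketch of the check described above, not the actual patch. The struct, its fields, and both function names are hypothetical stand-ins. The description does not say how the target learns the current leader term or whether the comparison is `<` or `!=`, so the sketch assumes the current term is passed in and that an older term means the tracker is orphaned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a per-target rebuild tracker (rpt). */
struct sketch_rpt {
	uint64_t leader_term; /* term of the PS leader that scheduled this task */
	bool     abort;       /* set once the tracker should be torn down */
};

/* Assumption: a tracker scheduled under an older leader term is orphaned. */
bool
sketch_rpt_is_orphaned(const struct sketch_rpt *rpt, uint64_t cur_term)
{
	assert(rpt != NULL);
	return rpt->leader_term < cur_term;
}

/*
 * One iteration of a status-check loop: mark the tracker for abort when its
 * leader term is stale. Returns true if the tracker is (now) marked aborted.
 */
bool
sketch_status_check_once(struct sketch_rpt *rpt, uint64_t cur_term)
{
	if (sketch_rpt_is_orphaned(rpt, cur_term))
		rpt->abort = true;
	return rpt->abort;
}
```

In this sketch, a tracker created under term 3 is left alone while the current term is still 3 and is marked for abort once the current term advances to 4. Aborting on the target side matters here because, per the description, the new leader has no matching rgt and would never complete the task itself.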

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

Signed-off-by: Wang Shilong <shilong.wang@hpe.com>
@wangshilong requested review from a team as code owners on March 24, 2026 01:33
@wangshilong added the clean-cherry-pick label (Cherry-pick from another branch that did not require additional edits) on Mar 24, 2026
@github-actions

Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-18633

Labels

clean-cherry-pick Cherry-pick from another branch that did not require additional edits
