Skip to content

fix(flatkv): harden error handling for readonly store and crash cleanup#3127

Open
blindchaser wants to merge 3 commits intomainfrom
yiren/readonly-fix
Open

fix(flatkv): harden error handling for readonly store and crash cleanup#3127
blindchaser wants to merge 3 commits intomainfrom
yiren/readonly-fix

Conversation

@blindchaser
Copy link
Copy Markdown
Contributor

  • Propagate real FS errors (ReadDir/RemoveAll) instead of silently discarding them; only downgrade file-lock acquisition failures to non-fatal log in rootmulti startup
  • Lazy-init file lock in loadVersionReadOnly so fresh stores can load readonly without breaking the existing API contract
  • Upgrade composite readonly FlatKV failure log from Info to Error

Describe your changes and provide context

Testing performed to validate your change

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 27, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMar 27, 2026, 7:18 PM

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 34.37500% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.74%. Comparing base (841ac23) to head (221dcfc).

Files with missing lines Patch % Lines
sei-db/state_db/sc/flatkv/store_lifecycle.go 12.50% 4 Missing and 3 partials ⚠️
sei-cosmos/storev2/rootmulti/store.go 0.00% 5 Missing ⚠️
sei-db/state_db/sc/flatkv/snapshot.go 20.00% 2 Missing and 2 partials ⚠️
sei-db/state_db/sc/flatkv/store.go 70.00% 2 Missing and 1 partial ⚠️
sei-db/common/errors/errors.go 33.33% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3127      +/-   ##
==========================================
- Coverage   58.76%   58.74%   -0.02%     
==========================================
  Files        2094     2094              
  Lines      173071   173085      +14     
==========================================
- Hits       101698   101680      -18     
- Misses      62319    62344      +25     
- Partials     9054     9061       +7     
Flag Coverage Δ
sei-chain-pr 63.27% <34.37%> (?)
sei-db 70.41% <ø> (-0.22%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/state_db/sc/composite/store.go 67.92% <100.00%> (ø)
sei-db/common/errors/errors.go 66.66% <33.33%> (-33.34%) ⬇️
sei-db/state_db/sc/flatkv/store.go 70.62% <70.00%> (+0.21%) ⬆️
sei-db/state_db/sc/flatkv/snapshot.go 65.78% <20.00%> (-0.78%) ⬇️
sei-cosmos/storev2/rootmulti/store.go 46.25% <0.00%> (-0.25%) ⬇️
sei-db/state_db/sc/flatkv/store_lifecycle.go 52.63% <12.50%> (-5.12%) ⬇️

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e863c6dde6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +109 to +110
if err := s.acquireFileLock(dir); err != nil {
return err
return fmt.Errorf("%w: %v", commonerrors.ErrFileLockUnavailable, err)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve non-contention lock errors during cleanup

CleanupOrphanedReadOnlyDirs now maps every acquireFileLock failure to ErrFileLockUnavailable, which conflates true lock contention with real filesystem/syscall failures (e.g., permission or lock-file creation errors). Because startup code treats IsFileLockError as non-fatal, these hard failures can be downgraded to a log and execution continues with an unusable store. Only the explicit "already held" case should be tagged as lock-unavailable; other errors should propagate as fatal.

Useful? React with 👍 / 👎.

Comment on lines +97 to +99
if commonerrors.IsFileLockError(err) {
logger.Error("non-fatal: failed to acquire file lock for cleanup", "err", err)
} else {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail fast instead of ignoring cleanup lock failure

This branch now treats cleanup lock acquisition errors as non-fatal and continues startup, but in the current startup path LoadVersionAndUpgrade drops rs.scStore.LoadVersion(..., false) errors (store.go lines 440-441). In lock-contention scenarios, this can let the node proceed without a successfully opened SC backend instead of stopping early. Until downstream error propagation is fixed, lock acquisition failure here should remain fatal.

Useful? React with 👍 / 👎.

- Propagate real FS errors (ReadDir/RemoveAll) instead of silently
  discarding them; only downgrade file-lock acquisition failures to
  non-fatal log in rootmulti startup
- Lazy-init file lock in loadVersionReadOnly so fresh stores can load
  readonly without breaking the existing API contract
- Upgrade composite readonly FlatKV failure log from Info to Error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant