Skip to content

Recover pending commit intents on startup#1251

Open
GabrielePicco wants to merge 7 commits into
masterfrom
fix/recover-pending-commit-intents
Open

Recover pending commit intents on startup#1251
GabrielePicco wants to merge 7 commits into
masterfrom
fix/recover-pending-commit-intents

Conversation

@GabrielePicco
Copy link
Copy Markdown
Collaborator

@GabrielePicco GabrielePicco commented May 27, 2026

Summary

  • Add a startup background recovery pass for persisted pending commit rows.
  • Reconstruct account-only commit/finalize intent bundles from commit_status rows and schedule them without inserting duplicate persistence rows.
  • Gate recovered intents to rows whose last retry is more than 30 minutes old and accounts that are still delegated on base and ER.
  • Use a current local blockhash for recovered ScheduledCommitSent fallback signals.
  • Simulate signed base transactions before sending so failed transactions do not spend validator lamports.

NOTE:

  • The recovery path can change the semantics of persisted pending intent bundles by dropping base actions/callbacks while still committing accounts. This is a correctness issue for supported scheduled intent inputs after restart. This will be solved in MIMD-0025: Intent Execution Durability #1246

Why

If a validator restarts after a base-chain commit/finalize attempt remains pending in persistence, the node currently does not resubmit that persisted work. This leaves already-scheduled account commits dependent on manual intervention.

Notes

  • The recovery path only reconstructs commit/finalize account intents represented by pending commit_status rows.
  • Normal live scheduling still uses the existing persistence path.
  • Recovered scheduling deliberately skips start_base_intents to avoid duplicate row insertion.
  • Recovery only schedules an intent after age, delegation ownership, and simulation checks pass.

Summary by CodeRabbit

  • New Features

    • Added recovery mechanism for pending intents that failed to send initially.
    • Added account delegation status checking across multiple blockchain environments.
  • Bug Fixes

    • Improved handling of disconnected clients in subscription management to prevent stale data processing.
    • Enhanced client connection tracking with automatic recovery from connection failures.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 27, 2026

Warning

Review limit reached

@GabrielePicco, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 49 minutes and 56 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8877d46e-8db0-42da-b612-111d7655e2d8

📥 Commits

Reviewing files that changed from the base of the PR and between f1c62d0 and 8bbee67.

📒 Files selected for processing (8)
  • magicblock-accounts/src/scheduled_commits_processor.rs
  • magicblock-api/src/magic_validator.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs
  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-committor-service/src/committor_processor.rs
  • magicblock-committor-service/src/persist/commit_persister.rs
  • magicblock-committor-service/src/persist/db.rs
  • magicblock-committor-service/src/service.rs
📝 Walkthrough

Walkthrough

This PR adds infrastructure to recover and reschedule pending scheduled commit intents that were persisted before completing. The recovery path loads persisted commit status rows, reconstructs them into ScheduledIntentBundles, filters to recoverable intents by checking delegation status on base and ER via Chainlink, inserts recovered intent metadata, issues undelegation requests, and schedules the bundles. The ScheduledCommitsProcessor now handles optional sent-transaction metadata and constructs a fallback "scheduled commit sent" transaction when the sent transaction is missing. Additionally, SubMuxClient now tracks connected clients via a set and filters reconnect/subscription operations to only connected clients to avoid re-subscribing to disconnected RPC endpoints.

Suggested reviewers

  • taco-paco
  • thlorenz
  • bmuddha
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/recover-pending-commit-intents

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@magicblock-accounts/src/scheduled_commits_processor.rs`:
- Around line 200-209: remove_intent_metas currently calls
intents_meta_map.lock().expect(...), which can panic on a poisoned mutex; change
it to handle PoisonError instead and avoid panicking. Replace the expect with a
match (or map_err/unwrap_or_else) on intents_meta_map.lock() to obtain the guard
either from Ok(guard) or from Err(poison) => poison.into_inner(), then proceed
to remove entries from the guard; ensure the function does not panic and logs or
quietly continues on poison to keep error-path callers safe (refer to function
remove_intent_metas and variable intents_meta_map).
- Around line 179-197: The current all-or-nothing cleanup after
committor.schedule_recovered_intent_bundles risks removing all metadata even if
only some bundles failed; either assert batch-or-nothing semantics or make
cleanup granular: change schedule_recovered_intent_bundles to return per-bundle
results (e.g., Result<Vec<BundleResult>, Error> or a struct containing a Vec of
succeeded bundle IDs), then in the caller inspect the returned successes and
call Self::remove_intent_metas(&intents_meta_map, &succeeded_intent_ids) to only
remove metadata for successfully scheduled bundles (update any call sites
accordingly); if you intend batch semantics, add an explicit comment and a test
asserting atomic behavior.
- Around line 156-171: The code currently calls
intents_meta_map.lock().expect(POISONED_MUTEX_MSG) which panics on a poisoned
mutex; replace this .expect() with proper error handling: match on
intents_meta_map.lock(), on Ok(mut intent_metas) proceed to insert
ScheduledBaseIntentMeta::new(intent) and collect undelegate pubkeys as before,
but on Err(poison_err) log the poisoning (including poison_err) and return an
appropriate error from the surrounding function or return early from startup
recovery (do not panic); use the existing symbols intent_bundles,
intents_meta_map, ScheduledBaseIntentMeta::new, pubkeys_being_undelegated and
POISONED_MUTEX_MSG to locate and update the block accordingly.

In `@magicblock-committor-service/src/persist/commit_persister.rs`:
- Around line 249-256: The method get_pending_commit_statuses currently calls
self.commits_db.lock().expect(...) which must be removed; instead handle a
poisoned mutex by mapping the lock() error into the existing CommitPersistResult
error type and returning it (e.g., translate the PoisonError into a
CommitPersistError variant such as MutexPoisoned or Internal), then call
.get_pending_commit_statuses() on the locked guard; update the code in
commit_persister.rs around the get_pending_commit_statuses function to replace
expect() with lock().map_err(|e| /* map to CommitPersistError */)? and propagate
that error so no unwrap/expect is used (reference symbols:
get_pending_commit_statuses and self.commits_db).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b3b15829-349c-400a-910c-96a13c30ea35

📥 Commits

Reviewing files that changed from the base of the PR and between 4ba4a87 and 52834e3.

📒 Files selected for processing (5)
  • magicblock-accounts/src/scheduled_commits_processor.rs
  • magicblock-committor-service/src/committor_processor.rs
  • magicblock-committor-service/src/persist/commit_persister.rs
  • magicblock-committor-service/src/persist/db.rs
  • magicblock-committor-service/src/service.rs

Comment thread magicblock-accounts/src/scheduled_commits_processor.rs
Comment thread magicblock-accounts/src/scheduled_commits_processor.rs Outdated
Comment thread magicblock-accounts/src/scheduled_commits_processor.rs Outdated
Comment thread magicblock-committor-service/src/persist/commit_persister.rs
@GabrielePicco GabrielePicco force-pushed the fix/recover-pending-commit-intents branch from 52834e3 to d006bda Compare May 27, 2026 10:31
@GabrielePicco GabrielePicco marked this pull request as ready for review May 27, 2026 11:48
Comment thread magicblock-committor-service/src/persist/commit_persister.rs Outdated
payer: Pubkey,
included_pubkeys: Vec<Pubkey>,
intent_sent_transaction: Transaction,
intent_sent_transaction: Option<Transaction>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure Option is applicable here, we should execute it anyway. If on creation infarmation is abasent we should create tx

Copy link
Copy Markdown
Collaborator Author

@GabrielePicco GabrielePicco May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recovered path does still execute a scheduled_commit_sent tx. Option is used only because recovered intents are reconstructed from persisted rows and do not have the original sent_transaction/signature available. When it is None, we synthesize a fresh scheduled_commit_sent tx with the current blockhash before executing it.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 445552a74c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread magicblock-accounts/src/scheduled_commits_processor.rs
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df8d4d6c51

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread magicblock-committor-service/src/committor_processor.rs
@GabrielePicco GabrielePicco requested a review from taco-paco May 27, 2026 13:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
magicblock-committor-service/src/persist/commit_persister.rs (1)

251-258: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace panic-on-lock with error propagation in pending-status retrieval.

Line 256 uses .expect(POISONED_MUTEX_MSG), which can panic the process on mutex poisoning during a production recovery flow. Return a CommitPersistResult error instead of panicking.

As per coding guidelines, {magicblock-*,programs,storage-proto}/**: Treat any usage of .unwrap() or .expect() in production Rust code as a MAJOR issue.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@magicblock-committor-service/src/persist/commit_persister.rs` around lines
251 - 258, The get_pending_commit_statuses implementation currently calls
self.commits_db.lock().expect(POISONED_MUTEX_MSG) which can panic on mutex
poisoning; change this to handle the lock error and return a CommitPersistResult
error instead of panicking. Replace the expect call with matching or map_err on
the Mutex::lock() result, convert the PoisonError into an appropriate
CommitPersistResult failure (e.g., construct/return a CommitPersistError with
context like "mutex poisoned getting pending commit statuses"), then call
.get_pending_commit_statuses() on the guarded value; keep references to
commits_db.lock(), POISONED_MUTEX_MSG and the function
get_pending_commit_statuses to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@magicblock-chainlink/src/chainlink/mod.rs`:
- Around line 427-442: The code is silently truncating when using
pubkeys.iter().zip(remote_accounts) which can produce a shorter boolean Vec if
lengths differ; before the Ok(pubkeys.iter().zip(remote_accounts)...) block,
validate that pubkeys.len() == remote_accounts.len() and return an appropriate
Err (or an InvalidInput/LengthMismatch error) on mismatch; if lengths match,
proceed with the existing mapping that calls
remote_account.is_owned_by_delegation_program(),
self.accounts_bank.get_account(pubkey).is_some_and(|account| account.delegated()
|| account.owner().eq(&dlp_api::id())), and collect the boolean vector as
before.

In `@magicblock-committor-service/src/committor_processor.rs`:
- Around line 271-274: When reconstructing a recovered bundle from rows (in the
block that starts with let first = rows.first()? and creates
commit_finalize_accounts), validate that every row for the same message_id has
identical slot and ephemeral_blockhash before using
first.slot/first.ephemeral_blockhash to build the bundle; implement a check like
rows.iter().all(|r| r.slot == slot && r.ephemeral_blockhash == blockhash) and if
it returns false, log a warning/error referencing the message_id and skip that
group (do not populate commit_finalize_accounts or proceed with reconstruction),
applying the same validation for the subsequent code block around lines 277–288
that also assumes consistent metadata.

In `@magicblock-rpc-client/src/utils.rs`:
- Around line 209-212: The match currently treats all
MagicBlockRpcClientError::SimulatedTransactionError(_) as terminal which
prevents retrying recoverable simulation failures; update the error handling so
MagicBlockRpcClientError::SimulatedTransactionError(inner) is inspected and if
inner is a TransactionError::BlockhashNotFound (or equivalent variant) it is
classified as retryable (i.e., do not include it in the terminal arm), otherwise
keep non-retryable behavior for other simulated errors; specifically change the
match that groups MagicBlockRpcClientError::GetSlot, ::LookupTableDeserialize,
::SentTransactionError, ::SimulatedTransactionError(_) to instead pattern-match
::SimulatedTransactionError(inner) and branch on inner for BlockhashNotFound.

---

Duplicate comments:
In `@magicblock-committor-service/src/persist/commit_persister.rs`:
- Around line 251-258: The get_pending_commit_statuses implementation currently
calls self.commits_db.lock().expect(POISONED_MUTEX_MSG) which can panic on mutex
poisoning; change this to handle the lock error and return a CommitPersistResult
error instead of panicking. Replace the expect call with matching or map_err on
the Mutex::lock() result, convert the PoisonError into an appropriate
CommitPersistResult failure (e.g., construct/return a CommitPersistError with
context like "mutex poisoned getting pending commit statuses"), then call
.get_pending_commit_statuses() on the guarded value; keep references to
commits_db.lock(), POISONED_MUTEX_MSG and the function
get_pending_commit_statuses to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 97707491-1d08-4fc7-bae3-cd0e76ff1465

📥 Commits

Reviewing files that changed from the base of the PR and between 52834e3 and f1c62d0.

📒 Files selected for processing (11)
  • magicblock-accounts/src/scheduled_commits_processor.rs
  • magicblock-api/src/magic_validator.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs
  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-committor-service/src/committor_processor.rs
  • magicblock-committor-service/src/intent_executor/intent_execution_client.rs
  • magicblock-committor-service/src/persist/commit_persister.rs
  • magicblock-committor-service/src/persist/db.rs
  • magicblock-committor-service/src/service.rs
  • magicblock-rpc-client/src/lib.rs
  • magicblock-rpc-client/src/utils.rs

Comment thread magicblock-chainlink/src/chainlink/mod.rs
Comment thread magicblock-committor-service/src/committor_processor.rs
Comment thread magicblock-rpc-client/src/utils.rs Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 076a83884e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread magicblock-accounts/src/scheduled_commits_processor.rs
@GabrielePicco GabrielePicco force-pushed the fix/recover-pending-commit-intents branch from 1f20bcc to f73b081 Compare May 27, 2026 15:48
- Batch recovery scheduling instead of one actor round-trip per intent
- Drop redundant status filter (SQL WHERE already filters by Pending)
- Rename schedule_recovered_intent_bundle -> _bundles (Vec param)
- Move pending-intent recovery spawn out of ScheduledCommitsProcessorImpl::new
  into spawn_pending_intents_recovery, invoked from MagicValidator::start after
  ledger replay + reset_accounts_bank so the local bank reflects post-replay
  delegated state.
- Validate that grouped commit_status rows agree on slot/ephemeral_blockhash
  before reconstructing a recovered bundle; skip the message_id on mismatch.
- Reject silent length mismatches in accounts_delegated_on_base_and_er by
  returning ChainlinkError::UnexpectedAccountCount when fetch returns a
  different count than requested.
- Remove the universal pre-send simulate_transaction call along with its
  client method, error variants, and util mappings. The on-chain dlp
  ownership gate in accounts_delegated_on_base_and_er already prevents
  committing stale account snapshots.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f73b0817fa

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread magicblock-committor-service/src/committor_processor.rs
@GabrielePicco GabrielePicco force-pushed the fix/recover-pending-commit-intents branch from f73b081 to 8bbee67 Compare May 27, 2026 15:54
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8bbee67008

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread magicblock-committor-service/src/committor_processor.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants