Skip to content

fix: node crashing on lagged events and log actor nonce errors#7020

Merged
akaladarshi merged 3 commits intomainfrom
akaladarshi/fix-head-change-rcx-err
May 8, 2026
Merged

fix: node crashing on lagged events and log actor nonce errors#7020
akaladarshi merged 3 commits intomainfrom
akaladarshi/fix-head-change-rcx-err

Conversation

@akaladarshi
Copy link
Copy Markdown
Collaborator

@akaladarshi akaladarshi commented May 7, 2026

Summary of changes

Changes introduced in this pull request:

  • Head change receiver lagged event caused indexer service to crash which resulted in node crash
  • Log the error if their is an error in head_change get state nonce instead of erroring out and continue to apply rest of the message:

Reference issue to close (if applicable)

Closes

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation. All new code adheres to the team's documentation standards,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

Outside contributions

  • I have read and agree to the CONTRIBUTING document.
  • I have read and agree to the AI Policy document. I understand that failure to comply with the guidelines will lead to rejection of the pull request.

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced error handling and graceful recovery in indexer and message pool services for improved stability.
    • Improved robustness when processing chain state notifications and queries.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 7, 2026

Review Change Stack

Walkthrough

Head-change subscription and message processing now explicitly handle receive errors and state lookup failures across the indexer daemon and message pool, replacing error propagation with graceful logging and continuation for lag events and state sequence lookup failures.

Changes

Head-change Error Handling

Layer / File(s) Summary
Indexer Head-change Subscription
src/daemon/mod.rs
Indexer's head-change loop now matches on RecvError::Lagged(n) to log and skip, and RecvError::Closed to exit cleanly, replacing direct error propagation.
Message Pool Head-change Subscriber
src/message_pool/msgpool/msg_pool.rs
Message pool applies the same pattern: lag events are logged with the lag count {n}, and closed channels cause clean loop exit.
Re-org Message Re-add Resilience
src/message_pool/msgpool/mod.rs
State sequence lookup failures during re-org message re-addition are now logged at debug level and that message is skipped, allowing subsequent messages to continue processing.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • ChainSafe/forest#6965: Refactors head-change and MpoolCtx to use PendingStore alongside the per-message error handling introduced here.
  • ChainSafe/forest#6890: Converts per-message state/address resolution failures into debug logs and skips affected messages, using the same non-fatal error handling pattern.

Suggested reviewers

  • sudo-shashank
  • hanabi1224
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title directly and accurately describes the two main fixes: preventing node crashes from lagged events and logging actor nonce errors during head_change processing.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch akaladarshi/fix-head-change-rcx-err
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch akaladarshi/fix-head-change-rcx-err

Comment @coderabbitai help to get the list of available commands and usage tips.

@akaladarshi akaladarshi added the RPC requires calibnet RPC checks to run on CI label May 7, 2026
@akaladarshi akaladarshi marked this pull request as ready for review May 8, 2026 08:04
@akaladarshi akaladarshi requested a review from a team as a code owner May 8, 2026 08:04
@akaladarshi akaladarshi requested review from hanabi1224 and sudo-shashank and removed request for a team May 8, 2026 08:04
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/daemon/mod.rs (1)

620-625: ⚡ Quick win

Add error context to indexer processing failures.

These ? propagations lose tipset-specific context, which makes service-failure triage harder.

🔧 Suggested patch
-                            let delegated_messages = chain_store
-                                .headers_delegated_messages(ts.block_headers().iter())?;
-                            chain_store.process_signed_messages(&delegated_messages)?;
+                            let delegated_messages = chain_store
+                                .headers_delegated_messages(ts.block_headers().iter())
+                                .with_context(|| {
+                                    format!(
+                                        "indexer: failed to load delegated messages for tipset {}",
+                                        ts.key()
+                                    )
+                                })?;
+                            chain_store
+                                .process_signed_messages(&delegated_messages)
+                                .with_context(|| {
+                                    format!(
+                                        "indexer: failed to process signed messages for tipset {}",
+                                        ts.key()
+                                    )
+                                })?;

As per coding guidelines: "Use anyhow::Result<T> for most operations and add context with .context() when errors occur".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/daemon/mod.rs` around lines 620 - 625, The current loop over
changes.applies uses fallible calls chain_store.headers_delegated_messages(...)
and chain_store.process_signed_messages(...) with plain `?`, losing tipset
context; update both calls to use anyhow::Context (e.g., .context(...) or
.with_context(...)) to append tipset-specific information (use ts.key()) to the
error messages so failures include which tipset failed, and ensure
anyhow::Context is imported; locate the block iterating `for ts in
changes.applies` and change the two calls to add contextual messages referencing
`ts.key()` for easier triage.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/message_pool/msgpool/mod.rs`:
- Around line 309-315: The debug log in the Err branch of
mpool_ctx.get_state_sequence(state_nonce_cache, &msg.from()) omits message
identity, making dropped reorg messages hard to trace; change the tracing::debug
call inside the Err(e) arm to include the message sender and CID (use msg.from()
and msg.cid() or the appropriate CID accessor on msg) along with the error so
the log reads something like "Failed to get the state sequence for msg from {}
cid {}: {}", then continue as before.

---

Nitpick comments:
In `@src/daemon/mod.rs`:
- Around line 620-625: The current loop over changes.applies uses fallible calls
chain_store.headers_delegated_messages(...) and
chain_store.process_signed_messages(...) with plain `?`, losing tipset context;
update both calls to use anyhow::Context (e.g., .context(...) or
.with_context(...)) to append tipset-specific information (use ts.key()) to the
error messages so failures include which tipset failed, and ensure
anyhow::Context is imported; locate the block iterating `for ts in
changes.applies` and change the two calls to add contextual messages referencing
`ts.key()` for easier triage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 607c7846-20fb-4377-b3c2-ec40e1312043

📥 Commits

Reviewing files that changed from the base of the PR and between 9259f45 and b380ac7.

📒 Files selected for processing (3)
  • src/daemon/mod.rs
  • src/message_pool/msgpool/mod.rs
  • src/message_pool/msgpool/msg_pool.rs

Comment thread src/message_pool/msgpool/mod.rs
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 11.76471% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.20%. Comparing base (8f2b32b) to head (b380ac7).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/daemon/mod.rs 0.00% 10 Missing ⚠️
src/message_pool/msgpool/mod.rs 40.00% 3 Missing ⚠️
src/message_pool/msgpool/msg_pool.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
src/message_pool/msgpool/msg_pool.rs 87.63% <0.00%> (ø)
src/message_pool/msgpool/mod.rs 91.08% <40.00%> (-0.22%) ⬇️
src/daemon/mod.rs 28.02% <0.00%> (-0.24%) ⬇️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9259f45...b380ac7. Read the comment docs.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@akaladarshi akaladarshi added this pull request to the merge queue May 8, 2026
Merged via the queue into main with commit 51e9207 May 8, 2026
34 checks passed
@akaladarshi akaladarshi deleted the akaladarshi/fix-head-change-rcx-err branch May 8, 2026 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RPC requires calibnet RPC checks to run on CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants