Skip to content

Locked splice bug fixes during channel reestablishment#4624

Open
wpaulino wants to merge 2 commits into
lightningdevkit:mainfrom
wpaulino:splice-locked-reestablish-fixes
Open

Locked splice bug fixes during channel reestablishment#4624
wpaulino wants to merge 2 commits into
lightningdevkit:mainfrom
wpaulino:splice-locked-reestablish-fixes

Conversation

@wpaulino
Copy link
Copy Markdown
Contributor

This PR includes fixes for two bugs/edge cases when handling a locked splice during reestablishment. These were found by the chanmon_consistency fuzz target.

wpaulino added 2 commits May 19, 2026 16:30
In most cases, we end up sending our `splice_locked` either implicitly
during reestablishment via
`ChannelReestablish::my_current_funding_locked`, or explicitly after
reestablishment. However, we did not consider that it's possible for the
node to be notified of the splice confirmation after connecting to our
peer but prior to reestablishing our channel. In such cases, we must
allow the `splice_locked` to be sent, since the
`my_current_funding_locked` sent in the `channel_reestablish` to the
counterparty still points to the pre-splice txid.

Found by the `chanmon_consistency` fuzz target.
Upon channel reestablishment, we free our holding cells to send any
pending updates to our peer. If we happened to implicitly lock a pending
splice during reestablishment, we want to make sure any updates we send
after the fact are considering the new channel state (post-splice), even
if the update was queued while the splice was still pending. Therefore,
we must always handle the inferred `splice_locked` first.

Found by the `chanmon_consistency` fuzz target.
@wpaulino wpaulino added this to the 0.3 milestone May 19, 2026
@wpaulino wpaulino requested review from TheBlueMatt and jkczyz May 19, 2026 23:31
@wpaulino wpaulino self-assigned this May 19, 2026
@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented May 19, 2026

👋 Thanks for assigning @jkczyz as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot
Copy link
Copy Markdown
Collaborator

I've thoroughly reviewed the entire PR diff, including both production code changes and all test code. Here is my assessment:

No issues found.

Both fixes are correct:

  1. peer_state.is_connected change (line 16360): Correctly uses the peer-level connection flag instead of the channel-level one. There is a window between peer_connected and channel_reestablish where the peer is connected at the transport level but the channel's disconnected bit hasn't been cleared yet. Using peer_state.is_connected ensures splice_locked is enqueued in this window.

  2. Holding cell reorder (lines 13349-13362): Correctly moves check_free_peer_holding_cells to after internal_splice_locked processing, ensuring splice promotion happens before commitment updates are generated from freed holding cells. The re-acquisition of per_peer_state and peer_state_mutex locks follows the standard lock ordering pattern used throughout the codebase.

The complete_splice_locked_exchange extraction from lock_splice is a clean refactor that enables the new test to call the shared logic with a pre-computed prev_funding_txid. Both new tests cover the specific edge cases well, including the async monitor update scenario in test_holding_cell_claim_freed_after_inferred_splice_locked.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.59%. Comparing base (1060865) to head (16fb802).

Files with missing lines Patch % Lines
lightning/src/ln/channelmanager.rs 72.72% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4624      +/-   ##
==========================================
- Coverage   86.59%   86.59%   -0.01%     
==========================================
  Files         159      159              
  Lines      110420   110426       +6     
  Branches   110420   110426       +6     
==========================================
+ Hits        95619    95624       +5     
+ Misses      12267    12265       -2     
- Partials     2534     2537       +3     
Flag Coverage Δ
fuzzing-fake-hashes 6.57% <0.00%> (-0.01%) ⬇️
fuzzing-real-hashes 23.15% <63.63%> (+<0.01%) ⬆️
tests 86.23% <72.72%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines 16351 to 16356
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the spec require that we have exchanged channel_reestablish before sending any other messages?

https://github.com/lightning/bolts/blob/aff897f8f6ac0a7ba071b3b61fede7dc813be0b5/02-peer-protocol.md?plain=1#L3432-L3433

IIUC, this would let us enqueue splice_locked before receiving channel_reestablish from our peer. Maybe in practice they should have already have sent it, but we may not have processed it yet.

Comment on lines +13352 to +13362
// A reestablish may infer a missed `splice_locked`; apply it before freeing holding
// cells so we don't generate commitment updates against stale splice state.
let holding_cell_res = {
let per_peer_state = self.per_peer_state.read().unwrap();
let peer_state_mutex = per_peer_state.get(counterparty_node_id).ok_or_else(|| {
MsgHandleErrInternal::unreachable_no_such_peer(counterparty_node_id, msg.channel_id)
})?;
let mut peer_state_lock = peer_state_mutex.lock().unwrap();
self.check_free_peer_holding_cells(&mut *peer_state_lock)
};
self.handle_holding_cell_free_result(holding_cell_res);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we refactor the guts of internal_splice_locked into a separate function so we don't take the peer state lock three times?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants