Skip to content

staticaddr/withdraw: add retry and reorg handling for withdrawals#1134

Open
kaldun-tech wants to merge 1 commit into
lightninglabs:masterfrom
kaldun-tech:withdraw-retry-confirmation-watching-reorgs
Open

staticaddr/withdraw: add retry and reorg handling for withdrawals#1134
kaldun-tech wants to merge 1 commit into
lightninglabs:masterfrom
kaldun-tech:withdraw-retry-confirmation-watching-reorgs

Conversation

@kaldun-tech
Copy link
Copy Markdown

@kaldun-tech kaldun-tech commented May 1, 2026

Summary

Fixes Issue #1087

Addresses three related issues in the static address withdrawal manager:

  1. No retry on registration failure - RegisterSpendNtfn / RegisterConfirmationsNtfn failures silently stopped monitoring
  2. No reorg handling - Reorged transactions weren't re-registered
  3. State persistence race - handleWithdrawal failure prevented state persistence, breaking recovery

Changes

  • Reorder state persistence to happen BEFORE handleWithdrawal so recovery works even if monitoring setup fails
  • Add retry-on-next-block for registration failures
  • Add reorg detection via lndclient.WithReOrgChan() - on reorg after spend detection, returns to spend watching (not just confirmation re-registration) since the spend tx itself may be invalidated
  • Make handleWithdrawal async/best-effort since tx is already published

Design Note: Best-Effort Monitoring

After WithdrawDeposits publishes the transaction and persists deposits to Withdrawing state, subsequent operations (withdrawal record creation, spend monitoring) are best-effort. Errors are logged but don't fail the RPC API call because:

  1. The user's funds are safe after the tx is broadcast
  2. State is persisted because deposits are in Withdrawing state
  3. recoverWithdrawals will retry monitoring on restart

Testing

Unit tests added for retry logic and clean shutdown:

  • TestHandleWithdrawal_HappyPath
  • TestHandleWithdrawal_SpendRegistrationRetry
  • TestHandleWithdrawal_ConfirmationRegistrationRetry
  • TestHandleWithdrawal_ContextCanceled
  • TestHandleWithdrawal_BlockChannelClosed
  • TestHandleWithdrawal_SpendErrorReregisters

I decided to defer tests related to reorg. These would require MockChainNotifier extension to capture lndclient.NotifierOption

# Withdraw tests
go test ./staticaddr/withdraw/...

# Linting via make requires docker
make lint
# No docker
golangci-lint run ./staticaddr/withdraw/...

Questions for Maintainers

  1. RBF hash cleanup: When an RBF (Replace-By-Fee) replacement confirms, SpenderTxHash is used to delete from finalizedWithdrawalTxns. If the original tx was fee-bumped, does the original txHash remain in the map? Should both be cleaned up?
  2. Variable naming: The allDeposited flag indicates first withdrawal vs fee bump. How about we rename this to isFirstWithdrawal for clarity?

Pull Request Checklist

  • Update release_notes.md if your PR contains major features, breaking changes or bugfixes

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the reliability of the static address withdrawal manager by addressing critical gaps in error handling and state management. By introducing robust retry mechanisms for notification registration, implementing reorg detection, and ensuring state is persisted before monitoring begins, the system is now more resilient to transient chain issues and process restarts.

Highlights

  • Robust Withdrawal Monitoring: Implemented automatic retry logic for registration failures, ensuring that monitoring does not silently stop if notification registration fails.
  • Reorg Handling: Integrated reorg detection using lndclient's WithReOrgChan, allowing the manager to correctly re-register for spend notifications if a transaction is invalidated by a reorg.
  • State Persistence Ordering: Reordered state persistence to occur before monitoring setup, ensuring that recovery processes can correctly identify and resume pending withdrawals after a restart.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the reliability of static address withdrawals by implementing automatic retries for notification registrations and robust handling for blockchain reorgs. It also reorders state persistence to ensure that withdrawals can be recovered correctly after a restart. The review feedback identifies several opportunities to harden the implementation, specifically by handling potential channel closures to prevent panics or tight loops during shutdown, and ensuring that the correct transaction hash is cleared from the internal tracking map when Replace-By-Fee (RBF) transactions are confirmed.

Comment thread staticaddr/withdraw/manager.go Outdated
Comment thread staticaddr/withdraw/manager.go Outdated
Comment thread staticaddr/withdraw/manager.go Outdated
Comment thread staticaddr/withdraw/manager.go Outdated
Comment thread staticaddr/withdraw/manager.go Outdated
@kaldun-tech kaldun-tech marked this pull request as ready for review May 1, 2026 21:37
@kaldun-tech kaldun-tech marked this pull request as draft May 1, 2026 21:38
@kaldun-tech kaldun-tech force-pushed the withdraw-retry-confirmation-watching-reorgs branch from a39322a to 692327e Compare May 1, 2026 21:53
@kaldun-tech kaldun-tech marked this pull request as ready for review May 1, 2026 21:58
@kaldun-tech
Copy link
Copy Markdown
Author

@hieblmi Ready for review

@lightninglabs-deploy
Copy link
Copy Markdown

@kaldun-tech, remember to re-request review from reviewers when ready

@kaldun-tech
Copy link
Copy Markdown
Author

@hieblmi @ffranr @alexbosworth @starius Sorry for ping, reminder to review

@kaldun-tech kaldun-tech marked this pull request as draft May 21, 2026 19:53
@kaldun-tech
Copy link
Copy Markdown
Author

Doing additional review and adding tests. Please stand by.

This commit fixes three related issues in the withdrawal manager:

1. No retry when RegisterSpendNtfn/RegisterConfirmationsNtfn fails
2. No handling for reorgs (tx confirmed then reorged out)
3. State not persisted before handleWithdrawal, causing recovery failures

Changes:
- Reorder state persistence to happen BEFORE handleWithdrawal so
  recoverWithdrawals can find deposits in Withdrawing state on restart
- Add retry-on-next-block for registration failures using block
  notifications
- Add reorg detection via lndclient.WithReOrgChan for both spend and
  confirmation notifications
- On reorg after spend detection, return to spend watching (not just
  re-register for confirmations) since the spend tx may be invalidated
- Make handleWithdrawal async/best-effort since tx is already published
- Added appropriate tests in manager_test.go

Fixes issue lightninglabs#1087

Signed-off-by: kaldun-tech <tsmereka@protonmail.com>
@kaldun-tech kaldun-tech force-pushed the withdraw-retry-confirmation-watching-reorgs branch from 692327e to e33e525 Compare May 22, 2026 21:43
@kaldun-tech kaldun-tech marked this pull request as ready for review May 22, 2026 21:43
// Withdrawing), meaning this is the first withdrawal attempt. Start the
// spend monitor goroutine. Fee bumps reuse the existing monitor since
// we ensure the same deposit ensemble is used for republishing.
if allDeposited {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to rename the allDeposited variable to isFirstWithdrawal to better reflect how it's being used here

Copy link
Copy Markdown
Author

@kaldun-tech kaldun-tech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for review. Added inline comments on two areas where I'd appreciate maintainer input.

m.mu.Lock()
delete(m.finalizedWithdrawalTxns, txHash)
delete(m.finalizedWithdrawalTxns, *spentTx.SpenderTxHash)
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a defensive nil check here.

My concern in the RBF (Replace-By-Fee) is the case where when the replacement confirms, the replacement's hash is used to delete here. But the original txHash was added to the map initially. Does it remain as a leak? Should both be cleaned up here or does the map get cleared via another path?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants