Defer ChainMonitor updates and persistence to flush() #4351

joostjager · 2026-01-27T11:34:09Z

Summary

Modify ChainMonitor internally to queue watch_channel and update_channel operations, returning InProgress until flush() is called. This enables persistence of monitor updates after ChannelManager persistence, ensuring correct ordering where the ChannelManager state is never ahead of the monitor state on restart. The new behavior is opt-in via a deferred switch.

Key changes:

ChainMonitor gains a deferred switch to enable the new queuing behavior
When enabled, monitor operations are queued internally and return InProgress
Calling flush() applies pending operations and persists monitors
Background processor updated to capture pending count before ChannelManager persistence, then flush after persistence completes
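The queue-then-flush behavior described above can be sketched as follows. This is a minimal, self-contained toy, not LDK code: `DeferringMonitor`, `PendingOp`, `UpdateStatus`, and `submit` are hypothetical names. It only models a `deferred` switch that makes `watch_channel` and `update_channel` return `InProgress` and defers the real work to `flush()`.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-in for a monitor update status (Completed / InProgress).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpdateStatus {
    Completed,
    InProgress,
}

/// Hypothetical queued operation; not LDK's real PendingMonitorOp.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PendingOp {
    Watch { channel_id: u64 },
    Update { channel_id: u64, update_id: u64 },
}

/// Toy monitor store illustrating the `deferred` switch.
pub struct DeferringMonitor {
    deferred: bool,
    pending: Mutex<VecDeque<PendingOp>>,
    /// Operations that were applied and "persisted", in order.
    pub applied: Mutex<Vec<PendingOp>>,
}

impl DeferringMonitor {
    pub fn new(deferred: bool) -> Self {
        Self {
            deferred,
            pending: Mutex::new(VecDeque::new()),
            applied: Mutex::new(Vec::new()),
        }
    }

    /// Stands in for the internal apply-and-persist path.
    fn apply(&self, op: PendingOp) -> UpdateStatus {
        self.applied.lock().unwrap().push(op);
        UpdateStatus::Completed
    }

    /// Queues the operation in deferred mode; otherwise applies it immediately.
    fn submit(&self, op: PendingOp) -> UpdateStatus {
        if self.deferred {
            self.pending.lock().unwrap().push_back(op);
            UpdateStatus::InProgress
        } else {
            self.apply(op)
        }
    }

    pub fn watch_channel(&self, channel_id: u64) -> UpdateStatus {
        self.submit(PendingOp::Watch { channel_id })
    }

    pub fn update_channel(&self, channel_id: u64, update_id: u64) -> UpdateStatus {
        self.submit(PendingOp::Update { channel_id, update_id })
    }

    pub fn pending_operation_count(&self) -> usize {
        self.pending.lock().unwrap().len()
    }

    /// Drains the queue in FIFO order and applies each operation; returns how many ran.
    pub fn flush(&self) -> usize {
        // Take the whole queue under the lock, then apply without holding it.
        let ops: Vec<PendingOp> = self.pending.lock().unwrap().drain(..).collect();
        let n = ops.len();
        for op in ops {
            self.apply(op);
        }
        n
    }
}
```

With `deferred` set to false, every call takes the immediate path, which mirrors how existing callers keep their current behavior.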

Performance Impact

Multi-channel, multi-node load testing (using the ldk-server chaos branch) shows no measurable throughput difference between deferred and direct persistence modes.

This is likely because forwarding and payment processing are already effectively single-threaded: the background processor batches all forwards for the entire node in a single pass, so the deferral overhead doesn't add any meaningful bottleneck to an already serialized path.

For high-latency storage (e.g., remote databases), there is also currently no significant impact because channel manager persistence already blocks event handling in the background processor loop (test). If the loop were parallelized to process events concurrently with persistence, deferred writing would become comparatively slower since it moves the channel manager round trip into the critical path. However, deferred writing would also benefit from loop parallelization, and could be further optimized by batching the monitor and manager writes into a single round trip.

Alternative Designs Considered

Several approaches were explored to solve the monitor/manager persistence ordering problem:

1. Queue at KVStore level (#4310)

Introduces a QueuedKVStoreSync wrapper that queues all writes in memory, committing them in a single batch at chokepoints where data leaves the system (get_and_clear_pending_msg_events, get_and_clear_pending_events). This approach aims for true atomic multi-key writes but requires KVStore backends that support transactions (e.g., SQLite); filesystem backends cannot achieve full atomicity.

Trade-offs: Most general solution but requires changes to persistence boundaries and cannot fully close the desync gap with filesystem storage.
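To illustrate the KVStore-level idea, here is a minimal sketch of a write-buffering wrapper. `BatchStore`, `MemStore`, and `QueuedStore` are hypothetical and much simpler than LDK's `KVStoreSync` or the `QueuedKVStoreSync` from #4310. Writes are buffered in memory and handed to the backend as one batch when `commit` is called at a chokepoint. Whether that batch is actually atomic depends on the backend, which is the filesystem limitation noted above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical minimal store trait; LDK's real store traits are namespaced and richer.
pub trait BatchStore {
    /// Writes all entries together. A transactional backend (e.g. SQLite) could make
    /// this atomic; a plain filesystem backend generally cannot.
    fn write_batch(&self, entries: Vec<(String, Vec<u8>)>);
}

/// In-memory backend that also counts how many batches it received.
pub struct MemStore {
    pub data: Mutex<HashMap<String, Vec<u8>>>,
    pub batches: Mutex<usize>,
}

impl MemStore {
    pub fn new() -> Self {
        Self { data: Mutex::new(HashMap::new()), batches: Mutex::new(0) }
    }
}

impl BatchStore for MemStore {
    fn write_batch(&self, entries: Vec<(String, Vec<u8>)>) {
        let mut data = self.data.lock().unwrap();
        for (key, value) in entries {
            data.insert(key, value);
        }
        *self.batches.lock().unwrap() += 1;
    }
}

/// Wrapper that buffers writes until `commit` is called.
pub struct QueuedStore<S: BatchStore> {
    inner: S,
    queue: Mutex<Vec<(String, Vec<u8>)>>,
}

impl<S: BatchStore> QueuedStore<S> {
    pub fn new(inner: S) -> Self {
        Self { inner, queue: Mutex::new(Vec::new()) }
    }

    /// Buffers the write in memory; nothing reaches the backend yet.
    pub fn write(&self, key: &str, value: Vec<u8>) {
        self.queue.lock().unwrap().push((key.to_string(), value));
    }

    /// Hands everything buffered so far to the backend as a single batch.
    pub fn commit(&self) {
        let batch = std::mem::take(&mut *self.queue.lock().unwrap());
        if !batch.is_empty() {
            self.inner.write_batch(batch);
        }
    }

    pub fn inner(&self) -> &S {
        &self.inner
    }
}
```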

2. Queue at Persister level (#4317)

Updates MonitorUpdatingPersister to queue persist operations in memory, with actual writes happening on flush(). Adds flush() to the Persist trait and ChainMonitor.

Trade-offs: Only fixes the issue for MonitorUpdatingPersister; custom Persist implementations remain vulnerable to the race condition.
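The trade-off above can be shown with a small sketch. `MonitorPersist`, `QueuingPersister`, and `EagerPersister` are hypothetical simplifications, not LDK's `Persist` trait or `MonitorUpdatingPersister`. If `flush()` is a trait method with a no-op default, only implementations that opt into queuing gain the ordering guarantee. A custom persister that writes eagerly still persists before `flush()` is ever called.

```rust
use std::cell::RefCell;

/// Hypothetical, heavily simplified persistence trait.
pub trait MonitorPersist {
    fn persist(&self, monitor_id: u64, bytes: Vec<u8>);
    /// Default no-op: an implementation that writes eagerly gains nothing from flush.
    fn flush(&self) {}
}

/// Queues writes and performs them only on flush.
pub struct QueuingPersister {
    queued: RefCell<Vec<(u64, Vec<u8>)>>,
    pub written: RefCell<Vec<u64>>,
}

impl QueuingPersister {
    pub fn new() -> Self {
        Self { queued: RefCell::new(Vec::new()), written: RefCell::new(Vec::new()) }
    }
}

impl MonitorPersist for QueuingPersister {
    fn persist(&self, monitor_id: u64, bytes: Vec<u8>) {
        self.queued.borrow_mut().push((monitor_id, bytes));
    }

    fn flush(&self) {
        for (monitor_id, _bytes) in self.queued.borrow_mut().drain(..) {
            // A real persister would write `_bytes` to storage here.
            self.written.borrow_mut().push(monitor_id);
        }
    }
}

/// A custom implementation that writes immediately and keeps the default flush.
pub struct EagerPersister {
    pub written: RefCell<Vec<u64>>,
}

impl EagerPersister {
    pub fn new() -> Self {
        Self { written: RefCell::new(Vec::new()) }
    }
}

impl MonitorPersist for EagerPersister {
    fn persist(&self, monitor_id: u64, _bytes: Vec<u8>) {
        self.written.borrow_mut().push(monitor_id);
    }
}
```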

3. Queue at ChainMonitor wrapper level (#4345)

Introduces DeferredChainMonitor, a wrapper around ChainMonitor that implements the queue in a separate wrapper layer. All ChainMonitor traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement.

Trade-offs: Requires re-implementing all trait pass-throughs on the wrapper. Keeps the core ChainMonitor unchanged but adds an external layer of indirection.
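The pass-through cost of the wrapper approach can be illustrated with a single hypothetical trait. `BlockListener`, `InnerMonitor`, and `DeferredWrapper` are invented for this sketch and stand in for traits such as `Listen` and `Confirm`. Every method of every forwarded trait needs a delegating body like the ones below.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for one of the chain-facing traits a wrapper must forward.
pub trait BlockListener {
    fn block_connected(&self, height: u32);
    fn best_height(&self) -> u32;
}

/// Plays the role of the wrapped ChainMonitor.
pub struct InnerMonitor {
    height: Cell<u32>,
}

impl InnerMonitor {
    pub fn new() -> Self {
        Self { height: Cell::new(0) }
    }
}

impl BlockListener for InnerMonitor {
    fn block_connected(&self, height: u32) {
        self.height.set(height);
    }

    fn best_height(&self) -> u32 {
        self.height.get()
    }
}

/// Wrapper that would own the deferral queue; chain-facing calls are forwarded untouched.
pub struct DeferredWrapper<M: BlockListener> {
    inner: M,
}

impl<M: BlockListener> DeferredWrapper<M> {
    pub fn new(inner: M) -> Self {
        Self { inner }
    }
}

impl<M: BlockListener> BlockListener for DeferredWrapper<M> {
    // Pure pass-throughs: this boilerplate repeats for every method of every trait.
    fn block_connected(&self, height: u32) {
        self.inner.block_connected(height)
    }

    fn best_height(&self) -> u32 {
        self.inner.best_height()
    }
}
```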

ldk-reviews-bot · 2026-01-27T11:34:12Z

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

joostjager · 2026-01-27T11:39:11Z

Closing this PR, as #4345 seems to be the easiest way to go.

joostjager · 2026-02-09T14:45:30Z

The single commit was split into three: extracting internal methods, adding a deferred toggle, and implementing the deferral and flushing logic. flush() now delegates to the extracted internal methods rather than reimplementing persist/insert logic inline. Deferred mode is opt-in via a deferred bool rather than always-on. Test infrastructure was expanded with deferred-mode helpers and dedicated unit tests.

Pure refactor: move the bodies of Watch::watch_channel and Watch::update_channel into methods on ChainMonitor, and have the Watch trait methods delegate to them. This prepares for adding deferred mode where the Watch methods will conditionally queue operations instead of executing them immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add a `deferred` parameter to `ChainMonitor::new` and `ChainMonitor::new_async_beta`. When set to true, the Watch trait methods (watch_channel and update_channel) will call unimplemented!() for now. All existing callers pass false to preserve current behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
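The first two commits can be pictured with a trimmed-down sketch. `Watch` here is a hypothetical one-method stand-in for LDK's trait, and `MonitorSketch` and `update_channel_internal` are invented names. The trait method delegates to an internal method (the pure refactor), and a `deferred` constructor flag routes to `unimplemented!()` until the queuing logic lands.

```rust
use std::cell::RefCell;

/// Hypothetical, trimmed-down version of the Watch trait.
pub trait Watch {
    /// Returns true when the update was applied synchronously.
    fn update_channel(&self, channel_id: u64) -> bool;
}

pub struct MonitorSketch {
    deferred: bool,
    pub applied: RefCell<Vec<u64>>,
}

impl MonitorSketch {
    /// Mirrors the extra `deferred` constructor argument; existing callers pass false.
    pub fn new(deferred: bool) -> Self {
        Self { deferred, applied: RefCell::new(Vec::new()) }
    }

    /// Step 1 (pure refactor): the former trait-method body now lives here.
    fn update_channel_internal(&self, channel_id: u64) -> bool {
        self.applied.borrow_mut().push(channel_id);
        true
    }
}

impl Watch for MonitorSketch {
    fn update_channel(&self, channel_id: u64) -> bool {
        // Step 2: the deferred switch exists but is not implemented yet.
        if self.deferred {
            unimplemented!("deferred mode");
        }
        self.update_channel_internal(channel_id)
    }
}
```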

Replace the unimplemented!() stubs with a full deferred write implementation. When ChainMonitor has deferred=true, Watch trait operations queue PendingMonitorOp entries instead of executing immediately. A new flush() method drains the queue and forwards operations to the internal watch/update methods, calling channel_monitor_updated on Completed status. The BackgroundProcessor is updated to capture pending_operation_count before persisting the ChannelManager, then flush that many writes afterward, ensuring monitor writes happen in the correct order relative to manager persistence.

Key changes:

- Add PendingMonitorOp enum and pending_ops queue to ChainMonitor
- Implement flush() and pending_operation_count() public methods
- Integrate flush calls in BackgroundProcessor (both sync and async)
- Add TestChainMonitor::new_deferred, flush helpers, and auto-flush in release_pending_monitor_events for test compatibility
- Add create_node_cfgs_deferred for deferred-mode test networks
- Add unit tests for queue/flush mechanics and full payment flow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
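The BackgroundProcessor ordering can be sketched as one loop iteration. `PendingMonitors`, `queue`, and `persist_manager_then_flush` are hypothetical, and the closure merely simulates ChannelManager persistence. The sketch shows only the capture-persist-flush sequence described in the commit message. The pending count is snapshotted before the manager is written, and afterward exactly that many operations are flushed, so anything queued while the manager was being persisted is left for a later iteration.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical queue of deferred monitor operations, identified by update id.
pub struct PendingMonitors {
    pending: Mutex<VecDeque<u64>>,
    pub flushed: Mutex<Vec<u64>>,
}

impl PendingMonitors {
    pub fn new() -> Self {
        Self { pending: Mutex::new(VecDeque::new()), flushed: Mutex::new(Vec::new()) }
    }

    pub fn queue(&self, update_id: u64) {
        self.pending.lock().unwrap().push_back(update_id);
    }

    pub fn pending_operation_count(&self) -> usize {
        self.pending.lock().unwrap().len()
    }

    /// Applies at most `count` queued operations, oldest first.
    pub fn flush(&self, count: usize) {
        for _ in 0..count {
            let next = self.pending.lock().unwrap().pop_front();
            match next {
                Some(update_id) => self.flushed.lock().unwrap().push(update_id),
                None => break,
            }
        }
    }
}

/// One hypothetical background-loop iteration; returns the snapshotted count.
pub fn persist_manager_then_flush<F: FnOnce()>(monitors: &PendingMonitors, persist_manager: F) -> usize {
    // 1. Snapshot the queue length before the ChannelManager is written.
    let count = monitors.pending_operation_count();
    // 2. Persist the ChannelManager (simulated by the closure).
    persist_manager();
    // 3. Flush only the snapshotted operations; anything queued during step 2 waits.
    monitors.flush(count);
    count
}
```

In the usage below, the closure queues update 3 while the "manager" is being persisted. Updates 1 and 2 are flushed and update 3 stays pending.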

joostjager mentioned this pull request Jan 27, 2026: Defer ChainMonitor updates and persistence to flush (wrapper approach) #4345 (Closed)

joostjager closed this Jan 27, 2026

joostjager reopened this Feb 9, 2026

joostjager force-pushed the chain-mon-internal-deferred-writes branch from 1f5cef4 to 30d05ca February 9, 2026 14:45

joostjager force-pushed the chain-mon-internal-deferred-writes branch 3 times, most recently from d56b419 to 7eb382c February 9, 2026 16:35

joostjager and others added 2 commits February 10, 2026 14:49

joostjager force-pushed the chain-mon-internal-deferred-writes branch from 7eb382c to 91faa0f February 10, 2026 13:53

joostjager force-pushed the chain-mon-internal-deferred-writes branch from 91faa0f to e7cf996 February 10, 2026 13:57
