Refactor task tracking to ensure we await/abort any spawned tasks #619
Conversation
tnull commented on Aug 19, 2025
Previously we a) added a new internal `Runtime` API that cleans up our internal logic and b) added tracking for spawned background tasks to be able to await/abort them on shutdown. Here we move the tracking into the `Runtime` object, which will allow us to easily extend the tracking to *any* spawned tasks in the next step.
We now drop the generic `spawn` from our internal `Runtime` API, ensuring we always have to use either `spawn_cancellable_background_task` or `spawn_background_task`.
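As an illustration of what such centralized tracking could look like, here is a minimal sketch. The two spawn method names are taken from the description above; everything else (the field names, the use of `tokio_util`'s `CancellationToken`, and the shutdown logic) is an assumption for illustration rather than the actual implementation:

```rust
// Hypothetical sketch: a Runtime that tracks every spawned task so it can be
// awaited or aborted on shutdown. Details beyond the two spawn method names
// mentioned in the description are illustrative only.
use std::sync::Mutex;
use tokio::task::JoinHandle;
use tokio_util::sync::CancellationToken;

pub(crate) struct Runtime {
	rt: tokio::runtime::Handle,
	tasks: Mutex<Vec<JoinHandle<()>>>,
	stop_token: CancellationToken,
}

impl Runtime {
	/// Spawns a task that is tracked and awaited on shutdown.
	pub(crate) fn spawn_background_task<F>(&self, fut: F)
	where
		F: std::future::Future<Output = ()> + Send + 'static,
	{
		let handle = self.rt.spawn(fut);
		self.tasks.lock().unwrap().push(handle);
	}

	/// Spawns a task that is cancelled (rather than run to completion) on shutdown.
	pub(crate) fn spawn_cancellable_background_task<F>(&self, fut: F)
	where
		F: std::future::Future<Output = ()> + Send + 'static,
	{
		let token = self.stop_token.clone();
		self.spawn_background_task(async move {
			tokio::select! {
				_ = token.cancelled() => {},
				_ = fut => {},
			}
		});
	}

	/// Signals cancellation and awaits all tracked tasks.
	pub(crate) async fn shutdown(&self) {
		self.stop_token.cancel();
		let handles: Vec<JoinHandle<()>> =
			self.tasks.lock().unwrap().drain(..).collect();
		for handle in handles {
			let _ = handle.await;
		}
	}
}
```

In this shape, cancellable tasks exit promptly via the token on shutdown, while the remaining tracked tasks are awaited to completion.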
👋 Thanks for assigning @TheBlueMatt as a reviewer!
`pub fn spawn<F>(&self, future: F) -> JoinHandle<F::Output>`
FWIW, I considered doing the same for `spawn_blocking`, but a) blocking tasks can't be cancelled anyway and b) it is currently only used by the electrum client, where each call is wrapped in a `tokio::time::timeout`. It would be odd to wait on any of the blocking tasks during shutdown if, during normal operation, we already decided to move on and handled the timeout error.
Any opinions here, @TheBlueMatt?
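For context, a minimal sketch of the timeout-wrapping pattern described above; the 30-second value, the function name, and the closure body are placeholders rather than the actual electrum code. It illustrates why also awaiting the blocking task at shutdown would be largely redundant: during normal operation we already move on once the timeout fires.

```rust
// Illustrative only: the general shape of wrapping a blocking call in a
// timeout. The blocking task itself cannot be aborted; once the timeout
// fires we move on and it finishes (or fails) on its own in the background.
use std::time::Duration;

async fn run_sync_step() -> Result<(), ()> {
	let blocking = tokio::task::spawn_blocking(|| {
		// ... blocking electrum client call would go here ...
	});
	match tokio::time::timeout(Duration::from_secs(30), blocking).await {
		Ok(Ok(())) => Ok(()),
		Ok(Err(_join_err)) => Err(()),
		Err(_elapsed) => Err(()),
	}
}
```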
I don't have a strong opinion, other than that we should push hard to remove the blocking calls there (looks like at least the `tx_sync.sync` call will be non-blocking in LDK 0.2, and we should try the async version of the electrum client?)
> I don't have a strong opinion, other than that we should push hard to remove the blocking calls there (looks like at least the `tx_sync.sync` call will be non-blocking in LDK 0.2
Not sure where you saw that? I don't think anything changed, we'll have esplora-blocking, esplora-async, and the blocking electrum client.
> and we should try the async version of the electrum client?
There currently is no async version of the electrum client though? Evan recently put up an experimental crate (https://crates.io/crates/electrum_streaming_client), but I don't expect `bdk_electrum` to use it too soon. Once it does offer that, I agree that we should also support it / switch to it in `lightning-transaction-sync`.
> Not sure where you saw that? I don't think anything changed, we'll have esplora-blocking, esplora-async, and the blocking electrum client.
Oh, sorry, I misread that.
> There currently is no async version of the electrum client though? Evan recently put up an experimental crate (https://crates.io/crates/electrum_streaming_client), but I don't expect `bdk_electrum` to use it too soon. Once it does offer that, I agree that we should also support it / switch to it in `lightning-transaction-sync`.
Ah, somehow I thought there was. That's... frustrating.
Sadly, we can't just leave this be entirely. If we're connecting blocks, things might happen that make our `ChannelManager` need re-persistence, which means we shouldn't exit the LDK background processor until the sync logic is done with its work. Cf. #620.
> Sadly, we can't just leave this be entirely. If we're connecting blocks, things might happen that make our `ChannelManager` need re-persistence, which means we shouldn't exit the LDK background processor until the sync logic is done with its work. Cf. #620.
We aren't connecting blocks though? We can't abort the blocking tasks, so all we could do is wait for them, but then again, that would leave us open to all kinds of stalling, which is why we employ the timeouts in the first place. FWIW, we also have timeouts on the shutdown itself, because we can't just get stuck forever if some task decides to never return for some reason.
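A minimal sketch of bounding the shutdown wait itself, as mentioned above; the 10-second value and the helper name are assumptions for illustration, not the actual implementation:

```rust
// Illustrative only: cap how long we wait for tracked tasks at shutdown so a
// task that never returns cannot stall us forever.
use std::time::Duration;
use tokio::task::JoinHandle;

async fn await_tasks_with_timeout(handles: Vec<JoinHandle<()>>) {
	let join_all = async {
		for handle in handles {
			// Ignore individual join errors (e.g., aborted or panicked tasks).
			let _ = handle.await;
		}
	};
	if tokio::time::timeout(Duration::from_secs(10), join_all).await.is_err() {
		// Some task never returned; log and proceed with shutdown anyway.
		eprintln!("Timed out waiting on background tasks during shutdown");
	}
}
```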
Hmm, my understanding (based on the name) of calls like `tx_sync.sync(confirmables)` and `bdk_electrum_client.sync(...)` was that they were actually fetching and then connecting blocks/transaction data to the underlying logic (in the first case LDK, in the second BDK). I get that we can't really just block on them finishing their work forever, but this does seem like a somewhat urgent issue.
> Hmm, my understanding (based on the name) of calls like `tx_sync.sync(confirmables)` and `bdk_electrum_client.sync(...)` was that they were actually fetching and then connecting blocks/transaction data to the underlying logic (in the first case LDK, in the second BDK). I get that we can't really just block on them finishing their work forever, but this does seem like a somewhat urgent issue.
Not quite. That's basically true for LDK, but BDK has by now decoupled retrieving the data from applying it to the wallet. So `bdk_electrum_client.sync`/`bdk_electrum_client.full_scan` just do the retrieval, and the results are applied separately afterwards. Meaning, if the higher-level task is cancelled/awaited, the `spawn_blocking` tasks would just eventually end/time out, but nothing would be applied.
The story is indeed a bit different for LDK, as `tx_sync.sync` directly connects the updates to the confirmables as part of the (blocking) task. I wonder if we should refactor `lightning-transaction-sync` to allow for a similar decoupling for users. That would probably require lightningdevkit/rust-lightning#3867, and then `sync` wouldn't directly apply the updates to the confirmables, but rather return a `Vec<BlockUpdate>` or similar, moving the responsibility to apply these updates to all `Confirm` implementations to the user.
FWIW, that would also be nice for the (async) call graph, as otherwise the Tokio task that is responsible for syncing would necessarily call back into our `Confirm` implementation, possibly also driving persistence, resulting broadcasts, etc.
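Purely as an illustration of the decoupling floated above, a hypothetical shape such an API could take; `BlockUpdate`, its fields, and the usage below are invented for this sketch and do not exist in `lightning-transaction-sync` today.

```rust
// Hypothetical sketch only: neither `BlockUpdate` nor this sync shape exist
// in lightning-transaction-sync; they illustrate the proposed decoupling.
use bitcoin::{BlockHash, Transaction, Txid};

pub struct BlockUpdate {
	pub block_hash: BlockHash,
	pub block_height: u32,
	pub confirmed_txs: Vec<Transaction>,
	pub unconfirmed_txids: Vec<Txid>,
}

// Instead of `tx_sync.sync(confirmables)` applying updates internally, the
// user would receive the updates and apply them to each `Confirm` themselves:
//
//   let updates: Vec<BlockUpdate> = tx_sync.sync()?;
//   for update in updates {
//       for confirmable in &confirmables {
//           // apply `update` via the `Confirm` trait methods
//       }
//   }
```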
> The story is indeed a bit different for LDK, as `tx_sync.sync` directly connects the updates to the confirmables as part of the (blocking) task. I wonder if we should refactor `lightning-transaction-sync` to allow for a similar decoupling for users. That would probably require lightningdevkit/rust-lightning#3867, and then `sync` wouldn't directly apply the updates to the confirmables, but rather return a `Vec<BlockUpdate>` or similar, moving the responsibility to apply these updates to all `Confirm` implementations to the user.
That might be one way to address it, yea, but of course the "right" way is for the fetching of resources to actually be async :)
> FWIW, that would also be nice for the (async) call graph, as otherwise the Tokio task that is responsible for syncing would necessarily call back into our `Confirm` implementation, possibly also driving persistence, resulting broadcasts, etc.
AFAIU, as of LDK 0.2 all the persistence (hopefully including the `ChannelMonitor`s) will be spawn-and-don't-block, so it shouldn't be an issue :). That's already the case for transaction broadcasting, so I'm not sure why that's a concern here.
Grrr, CI has really become unusable due to #618 right now :(
TheBlueMatt left a comment
Diff itself LGTM, but I also noticed #620.
Landing this, will open a follow-up PR for #620 tomorrow.