
Conversation

tnull (Collaborator) commented Aug 19, 2025

Previously we a) added a new internal `Runtime` API that cleans up our
internal logic and b) added tracking for spawned background tasks to be
able to await/abort them on shutdown.

Here we move the tracking into the `Runtime` object, which will allow us
to easily extend the tracking to *any* spawned tasks in the next step.
We now drop the generic `spawn` from our internal `Runtime` API, ensuring we always go through either `spawn_cancellable_background_task` or `spawn_background_task`.
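
For illustration, a rough sketch of the shape such a task-tracking `Runtime` could take (the field names, the `JoinSet`-based tracking, and the `CancellationToken` stop signal are assumptions made for this sketch, not the actual implementation):

```rust
use std::future::Future;
use std::sync::Mutex;

use tokio::task::JoinSet;
use tokio_util::sync::CancellationToken;

/// Sketch of a runtime wrapper that tracks every spawned background task so
/// they can be awaited or aborted on shutdown.
pub(crate) struct Runtime {
	rt_handle: tokio::runtime::Handle,
	// All tracked background tasks live in a single `JoinSet`.
	tasks: Mutex<JoinSet<()>>,
	// Cancellation signal handed to cancellable tasks on shutdown.
	stop_signal: CancellationToken,
}

impl Runtime {
	/// Spawns a task that is tracked and awaited on shutdown.
	pub(crate) fn spawn_background_task<F>(&self, future: F)
	where
		F: Future<Output = ()> + Send + 'static,
	{
		let mut tasks = self.tasks.lock().unwrap();
		tasks.spawn_on(future, &self.rt_handle);
	}

	/// Spawns a task that is tracked and additionally cancelled via the stop
	/// signal on shutdown.
	pub(crate) fn spawn_cancellable_background_task<F>(&self, future: F)
	where
		F: Future<Output = ()> + Send + 'static,
	{
		let stop_signal = self.stop_signal.clone();
		let mut tasks = self.tasks.lock().unwrap();
		tasks.spawn_on(
			async move {
				tokio::select! {
					_ = stop_signal.cancelled() => {},
					_ = future => {},
				}
			},
			&self.rt_handle,
		);
	}
}
```

With no generic `spawn` left, every task ends up in `tasks` and can be drained or aborted when the node shuts down.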

tnull added 2 commits August 19, 2025 13:41
tnull requested a review from TheBlueMatt August 19, 2025 11:42
ldk-reviews-bot commented Aug 19, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Review thread on the now-removed generic `spawn` method:

pub fn spawn<F>(&self, future: F) -> JoinHandle<F::Output>
tnull (Collaborator, Author) commented:

FWIW, I considered doing the same for `spawn_blocking`, but a) blocking tasks can't be cancelled anyways, and b) it is currently only used by the Electrum client, where each call is wrapped in a `tokio::time::timeout`. It would be weird to wait on any of the blocking tasks during shutdown if, during normal operation, we already decided to move on and handled the timeout error.

Any opinions here, @TheBlueMatt?
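
For context, the pattern referred to above looks roughly as follows; the timeout value and error handling are illustrative assumptions, not the actual values used around the Electrum calls:

```rust
use std::time::Duration;

// Illustrative bound; the real timeout around the Electrum calls may differ.
const BLOCKING_CALL_TIMEOUT: Duration = Duration::from_secs(30);

async fn run_blocking_call_with_timeout() -> Result<(), ()> {
	// Off-load the blocking work to tokio's blocking thread pool...
	let blocking_task = tokio::task::spawn_blocking(|| {
		// ...e.g., a synchronous Electrum round-trip would go here.
	});

	// ...and bound how long we're willing to wait for it. Note that hitting
	// the timeout only stops us from waiting: the blocking thread keeps
	// running until the call returns, as blocking tasks can't be cancelled.
	match tokio::time::timeout(BLOCKING_CALL_TIMEOUT, blocking_task).await {
		Ok(Ok(())) => Ok(()),
		Ok(Err(_join_err)) => Err(()),
		Err(_elapsed) => Err(()),
	}
}
```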

TheBlueMatt (Contributor) commented:

I don't have a strong opinion, other than that we should push hard to remove the blocking calls there (looks like at least the `tx_sync.sync` call will be non-blocking in LDK 0.2, and we should try the async version of the Electrum client?)

tnull (Collaborator, Author) commented:

> I don't have a strong opinion, other than that we should push hard to remove the blocking calls there (looks like at least the `tx_sync.sync` call will be non-blocking in LDK 0.2

Not sure where you saw that? I don't think anything changed; we'll have `esplora-blocking`, `esplora-async`, and the blocking Electrum client.

> and we should try the async version of the Electrum client?

There currently is no async version of the Electrum client though? Evan recently put up an experimental crate (https://crates.io/crates/electrum_streaming_client), but I don't expect `bdk_electrum` to adopt it too soon. When it does offer that, I agree that we should also support it / switch to it in `lightning-transaction-sync`.

TheBlueMatt (Contributor) commented:

> Not sure where you saw that? I don't think anything changed; we'll have `esplora-blocking`, `esplora-async`, and the blocking Electrum client.

Oh, sorry, I misread that.

> There currently is no async version of the Electrum client though? Evan recently put up an experimental crate (https://crates.io/crates/electrum_streaming_client), but I don't expect `bdk_electrum` to adopt it too soon. When it does offer that, I agree that we should also support it / switch to it in `lightning-transaction-sync`.

Ah, somehow I thought there was. That's... frustrating.

Sadly we can't just leave this be entirely. If we're connecting blocks, things might happen that make our `ChannelManager` need re-persistence, which means we shouldn't exit the LDK background processor until the sync logic is done with its work. Cf. #620.

tnull (Collaborator, Author) commented:

> Sadly we can't just leave this be entirely. If we're connecting blocks, things might happen that make our `ChannelManager` need re-persistence, which means we shouldn't exit the LDK background processor until the sync logic is done with its work. Cf. #620.

We aren't connecting blocks though? We can't abort the blocking tasks, so all we could do is wait for them, but then again, that would leave us open to all kinds of stalling etc., which is why we employ the timeouts in the first place. FWIW, we also have timeouts on the shutdown itself, because we can't just get stuck forever if some task decides to never return for some reason.
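
As a rough illustration of that shutdown timeout (a sketch assuming the tracked tasks live in a `tokio::task::JoinSet`; the duration is made up):

```rust
use std::time::Duration;

use tokio::task::JoinSet;

// Illustrative bound; the actual shutdown timeout may differ.
const SHUTDOWN_TIMEOUT: Duration = Duration::from_secs(10);

async fn await_background_tasks_on_shutdown(mut tasks: JoinSet<()>) {
	// Give the tracked tasks a bounded amount of time to finish, then abort
	// whatever is left so shutdown can't hang on a stalled task forever.
	let deadline = tokio::time::Instant::now() + SHUTDOWN_TIMEOUT;
	loop {
		match tokio::time::timeout_at(deadline, tasks.join_next()).await {
			// All tracked tasks have finished.
			Ok(None) => break,
			// A task finished (successfully or not); keep draining.
			Ok(Some(res)) => {
				if let Err(e) = res {
					eprintln!("Background task failed during shutdown: {}", e);
				}
			},
			// Deadline hit: abort the remaining tasks and move on.
			Err(_elapsed) => {
				eprintln!("Timed out waiting for background tasks, aborting the rest.");
				tasks.abort_all();
				break;
			},
		}
	}
}
```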

TheBlueMatt (Contributor) commented:

Hmm, my understanding (based on the name) of calls like `tx_sync.sync(confirmables)` and `bdk_electrum_client.sync(...)` was that they were actually fetching and then connecting blocks/transaction data to the underlying logic (in the first case LDK, in the second BDK). I get that we can't really just block on them finishing their work forever, but this does seem like a somewhat urgent issue.

tnull (Collaborator, Author) commented Aug 21, 2025:

> Hmm, my understanding (based on the name) of calls like `tx_sync.sync(confirmables)` and `bdk_electrum_client.sync(...)` was that they were actually fetching and then connecting blocks/transaction data to the underlying logic (in the first case LDK, in the second BDK). I get that we can't really just block on them finishing their work forever, but this does seem like a somewhat urgent issue.

Not quite. That's basically true for LDK, but BDK has by now decoupled retrieving the data from applying it to the wallet. So `bdk_electrum_client.sync`/`bdk_electrum_client.full_scan` just do the retrieval, and the result is applied separately afterwards. Meaning, if the higher-level task is cancelled/awaited, the `spawn_blocking` tasks would just eventually end/time out, but nothing would be applied.
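
Roughly, the BDK side of that decoupling looks like the following sketch (based on the `bdk_wallet`/`bdk_electrum` APIs from memory; exact method names and parameters may differ between versions):

```rust
use bdk_electrum::{electrum_client, BdkElectrumClient};
use bdk_wallet::Wallet;

const BATCH_SIZE: usize = 5;

// Fetch chain data over Electrum first, and only afterwards apply it to the
// wallet. If the fetching step is cancelled or times out, wallet state stays
// untouched.
fn sync_wallet(wallet: &mut Wallet, electrum_url: &str) -> anyhow::Result<()> {
	let client = BdkElectrumClient::new(electrum_client::Client::new(electrum_url)?);

	// Step 1: retrieval only. This is the part that runs in the blocking task
	// (and can be bounded by a timeout) without touching wallet state.
	let request = wallet.start_sync_with_revealed_spks().build();
	let update = client.sync(request, BATCH_SIZE, /* fetch_prev_txouts */ true)?;

	// Step 2: apply the retrieved update to the wallet as a separate step.
	wallet.apply_update(update)?;
	Ok(())
}
```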

The story is indeed a bit different for LDK, as `tx_sync.sync` directly connects the updates to the confirmables as part of the (blocking) task. I wonder if we should refactor `lightning-transaction-sync` to allow for a similar decoupling for users. That would probably require lightningdevkit/rust-lightning#3867, and then `sync` wouldn't directly apply the updates to the confirmables, but rather return a `Vec<BlockUpdate>` or similar, moving the responsibility of applying these updates to all `Confirm` implementations to the user.
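
Purely to illustrate the idea, such a decoupled API could hypothetically look something like this (every name here, including `BlockUpdate`, is made up; nothing like this exists in `lightning-transaction-sync` today):

```rust
/// Hypothetical: chain data retrieved by the sync step, not yet applied anywhere.
pub struct BlockUpdate {
	// e.g., newly confirmed transactions, reorged-out txids, the best block, ...
}

pub struct SyncError;

/// Hypothetical fetch-only sync API: `sync` retrieves data and returns it...
pub trait FetchOnlyTxSync {
	fn sync(&self) -> Result<Vec<BlockUpdate>, SyncError>;
}

/// Stand-in for LDK's `Confirm` trait in this sketch.
pub trait ApplyChainData {
	fn apply(&self, update: &BlockUpdate) -> Result<(), SyncError>;
}

/// ...and applying it to all `Confirm` implementations becomes a separate step
/// that the user drives outside of the (blocking, possibly timed-out) fetch task.
pub fn apply_updates(
	updates: Vec<BlockUpdate>, confirmables: &[&dyn ApplyChainData],
) -> Result<(), SyncError> {
	for update in updates {
		for confirmable in confirmables {
			confirmable.apply(&update)?;
		}
	}
	Ok(())
}
```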

FWIW, that would also be nice for the (async) call graph, as otherwise the tokio task that is responsible for syncing would necessarily call back into our `Confirm` implementation, possibly also driving persistence, resulting broadcasts, etc.

TheBlueMatt (Contributor) commented:

> The story is indeed a bit different for LDK, as `tx_sync.sync` directly connects the updates to the confirmables as part of the (blocking) task. I wonder if we should refactor `lightning-transaction-sync` to allow for a similar decoupling for users. That would probably require lightningdevkit/rust-lightning#3867, and then `sync` wouldn't directly apply the updates to the confirmables, but rather return a `Vec<BlockUpdate>` or similar, moving the responsibility of applying these updates to all `Confirm` implementations to the user.

That might be one way to address it, yea, but of course the "right" way is for the fetching of resources to actually be async :)

> FWIW, that would also be nice for the (async) call graph, as otherwise the tokio task that is responsible for syncing would necessarily call back into our `Confirm` implementation, possibly also driving persistence, resulting broadcasts, etc.

AFAIU, as of LDK 0.2 all the persistence (hopefully including the `ChannelMonitor`s) will be spawn-and-don't-block, so it shouldn't be an issue :). That's already the case for transaction broadcasting, so I'm not sure why that's a concern here.

tnull (Collaborator, Author) commented Aug 19, 2025

Grrr, CI has really become unusable due to #618 right now :(

TheBlueMatt (Contributor) left a comment

Diff itself LGTM, but I also noticed #620.

tnull (Collaborator, Author) commented Aug 19, 2025

Landing this, will open a follow-up PR for #620 tomorrow.

tnull merged commit 1c02114 into lightningdevkit:main on Aug 19, 2025 (9 of 15 checks passed).