NetworkGraph: Update chan/node estimation numbers, determine pre-allocation dynamically on read
#4306
Conversation
👋 Thanks for assigning @TheBlueMatt as a reviewer!
Codecov Report ❌ Patch coverage is …
Additional details and impacted files:
@@ Coverage Diff @@
## main #4306 +/- ##
==========================================
+ Coverage 86.59% 86.62% +0.03%
==========================================
Files 158 158
Lines 102408 102730 +322
Branches 102408 102730 +322
==========================================
+ Hits 88678 88989 +311
- Misses 11309 11321 +12
+ Partials 2421 2420 -1
lightning/src/routing/gossip.rs
Outdated
///
/// To improve efficiency, this will pre-allocate memory for `node_count_estimate` nodes and
/// `channel_count_estimate` channels.
pub fn from_node_and_channel_count_estimates(
I'm pretty skeptical of exposing this. If you don't want a network graph, don't build one. If you want a network graph, we should allocate for a network graph. How is a downstream dev better positioned to provide estimates here than we are?
Hmm, from my point of view it is a bit odd to pre-allocate a lot of memory based on static estimations that will become stale over time. Plus, we currently make no distinction based on Network here, so we will always allocate that much memory even for small Regtest or Signet environments with only a handful of nodes (e.g., in tests). If you're skeptical, maybe we can drop the extra constructor, but leave the dynamic allocation on read, and only apply the estimates for Network::Bitcoin?
Of course, I have to admit that I first had planned to use that constructor for the minimal-mode Node over at LDK Node, but you're right, for that application we probably need to completely change up our types there and simply rip out anything related to the Router/NetworkGraph entirely.
> Hmm, from my point of view it is a bit odd to pre-allocate a lot of memory based on static estimations that will become stale over time.
Sure, but do we really expect downstream devs to update their estimates more often than us?
> Plus, we currently make no distinction based on Network here, so we will always allocate that much memory even for small Regtest or Signet environments with only a handful of nodes (e.g., in tests).
Yea, we should definitely use Network to disable the pre-allocation.
> If you're skeptical, maybe we can drop the extra constructor, but leave the dynamic allocation on read, and only apply the estimates for Network::Bitcoin?
Yea, makes sense. We could still use the constants on mainnet loads, even, to ensure we pre-allocate even if loading with an empty (or partially-synced) graph but I guess it doesn't matter that much either way.
> Of course, I have to admit that I first had planned to use that constructor for the minimal-mode Node over at LDK Node, but you're right, for that application we probably need to completely change up our types there and simply rip out anything related to the Router/NetworkGraph entirely.
Yea, I mean it would be simpler for LDK Node, sure, but it definitely feels like the wrong way :)
Alright, will drop the first commit then, and add one that disables the estimates on non-mainnet networks.
Done, let me know if I can squash.
Yes, please.
9138bb2 to 070002c (force-pushed)
Squashed without further changes.
lightning/src/routing/gossip.rs
Outdated
const CHAN_COUNT_ESTIMATE: usize = 50_000;

/// In Jan, 2026 there were about 13K nodes
///
/// We over-allocate by a bit because 15% more is better than the double we get if we're slightly
/// too low.
const NODE_COUNT_ESTIMATE: usize = 15_000;
Oh, oops, these are active counts only. My node (restarted less than a week ago, I believe, so hasn't pruned channels where one side has been disabled for a week) shows network_nodes: 17013, network_channels: 54264. Thus, if we use 50k/15k what we'll actually end up with is 100k/30k as the allocation (I assume `Vec` doubles? Maybe not for large allocations?)
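The doubling concern can be checked directly. A minimal sketch (illustrative only: `Vec` is used here as a stand-in for the map's backing storage, and amortized doubling is a current-std implementation detail, not a guarantee):

```rust
fn main() {
	// Pre-allocate the discussed 50k estimate, then push the ~54k
	// channels Matt's node actually observes.
	let mut v: Vec<u64> = Vec::with_capacity(50_000);
	for i in 0..54_264u64 {
		v.push(i);
	}
	// Exceeding the initial capacity by ~8% forces a reallocation to
	// roughly double (current std behavior), wasting ~45k slots.
	assert!(v.capacity() >= 100_000);
	println!("capacity after overflow: {}", v.capacity());
}
```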
Sure, can bump the numbers a bit if you prefer:
> git diff-tree -U2 508d806b0 39ca7cb1d
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 9ebefceb7..534bebe76 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -1691,8 +1691,6 @@ where
let channels_map_capacity = (channels_count as u128 * 115 / 100)
.try_into()
+ .map(|v: usize| v.min(MAX_CHAN_COUNT_LIMIT))
.map_err(|_| DecodeError::InvalidValue)?;
- if channels_map_capacity > MAX_CHAN_COUNT_LIMIT {
- return Err(DecodeError::InvalidValue);
- }
let mut channels = IndexedMap::with_capacity(channels_map_capacity);
for _ in 0..channels_count {
@@ -1708,9 +1706,8 @@ where
}
// Pre-allocate 115% of the known channel count to avoid unnecessary reallocations.
- let nodes_map_capacity: usize =
- (nodes_count as u128 * 115 / 100).try_into().map_err(|_| DecodeError::InvalidValue)?;
- if nodes_map_capacity > MAX_NODE_COUNT_LIMIT {
- return Err(DecodeError::InvalidValue);
- }
+ let nodes_map_capacity: usize = (nodes_count as u128 * 115 / 100)
+ .try_into()
+ .map(|v: usize| v.min(MAX_NODE_COUNT_LIMIT))
+ .map_err(|_| DecodeError::InvalidValue)?;
let mut nodes = IndexedMap::with_capacity(nodes_map_capacity);
for i in 0..nodes_count {
lightning/src/routing/gossip.rs
Outdated
let channels_map_capacity = (channels_count as u128 * 115 / 100)
	.try_into()
	.map_err(|_| DecodeError::InvalidValue)?;
if channels_map_capacity > MAX_CHAN_COUNT_LIMIT {
Maybe someone is doing a 1M channels test? Let's just limit the pre-allocation rather than failing.
Not sure I'm following? If the concern is that we read a network graph that is somehow so large and we'd allocate 'infinite memory', we'd want to fail here?
If we don't think that we'd ever run into that we don't need a hard limit at all?
I don't think 10M nodes and 100M channels is "infinite memory". E.g., a while back Rusty did his "million channels" test to see if he could run a network with 1M channels and figure out what the bottlenecks were. I don't think it's crazy to think someone might do a 100M-and-1-channel test :).
Right, but I'm then not sure I understand the main concern in the first place. Are we considering a case where we read a maliciously crafted NetworkGraph dump? Or when else would we ever hit the 'infinite memory' case you requested the upper-bound for above #4306 (comment) ?
The concern is if we read a corrupt NetworkGraph and the value here returns 2^64 then we'll immediately crash trying to allocate, rather than failing to read with an error (when we run out of bytes to read because the file we're reading isn't 2^64 bytes).
Right, but then returning an error if it surpasses the limits is the right thing? Or are you just saying we should raise the limits further? If so, please suggest some numbers, happy to adjust.
In most of the rest of the crate we avoid picking failure constants here by just allocating min(read-count, reasonable-max). IMO we should do the same here. It's entirely possible that a value of 100M channels is a corrupt graph and we should avoid pre-allocating it. It's also possible that someone is doing some benchmarking.
Okay, not sure if I entirely follow the logic, but no strong opinion here. Force-pushed the min approach:
> git diff-tree -U2 508d806b0 32c4362a4
diff --git a/lightning/src/routing/gossip.rs b/lightning/src/routing/gossip.rs
index 9ebefceb7..95dd5a7aa 100644
--- a/lightning/src/routing/gossip.rs
+++ b/lightning/src/routing/gossip.rs
@@ -1691,8 +1691,6 @@ where
let channels_map_capacity = (channels_count as u128 * 115 / 100)
.try_into()
- .map_err(|_| DecodeError::InvalidValue)?;
- if channels_map_capacity > MAX_CHAN_COUNT_LIMIT {
- return Err(DecodeError::InvalidValue);
- }
+ .map_err(|_| DecodeError::InvalidValue)?
+ .min(MAX_CHAN_COUNT_LIMIT);
let mut channels = IndexedMap::with_capacity(channels_map_capacity);
for _ in 0..channels_count {
@@ -1708,9 +1706,8 @@ where
}
// Pre-allocate 115% of the known channel count to avoid unnecessary reallocations.
- let nodes_map_capacity: usize =
- (nodes_count as u128 * 115 / 100).try_into().map_err(|_| DecodeError::InvalidValue)?;
- if nodes_map_capacity > MAX_NODE_COUNT_LIMIT {
- return Err(DecodeError::InvalidValue);
- }
+ let nodes_map_capacity: usize = (nodes_count as u128 * 115 / 100)
+ .try_into()
+ .map_err(|_| DecodeError::InvalidValue)?
+ .min(MAX_NODE_COUNT_LIMIT);
let mut nodes = IndexedMap::with_capacity(nodes_map_capacity);
for i in 0..nodes_count {

070002c to 508d806 (force-pushed)
05a4286 to 32c4362 (force-pushed)
…eading

When reading a persisted network graph, we previously pre-allocated our default node/channel estimate counts for the respective `IndexedMap` capacities. However, this might unnecessarily allocate memory on reading, for example if we have an (almost) empty network graph for one reason or another. As we have the actual counts of persisted nodes and channels available, we here simply opt to allocate based on these numbers (plus 15%). This will also ensure that our pre-allocations stay up to date over time as the network grows or shrinks.
Previously, we'd always pre-allocate memory for the node and channel maps based on mainnet numbers, even if we're on another network like `Regtest`. Here, we only apply the estimates if we're actually on `Network::Bitcoin`, which should reduce the `NetworkGraph`'s memory footprint considerably in tests.
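The gating the commit message describes could look roughly like this. A hedged sketch only: the `Network` enum here is a local stand-in for `bitcoin::Network`, `initial_capacities` is a hypothetical helper, and the constants are the estimates from the snippet above:

```rust
// Stand-in for bitcoin::Network; variants beyond Bitcoin/Regtest elided
// from use, so silence dead-code warnings.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Network { Bitcoin, Testnet, Signet, Regtest }

// Mainnet estimates from the PR discussion (Jan 2026 numbers plus headroom).
const NODE_COUNT_ESTIMATE: usize = 15_000;
const CHAN_COUNT_ESTIMATE: usize = 50_000;

/// Only mainnet graphs are large enough to justify pre-allocating; test
/// networks start (near-)empty and can grow on demand.
fn initial_capacities(network: &Network) -> (usize, usize) {
	if *network == Network::Bitcoin {
		(NODE_COUNT_ESTIMATE, CHAN_COUNT_ESTIMATE)
	} else {
		(0, 0)
	}
}

fn main() {
	assert_eq!(initial_capacities(&Network::Bitcoin), (15_000, 50_000));
	assert_eq!(initial_capacities(&Network::Regtest), (0, 0));
}
```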
32c4362 to 39ca7cb (force-pushed)
Title changed from "NetworkGraph: Allow to construct with custom count estimates, update numbers, determine pre-allocation dynamically on read" to "NetworkGraph: Update chan/node estimation numbers, determine pre-allocation dynamically on read"
TheBlueMatt
left a comment
thanks!