BIP drafts: SwiftSync Specification#2152
Conversation
murchandamus
left a comment
There was a problem hiding this comment.
Just a quick first glance, but could you please break your text into shorter lines? That makes it easier to leave review and track what changed between commits. Either 100 or 120 characters per line seems to work well enough.
|
FWIW, I don't mind the unbroken lines and even prefer them. Avoids rejigging line lengths to keep them consistent when updating or having lines with very different lengths. |
danielabrozzoni
left a comment
There was a problem hiding this comment.
I did an initial pass and left some comments. I read the BIPs in the commit order (block undo -> histfile -> swiftsync) and it was pretty easy to follow.
| # Abstract | ||
|
|
||
| Inputs of a Bitcoin block are referenced by the outpoint data structure. This poses a limitation during initial block download (IBD), such that a client must process blocks sequentially to validate the chain history. The SwiftSync protocol allows blocks to be evaluated in arbitrary order, however additional data is required that must be served over the peer-to-peer network. Namely, the creation height, coinbase flag, input script, and amount for each spent coin must be accessible to fully validate a block in a state-less manner. This data cannot be trusted by a peer under usual conditions, however SwiftSync allows a client performing IBD to validate the correctness of this data. | ||
| # Motivation |
There was a problem hiding this comment.
The motivation is a bit repetitive with what's in the abstract, and there are a lot of technical details. I would reshape this a little in order to give a stronger motivation ("IBD is slow because it must be done sequentially, SwiftSync speeds this up by removing this limitation")
| A current limitation of IBD is that it must be done sequentially. This is a result of the height, coinbase flag, input script, and amount of the block inputs being omitted from the data committed to by proof of work in the current block, and, thus, this data cannot be trusted if received over the wire naively. Using the SwiftSync protocol, a client is able to verify the correctness of this data, even if served by a potentially untrusted party. This is a property of the SwiftSync hash aggregate, which commits to the height, coinbase flag, script, and amount when creating and deleting coins. | ||
| # Specification | ||
|
|
||
| In Bitcoin Core, to roll-back the chain state in the event of a block reorganization, the height, coinbase flag, script and amount metadata for each input of a block are stored in a data structure known colloquially as "undo data". This terminology stems from its use to "undo" the effect of a block by repopulating the UTXO set with the coins that were spent by the reorganized block. To remain general in language, this data will be referred as "spent coins." |
There was a problem hiding this comment.
nit: it took me a while to figure out that the coinbase flag is needed in order to check that a transaction is not spending an immature coinbase... It would be nice to have this specified somewhere to avoid confusion, but I'm not sure if it really fits here, so, as you wish
jurraca
left a comment
There was a problem hiding this comment.
some writing nits but overall the concept is clear enough.
|
|
||
| # Abstract | ||
|
|
||
| Inputs of a Bitcoin block are referenced by the outpoint data structure. This poses a limitation during initial block download (IBD), such that a client must process blocks sequentially to validate the chain history. The SwiftSync protocol allows blocks to be evaluated in arbitrary order, however additional data is required that must be served over the peer-to-peer network. Namely, the creation height, coinbase flag, input script, and amount for each spent coin must be accessible to fully validate a block in a state-less manner. This data cannot be trusted by a peer under usual conditions, however SwiftSync allows a client performing IBD to validate the correctness of this data. |
There was a problem hiding this comment.
however additional data is required that must be served over the peer-to-peer network
I would combine this and the next sentence so it's clear what the data is required for at the end of this sentence, e.g. "... in order to validate the blocks, namely the creation height, coinbase flag...".
This data cannot be trusted by a peer under usual conditions
"under usual conditions" doesn't provide much value here, imo it's either trusted or validated, in this case it can't be trusted when served by a peer.
| # Abstract | ||
|
|
||
| Inputs of a Bitcoin block are referenced by the outpoint data structure. This poses a limitation during initial block download (IBD), such that a client must process blocks sequentially to validate the chain history. The SwiftSync protocol allows blocks to be evaluated in arbitrary order, however additional data is required that must be served over the peer-to-peer network. Namely, the creation height, coinbase flag, input script, and amount for each spent coin must be accessible to fully validate a block in a state-less manner. This data cannot be trusted by a peer under usual conditions, however SwiftSync allows a client performing IBD to validate the correctness of this data. | ||
| # Motivation |
There was a problem hiding this comment.
The Abstract and Motivation repeat similar ideas. imo the Abstract part should focus more on what this proposal provides, instead of the reasoning and context that led to it. Could also merge them into one section if you want.
4bb9e9e to
92093e1
Compare
92093e1 to
f4cd99a
Compare
|
Thanks for the review, @danielabrozzoni and @jurraca, as well as the quick turnaround @rustaceanrob. I notice that this pull request is still marked as a Draft PR. Are you still planning significant changes? If your submission is ready for another BIP Editor review, please mark the PR as "ready for review". |
|
I will keep these as a draft as the hintsfile format is subject to change. |
| ## Motivation | ||
|
|
||
| A current limitation of IBD is that it must be done sequentially. This is a result of the height, coinbase flag, input script, and amount of the block inputs being omitted from the data committed to by proof of work in the current block, and, thus, this data cannot be trusted if received over the wire naively. Using the SwiftSync protocol, a client is able to verify the correctness of this data, even if served by a potentially untrusted party. This allows a significant improvement in IBD performance, as block downloads may be done in parallel. | ||
| ## Specification |
There was a problem hiding this comment.
Minor formatting thing, but the document as reads today isn't uniform w.r.t when it decides to add a new line before heading vs not.
| - `[]byte`: arbitrary sequence of bytes with no fixed length | ||
| - `<N bytes>`: byte vector of size N, where N is specified inline (e.g. <32 bytes>) | ||
| - `vector<Foo>`: vector of arbitrary length of elements of type Foo | ||
| - `CompactSize`: encoding of unsigned integers defined in peer-to-peer messages |
There was a problem hiding this comment.
For the sake of having a complete stand alone document, I think CompactSize should be specified briefly here. Otherwise, it assumes that readers are familiar with the naming/semantic conventions of the bitcoind codebase.
| ### Definitions | ||
|
|
||
| - `[]byte`: arbitrary sequence of bytes with no fixed length | ||
| - `<N bytes>`: byte vector of size N, where N is specified inline (e.g. <32 bytes>) |
There was a problem hiding this comment.
What encoding is used to specify the length of N?
| | Amount | 64 bit unsigned integer | Defined above | Satoshi denominated value | | ||
| ### Messages | ||
|
|
||
| #### MSG_GET_SPENT_COINS |
There was a problem hiding this comment.
Is the idea that a peer would issue this request for every block in the chain? If we assume mainnet at height, and a 150 ms round trip time, then a peer would spend nearly 80 hours just downloading this undo data.
You may want to consider a batched variant, similar to the way messages like getheaders works.
|
|
||
| | Field | Value | | ||
| | :----------------- | :---------- | | ||
| | `NODE_BLOCK_UNDO` | `1 << ???` | |
There was a problem hiding this comment.
Rationale should be added for the choice of a new node version over the more common place (as of the past few years) exchange of a sendX message during the version handshake.
IMO a version makes sense here, as it can be used to filter out peers upfront that support sending this undo data over the network.
|
|
||
| #### MSG_SPENT_COINS | ||
|
|
||
| `MSG_SPENT_COINS` defines the data structure for inputs of a block. |
There was a problem hiding this comment.
Probably not really y'all's intended use case, but if you optionally make it possible to include merkle proofs for the set of coins, then this message can be used to obtain a proof that an output was spent in a given block.
There was a problem hiding this comment.
It would actually also be useful for BIP 157+158 peers, as the final version that shipped includes the script spent (instead of the outpoint), which means that if you're using the filters to find a block where a given script has been spent, you need to make some assumptions about what the prev script is for a given transaction.
SwiftSync is a protocol for clients to parallelize initial block download, based on the original writeup.