48 changes: 36 additions & 12 deletions CIPs/cip-124.md
@@ -6,8 +6,10 @@ discussions-to: https://forum.ceramic.network/t/cip-124-recon-tip-synchronizatio
status: Draft
category: Networking
created: 2023-01-18
edited: 2023-06-23
edited: 2023-08-02
---


## Simple Summary

<!--Provide a simplified and layman-accessible explanation of the CIP.-->
@@ -24,9 +26,10 @@ Stream sets are bundles of streams that can be gossiped about as a group or in s
<!--Motivation is critical for CIPs that want to change the Ceramic protocol. It should clearly explain why the existing protocol specification is inadequate to address the problem that the CIP solves. CIP submissions without sufficient motivation may be rejected outright.-->
Currently, nodes broadcast stream updates to every node in the network over a single libp2p pubsub topic. This forces every node to process messages that it doesn’t necessarily care about. It also means that network throughput is limited by node bandwidth, which leads either to prioritizing high-bandwidth nodes or to greatly limiting throughput so that low-bandwidth nodes can keep up. Furthermore, if a node misses a broadcast, it will not detect the missing stream events unless it hears a later update or uses an out-of-band synchronization protocol such as "historical data sync" in ComposeDB, which scans the Ethereum blockchain for anchor transactions.

Recon aims to provide low to no overhead for nodes with no overlap in interest, while retaining a high probability of getting the **latest** events from a stream shortly after any node has the events, without any need for remote connections at query time. By ceasing to publish updates in the pubsub channel and instead organizing them into a stream set, nodes interested in those streams can synchronize with each other without putting load on uninterested nodes in the network. A secondary goal of stream sets is to give a structure for sharding a stream set across multiple nodes. By supporting the ability to synchronize only a sub-range of the stream set, the burden of storing, indexing, and retrieving streams can be sharded among nodes.

Finally, nodes also need a way to find other nodes interested in the stream set or sub-range, so that they can synchronize with them. Recon relies on nodes gossiping their interest to peers, as well as keeping a list of their peers' interest. This way nodes that are in sync, or nearly in sync, stay in sync with very little bandwidth. Nodes can also avoid sending stream event announcements to nodes that have no interest in the stream ranges.

## Specification
<!--The technical specification should describe the syntax and semantics of any new feature.-->
@@ -51,11 +54,11 @@ concatBytes(
varint(0xce), // streamid varint
varint(0x05), // cip-124 EventID varint
varint(network_id), // network_id varint
last8bytes(sha256(sort_value)), // separator [u8; 8]
last8bytes(sha256(sort_key + "|" + sort_value)), // separator [u8; 8]
last8bytes(sha256(controller)), // controller [u8; 8]
last4bytes(init_event_cid_bytes), // StreamID [u8; 4]
cbor(event_height), // event_height cbor unsigned int
event_cid_bytes, // [u8]
event_cid_bytes, // [u8] a CID or the (0x00 or 0xFF) byte to indicate a fencepost
)
```

@@ -67,9 +70,28 @@ Where:
* `controller` is the controller DID of the stream this event belongs to
* `init_event_cid_bytes` is the CID of the first Event of this stream.
* `event_height` is the "height" of the event. For InitEvents this value is `0`; otherwise it is `prev.event_height + 1`.
* `event_cid_bytes` the CID of the event itself
* `event_cid_bytes` is the CID of the event itself, or a single 0x00 or 0xFF byte for a fencepost, which does not reference an event.
* `last8bytes` and `last4bytes` take the last N bytes of the input, left-padding with zeros if the input is shorter.
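
The construction above can be sketched in Node.js as follows. This is a minimal illustration, not a reference implementation: the helper names (`varint`, `lastNBytes`, `cborUint`, `buildEventId`) are hypothetical, strings are assumed to be hashed as UTF-8, the separator uses the `sort_key + "|" + sort_value` form, and the CID arguments are assumed to be raw byte buffers supplied by the caller.

```js
const { createHash } = require("node:crypto");

// Unsigned LEB128 varint (the multiformats varint encoding).
function varint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n % 128) | 0x80); // low 7 bits, continuation bit set
    n = Math.floor(n / 128);
  }
  out.push(n);
  return Buffer.from(out);
}

// Last n bytes of buf, left-padded with zeros if buf is shorter than n.
function lastNBytes(buf, n) {
  if (buf.length >= n) return buf.subarray(buf.length - n);
  return Buffer.concat([Buffer.alloc(n - buf.length), buf]);
}

// CBOR unsigned integer (major type 0), shortest form.
function cborUint(n) {
  if (n < 24) return Buffer.from([n]);
  if (n < 0x100) return Buffer.from([0x18, n]);
  if (n < 0x10000) {
    const b = Buffer.alloc(3);
    b[0] = 0x19;
    b.writeUInt16BE(n, 1);
    return b;
  }
  if (n < 0x100000000) {
    const b = Buffer.alloc(5);
    b[0] = 0x1a;
    b.writeUInt32BE(n, 1);
    return b;
  }
  const b = Buffer.alloc(9);
  b[0] = 0x1b;
  b.writeBigUInt64BE(BigInt(n), 1);
  return b;
}

// Assumption: text inputs are hashed as UTF-8 strings.
const sha256 = (text) => createHash("sha256").update(text, "utf8").digest();

// Hypothetical helper that mirrors the concatBytes(...) layout.
function buildEventId({ networkId, sortKey, sortValue, controller,
                        initEventCid, eventHeight, eventCid }) {
  return Buffer.concat([
    varint(0xce),                                      // streamid varint
    varint(0x05),                                      // cip-124 EventID varint
    varint(networkId),                                 // network_id varint
    lastNBytes(sha256(sortKey + "|" + sortValue), 8),  // separator
    lastNBytes(sha256(controller), 8),                 // controller
    lastNBytes(initEventCid, 4),                       // StreamID
    cborUint(eventHeight),                             // event_height
    eventCid,                                          // CID bytes or fencepost byte
  ]);
}
```

Because every field before `event_height` has a fixed width once `network_id` is fixed, two EventIDs for the same network can be compared byte by byte.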

Event height [CBOR unsigned integer](https://www.rfc-editor.org/rfc/rfc8949.html#section-3.1-2.1)
* 0 - 23
  * 0xXX
  * the literal byte
* 24 - 255
  * 0x18XX
  * the 24 byte then the u8
* 256 - 65,535
  * 0x19XXXX
  * the 25 byte then the u16
* 65,536 - 4,294,967,295
  * 0x1aXXXXXXXX
  * the 26 byte then the u32
* 4,294,967,296 - 18,446,744,073,709,551,615
  * 0x1bXXXXXXXXXXXXXXXX
  * the 27 byte then the u64

When decoding, if you reach an invalid value, stop decoding and return a None value. The input is then not an EventID; it is a fencepost.
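
The decoding rule can be sketched as below. This is only an illustration under stated assumptions: `decodeCborUint` is a hypothetical name, `null` stands in for the None value, and "invalid" is taken to mean an unknown leading byte or a truncated payload. The sketch does not reject non-minimal encodings, which the text above does not address.

```js
// Decode a CBOR unsigned integer starting at `offset`.
// Returns { value: BigInt, length: bytes consumed }, or null when the
// bytes are not a valid unsigned integer (the fencepost case).
function decodeCborUint(buf, offset = 0) {
  if (offset >= buf.length) return null;
  const first = buf[offset];
  if (first < 24) return { value: BigInt(first), length: 1 };

  // Leading byte -> number of payload bytes that follow.
  const widths = { 0x18: 1, 0x19: 2, 0x1a: 4, 0x1b: 8 };
  const width = widths[first];
  if (width === undefined) return null;              // e.g. 0xFF
  if (offset + 1 + width > buf.length) return null;  // truncated payload

  let value = 0n;
  for (let i = 0; i < width; i++) {
    value = (value << 8n) | BigInt(buf[offset + 1 + i]);
  }
  return { value, length: 1 + width };
}
```

Returning a `BigInt` keeps the full u64 range exact; a caller that only expects small heights could convert with `Number(result.value)`.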

### Recon Message

The Recon protocol uses a binary string as a message for communication. This message is constructed in the following way,
@@ -82,21 +104,21 @@ Every recon message starts and ends with an eventId and in between every eventId

### Stream Set Ranges

With the definition of eventIds above we get an absolute ordering of events. We can now define subsets of the total range of all eventIds by defining a start and a stop eventId.
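
As a rough sketch of what a range check could look like, the snippet below compares eventIds as raw byte strings. The function name `inRange` is hypothetical, and the half-open interval `[start, stop)` is an assumption; the text does not state whether the bounds are inclusive.

```js
// Lexicographic, byte-wise range test on eventIds.
// Assumption: the range is half-open, i.e. start <= id < stop.
function inRange(id, start, stop) {
  return Buffer.compare(start, id) <= 0 && Buffer.compare(id, stop) < 0;
}

// Toy two-byte "ids", used only to show the ordering.
const start = Buffer.from([0x01, 0x00]);
const stop = Buffer.from([0x01, 0xff]);
const inside = inRange(Buffer.from([0x01, 0x05]), start, stop);  // true
const outside = inRange(Buffer.from([0x02, 0x00]), start, stop); // false
```

`Buffer.compare` orders buffers byte by byte, which matches an ordering derived from the concatenated byte layout of an eventId.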

For example, to construct the range of all streams defined by the *Model* `kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr`, we would construct the start and stop eventIds as follows:

```js
start = eventId(
network_id = 0x00, // mainnet
sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
controller = last8Bytes(repeat8(0x00)), // stream controller DID
init_event = last4Bytes(repeat4(0x00)) // streamid
)

stop = eventId(
network_id = 0x00, // mainnet
sort_value = last8Bytes(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr),
sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
controller = last8Bytes(repeat8(0xff)), // stream controller DID
init_event = last4Bytes(repeat4(0xff)) // streamid
)
@@ -109,14 +131,14 @@ If you want to subscribe only to a specific stream within a *Model* you can use
```js
start = eventId(
network_id = 0x00, // mainnet
sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID
init_event = last4Bytes(repeat4(init-event-cid)) // streamid
)

end = eventId(
network_id = 0x00, // mainnet
sort_value = last8Bytes(sha256(kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
sort_value = last8Bytes(sha256(model|kjzl6hvfrbw6c82mkud4qs38zl4hd03ifoyg2ksvfjkhuxebfzh3ef89vwvtvrr)),
controller = last8Bytes(sha256(stream-controller-did)), // stream controller DID
init_event = last4Bytes(repeat4(init-event-cid)) + 1 // streamid
)
@@ -235,6 +257,7 @@ eventId = concatBytes(
)
```


## Rationale
<!--The rationale fleshes out the specification by describing what motivated the design and why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages. The rationale may also provide evidence of consensus within the community, and should discuss important objections or concerns raised during discussion.-->

@@ -313,6 +336,7 @@ We could change LibP2P PubSub to only send the events that a node cares about to

This approach was rejected because it does not solve the missed messages problem.


## Backwards Compatibility
<!--All CIPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The CIP must explain how the author proposes to deal with these incompatibilities. CIP submissions without a sufficient backwards compatibility section may be rejected outright.-->

@@ -336,6 +360,7 @@ The associative hash functions are only secure if the node is asked to produce t

It's important that a node that receives a new eventId over Recon synchronizes the data of this event and validates it before relaying the eventId to other peers. Otherwise, invalid eventIds might be relayed.


## Appendix A: Associative Hash Function (Sha256a)

An associative hash function can simply be defined as a hash function that is associative:
@@ -411,7 +436,6 @@ A b-tree with fanout 2:
![fanout2](../assets/cip-124/b_hash_tree_2.png)



## Appendix B: B#tree (B hash trees)

e.g. [MST](https://hal.inria.fr/hal-02303490/document) / [Prolly Trees](https://docs.dolthub.com/architecture/storage-engine/prolly-tree)