Overview
As a prelude to the task of improving pull sync time (wrt swip), it is necessary to establish a baseline average node sync time that is repeatable and drawn from a large and diverse sample set.
It is also noted that full node spin-up time is a key metric that is not currently monitored, and that it should be monitored continuously to ensure a good experience for new nodes joining the network. To service this, we will add a facility to Beekeeper to "reset" nodes, together with the tests required to capture these metrics in the first instance, and later to bootstrap completely new nodes.
Implementation Note
Our approach should focus on providing the scaffolding and infrastructure first, and then the immediately required tests. Keeping the scope of this task tight provides for future utility while keeping the work simple, and hence allows a quick turnaround for this first iteration.
Methodology
Node Spin Up
In order to establish a new node, three tasks must be completed prior to beginning normal protocol interaction:
- Generate new key material: Swarm key, LibP2P key and Ethereum key.
- Fund the wallet with XDAI and XBZZ, and create the transactions that the tests require.
- Join the network, populate the hive and conduct historical pull sync.
If a node leaves the network for some time, or for some reason the state of the node is lost, only the third step is required. Since our immediate needs for data are related only to processes involved in step 3, it is suggested that the first instance of the new tests focus on this step only, and utilise already established key material and associated blockchain state.
This can be simulated by "nuking" a pre-existing node's storage (but not its keys). It is therefore suggested that the first iteration implement a "node reset" smoke test and the associated testing architecture, and that establishing a new key pair and bootstrapping be left for a future task.
Key Metrics
Initially, our fundamental interest is in the time it takes an arbitrary node to go from an empty reserve to one which is full and which the node considers synced. While we can and should add many more metrics as time goes on, as well as checks that the reserve is successfully and accurately synced, for now we should focus on providing statistics on the time taken for a node to go from one step to the next in the following journey.
- t_0 Node has zero chunks in reserve and zero peers.
- t_1 Node has hive populated to the extent required to begin syncing from its local neighbourhood; sync begins.
- t_2 Node completes sync and considers historical sync complete.
- t_3 Node begins normal protocol interaction
A headline figure for mainnet nodes going from a "reset" or "nuked" state (t_0) to normal interaction (t_3) will then be monitored on a continuous basis as part of the SLA/smoke test cohort analysis.
It is noted that these metrics have specific characteristics which are determined by a node's address/location in the network. To account for this, it is suggested that in the first instance several pre-seeded nodes occupying different places in the network are monitored. If these produce heterogeneous results, phase 2 may be required with more urgency. During phase 2, arbitrarily addressed node spin-ups can be provided for, as well as the associated requirements for blockchain funding and bootstrapping.
Other Metrics
Once the scaffolding is in place, other metrics can be added on an ad hoc basis, in order to identify bottlenecks in the process and monitor the speed/efficacy of other spin-up activities, e.g. hive bootstrapping.