feat: add --subnets flag to deploy multiple nodes per client #136
ch4r10t33r wants to merge 36 commits into main
Conversation
Add support for configuring nodes as aggregators through validator-config.yaml. This allows selective designation of nodes to perform aggregation duties by setting `isAggregator: true` in the validator configuration.

Changes:
- Add isAggregator field (default: false) to all validators in both local and ansible configs
- Update parse-vc.sh to extract and export the isAggregator flag
- Modify all client command scripts to pass the --is-aggregator flag when enabled
- Add isAggregator status to node information output
Resolved conflicts in client-cmds scripts by keeping both:
- Aggregator flag support
- Checkpoint sync URL support

Updated Docker images:
- zeam: 0xpartha/zeam:devnet3
- lantern: piertwo/lantern:v0.0.3-test
- ethlambda: ghcr.io/lambdaclass/ethlambda:devnet3

Added httpPort support for lantern nodes.
Adds --subnets N (1–5) to deploy N nodes of each client on their
associated servers, each on a distinct attestation subnet.
New files:
- generate-subnet-config.py: expands validator-config.yaml into
validator-config-subnets-N.yaml with unique node names, incremented
ports (quic/metrics/api), fresh P2P private keys, and explicit subnet
membership per entry. Also sets config.attestation_committee_count = N
so each client correctly partitions validators across N committees.
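The expansion that generate-subnet-config.py performs can be sketched as below. This is a minimal illustration, not the script's actual code: the field names (`name`, `quicPort`, `metricsPort`, `apiPort`, `privkey`, `subnet`) are assumed for the example, though `secrets.token_hex(32)` matches the key-generation approach noted in the review.

```python
import secrets

MAX_SUBNETS = 5  # keep in sync with the cap enforced in spin-node.sh

def expand_entry(entry, n):
    """Expand one validator entry into n per-subnet copies (illustrative)."""
    copies = []
    for i in range(n):
        c = dict(entry)
        c["name"] = f"{entry['name']}_{i}"            # unique node name per subnet
        c["quicPort"] = entry["quicPort"] + i         # incremented ports
        c["metricsPort"] = entry["metricsPort"] + i
        c["apiPort"] = entry["apiPort"] + i
        c["privkey"] = secrets.token_hex(32)          # fresh P2P private key
        c["subnet"] = i                               # explicit subnet membership
        copies.append(c)
    return copies

base = {"name": "zeam", "quicPort": 9000, "metricsPort": 8008, "apiPort": 5052}
nodes = expand_entry(base, 3)
```

With N=1 this produces a single `_0` entry on subnet 0, matching the no-op baseline described below.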
Changes:
- parse-env.sh: add --subnets N and --dry-run flags
- spin-node.sh:
- expand validator-config before genesis setup when --subnets N given
- select one aggregator per subnet randomly; print prominent summary
- --dry-run: simulate full deployment without applying any changes
(Ansible runs with --check --diff, local execs are echoed only)
- run-ansible.sh: pass validator_config_basename extra var so playbooks
use the active (possibly expanded) config; add --check --diff in dry-run
- ansible/playbooks/deploy-nodes.yml: use validator_config_basename to
sync the correct config file to remote hosts
- ansible/playbooks/prepare.yml: open port ranges for all subnet nodes
on a host by matching entries via IP, not just hostname
- convert-validator-config.py: fall back to httpPort for Lantern nodes
when generating Leanpoint upstreams
- README.md: document --subnets and --dry-run; update --prepare firewall
table to reflect port ranges when --subnets N is active
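The convert-validator-config.py fallback for Lantern nodes amounts to preferring the usual API port and only then consulting httpPort. A minimal sketch, with assumed field names (`apiPort`, `httpPort`, `name`) rather than the script's real schema:

```python
def upstream_port(node):
    """Port to use for this node's Leanpoint upstream (illustrative fields)."""
    # Most clients expose an apiPort; Lantern entries may only define httpPort.
    port = node.get("apiPort") or node.get("httpPort")
    if port is None:
        raise ValueError(f"{node.get('name', '?')}: neither apiPort nor httpPort set")
    return port
```

The `or` chain means a node that defines both fields still uses apiPort, so the fallback only changes behavior for Lantern-style entries.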
Rules enforced by generate-subnet-config.py:
- No two nodes on the same server may share a subnet (template validated)
- Each subnet has exactly one node per client
- N=1 is a no-op expansion (single-subnet baseline)
- N capped at 5
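The rules above can be expressed as a small validation pass. This is a sketch under assumed entry fields (`ip`, `subnet`, `client`), not the script's `_validate_template` itself:

```python
from collections import Counter

def validate_expanded(nodes, n):
    """Check the subnet-expansion invariants (illustrative schema)."""
    if not 1 <= n <= 5:  # N capped at 5; N=1 is the single-subnet baseline
        raise ValueError("N must be between 1 and 5")
    # No two nodes on the same server may share a subnet.
    placements = [(node["ip"], node["subnet"]) for node in nodes]
    if len(placements) != len(set(placements)):
        raise ValueError("two nodes on one server share a subnet")
    # Each subnet has exactly one node per client type.
    counts = Counter((node["client"], node["subnet"]) for node in nodes)
    if any(c != 1 for c in counts.values()):
        raise ValueError("a subnet must have exactly one node per client")
```

Both checks are cheap set/counter lookups, so they can run on every expansion without measurable cost.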
Previously both deploy-nodes.yml and copy-genesis.yml synced the entire
hash-sig-keys/ directory to every remote host, meaning every server
received every validator's sk/pk pair.
Now each playbook:
1. Reads annotated_validators.yaml on the controller to look up the
privkey_file entries for the node being deployed (inventory_hostname).
2. Derives the pk filename by replacing _sk.ssz → _pk.ssz.
3. Copies only those specific files to the target host.
A server running zeam_0 (validator_0_sk.ssz / validator_0_pk.ssz) no
longer receives validator_1_sk.ssz, validator_2_sk.ssz, etc.
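The per-host selection logic can be sketched as follows. The `validators`/`host`/`privkey_file` layout is an assumed reading of annotated_validators.yaml for illustration; only the `_sk.ssz` → `_pk.ssz` derivation is taken directly from the description above:

```python
def node_hash_sig_files(annotated, inventory_hostname):
    """Select only the sk/pk files assigned to one host (illustrative schema)."""
    files = []
    for v in annotated.get("validators", []):
        if v.get("host") != inventory_hostname:
            continue
        sk = v["privkey_file"]                           # e.g. validator_0_sk.ssz
        files += [sk, sk.replace("_sk.ssz", "_pk.ssz")]  # derive pk filename
    return files
```

A host with no assignments gets an empty list, which corresponds to the `when: node_hash_sig_files | length > 0` guard the reviewer notes below.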
…ffix

The old suffix-based detection (ethlambda_1 → subnet 1) broke when a config contained multiple nodes for the same client without --subnets (e.g. ethlambda_0..4 for redundancy): it incorrectly created 5 subnets and forced ethlambda nodes to be the sole aggregators on subnets 1–4.

Subnet membership is now read from the explicit `subnet:` field that generate-subnet-config.py writes for each entry. Nodes without this field (all standard configs) default to subnet 0, so a single-subnet deployment always selects exactly one aggregator from all active nodes, regardless of numeric suffixes in their names.
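The default-to-zero behavior is the whole fix: grouping ignores name suffixes entirely. A minimal sketch (field names assumed for illustration):

```python
from collections import defaultdict

def group_by_subnet(entries):
    """Group nodes by their explicit subnet field; missing field -> subnet 0."""
    groups = defaultdict(list)
    for e in entries:
        groups[e.get("subnet", 0)].append(e["name"])
    return dict(groups)

# Five redundant ethlambda nodes without --subnets all land on subnet 0:
nodes = [{"name": f"ethlambda_{i}"} for i in range(5)]
```

Under the old suffix parsing these five nodes would have produced five one-node subnets; here they form a single subnet-0 group, from which one aggregator is chosen.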
…r flag is passed

Previously the script always reset all flags and randomly re-selected an aggregator, ignoring any manual isAggregator: true already set in the YAML. This caused ethlambda_0 (the user's choice) to be silently replaced by ethlambda_1 (a random pick).

Aggregator selection now follows a three-level priority:
1. --aggregator <node> CLI flag
2. Pre-existing isAggregator: true in the config (manual YAML edit)
3. Random selection (fallback when neither is set)

The preset node is validated against the active node list. If it no longer exists, a warning is printed and random selection takes over.
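The three-level priority can be sketched as a single selection function. This is an illustration, not the script's code; the node-dict shape is assumed:

```python
import random

def pick_aggregator(active, cli_choice=None):
    """Illustrative three-level priority: CLI flag > preset YAML > random."""
    names = [n["name"] for n in active]
    if cli_choice is not None:                 # 1. --aggregator <node> CLI flag
        if cli_choice in names:
            return cli_choice
        print(f"warning: {cli_choice} is not an active node; falling back")
    for n in active:                           # 2. pre-existing isAggregator: true
        if n.get("isAggregator"):
            return n["name"]
    return random.choice(names)                # 3. random fallback
```

Note the fall-through: an invalid CLI choice degrades to the preset, and a missing preset degrades to random, so the function always returns some active node.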
The hardcoded group list (zeam_nodes, ream_nodes, ...) meant newly added client types (e.g. gean_nodes) never had their ansible_user updated. As a result, --useRoot was silently ignored for those nodes, and Ansible would SSH as the current local user (partha) instead of root and fail.
zclawz
left a comment
Overall well-structured PR — the subnet expansion model is clean and the per-node hash-sig key copying is a meaningful improvement. A few observations:
1. Double validation in spin-node.sh
The outer guard [ "$subnets" -ge 1 ] 2>/dev/null silently suppresses non-integer errors, and the inner guard then re-validates the same range. Combining into a single block would be cleaner:
```shell
if [ -n "$subnets" ]; then
  if ! [[ "$subnets" =~ ^[0-9]+$ ]] || [ "$subnets" -lt 1 ] || [ "$subnets" -gt 5 ]; then
    echo "Error: --subnets requires an integer between 1 and 5"
    exit 1
  fi
  # ... expansion logic
fi
```

2. MAX_SUBNETS = 5 in two places
generate-subnet-config.py and spin-node.sh both independently enforce the 1–5 range. They match today, but a future change in one won't automatically update the other. A cross-reference comment would help.
3. Private keys in ansible-devnet/genesis/validator-config.yaml
The privkey fields added for gean_0 and nlean_0 are P2P identity keys committed in plaintext. Consistent with how other devnet entries are handled, so presumably intentional — just confirming these are devnet-only keys.
4. run-ansible.sh positional arg expansion (${12})
Adding dryRun as ${12} is safe — callers that don't pass it get an empty string (falsy). All spin-node.sh call sites pass it correctly.
5. Dynamic group discovery in run-ansible.sh
Replacing the hardcoded client-group list with `yq eval '.all.children | keys'` is a good improvement — new clients no longer require updating the list. One edge case: if yq is absent on the Ansible controller (localhost) and the `|| echo ""` fallback fires, SSH key injection is silently skipped for all hosts. Worth an explicit yq check at the top of the script, or at least a warning.
6. Per-node hash-sig key copying
Good improvement — only the sk/pk files assigned to each node are transferred. The when: node_hash_sig_files | length > 0 condition is correct. One question: if annotated_validators.yaml exists but a node has no assignments in it, the hash-sig directory is not created and no keys are copied — is that intentional (node needs no hash-sig keys) or should it emit a warning?
7. generate-subnet-config.py
The validation logic, port-increment scheme, secrets.token_hex(32) for P2P keys, and attestation_committee_count = N injection all look correct. The duplicate-IP / duplicate-client-type checks in _validate_template are solid defensive guards.
Overall looks good. Happy to approve once the double-validation in spin-node.sh is tidied up (or if you prefer to leave it with a comment, that is fine too).
Adding a new client
The new guide at `docs/adding-a-new-client.md` covers the 6 files every new client must provide, with full code examples for each:
Everything else (genesis generation, key management, inventory generation, subnet expansion, leanpoint upstreams, aggregator selection, observability) is fully generic and requires no changes.
Test plan