
Add CI pipeline, fix validator script error handling, and rewrite RPC tests with assertions#101

Draft
Copilot wants to merge 3 commits into main from
copilot/add-ci-cd-pipeline-monitoring


Copilot AI commented Feb 28, 2026

No CI existed beyond a project-board workflow. Validator update scripts had no error handling, a shared typo, and no download verification. RPC tests only printed responses — they never failed.

CI Pipeline (.github/workflows/ci.yml)

  • ShellCheck (--severity=warning) across all .sh files
  • JSON validation via python3 -m json.tool for all .json files
  • pytest for rpc/rpc_test.py with a 5-minute timeout
  • All jobs scoped to permissions: contents: read
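As a rough illustration of what the JSON validation job checks, the sketch below walks a directory and reports every .json file that fails to parse. The function name `find_invalid_json` is hypothetical and not part of this PR; the workflow itself calls `python3 -m json.tool`, and this is only an equivalent check written out in Python.

```python
import json
from pathlib import Path


def find_invalid_json(root: str) -> list[str]:
    """Return the paths of .json files under root that fail to parse.

    Hypothetical helper for illustration; the CI job uses
    `python3 -m json.tool` instead.
    """
    invalid = []
    for path in sorted(Path(root).rglob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Malformed JSON or a file that is not valid UTF-8 text.
            invalid.append(str(path))
    return invalid
```

A CI step could call `find_invalid_json(".")` and exit non-zero when the returned list is not empty.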

Validator Scripts (update-validator-mainnet.sh, update-validator-testnet.sh)

  • Added set -euo pipefail — aborts on error, unset variable, or pipeline failure
  • Fixed typo download_sunbet_evm → download_subnet_evm in both scripts (definition + call)
  • Added sha256 checksum verification after each download
  • Quoted all variable expansions
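The checksum step can be pictured as follows. This is a minimal Python sketch of the idea only: the validator scripts are shell, and the names `sha256_of_file` and `verify_download` are hypothetical. Where the expected digest comes from (a release checksum file, a pinned value) is an assumption left to the caller.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file's SHA-256 digest differs from the expected one."""
    expected = expected_sha256.strip().lower()
    actual = sha256_of_file(path)
    if actual != expected:
        raise ValueError(
            f"checksum mismatch for {path}: expected {expected}, got {actual}"
        )
```

Raising on mismatch mirrors the effect of set -e in the shell scripts: the update stops before an unverified binary is installed.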

RPC Tests (rpc/rpc_test.py)

Replaced non-asserting print()-only functions with proper pytest tests:

# Before — never fails
def test_chain_id(rpc_url, chain_id):
    response = requests.post(rpc_url, json=payload)
    print("RPC request successful, Response Data:", response.json())

# After — fails on wrong chain ID or missing result
def test_mainnet_chain_id():
    data = _rpc_post(MAINNET_RPC_URL, payload)
    assert "result" in data
    assert int(data["result"], 16) == MAINNET_CHAIN_ID

Added rpc/requirements.txt (pytest, requests, websockets).
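For readers who want to see the shape of the helper, here is a self-contained sketch of what `_rpc_post` and the chain ID check might look like. It is an assumption, not the code in rpc/rpc_test.py: it uses the standard library's `urllib` instead of `requests` so it runs without extra packages, the payload assumes the standard `eth_chainId` JSON-RPC method, and `chain_id_from_response` is a hypothetical name.

```python
import json
import urllib.request

# Assumed payload: the standard Ethereum JSON-RPC chain ID request.
CHAIN_ID_PAYLOAD = {"jsonrpc": "2.0", "method": "eth_chainId", "params": [], "id": 1}


def _rpc_post(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST a JSON-RPC payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def chain_id_from_response(data: dict) -> int:
    """Fail if the response has no result; otherwise parse the hex chain ID."""
    assert "result" in data, f"missing result in RPC response: {data}"
    return int(data["result"], 16)
```

A test would then compare `chain_id_from_response(_rpc_post(MAINNET_RPC_URL, CHAIN_ID_PAYLOAD))` against `MAINNET_CHAIN_ID`, so an error response or a wrong chain ID fails the run instead of being printed.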

ShellCheck Fixes in Pre-existing Scripts

Required to make the new CI job pass:

  • Added #!/bin/bash shebang to both install-subnet-cli.sh copies
  • Added shebang + # shellcheck disable=SC2034 to api/env.sh (sourced file)
  • Fixed cd ~ → cd ~ || exit and quoted tar arguments in backup-validator.sh
Original prompt

This section details the original issue you should resolve

<issue_title>[Feature][High] Add CI/CD pipeline, monitoring/alerting, and error handling for validator scripts</issue_title>
<issue_description>## Summary

Seven high-priority improvement opportunities for reliability, observability, and operational safety:

1. Minimal CI/CD Pipeline — No Validation or Testing

The only workflow (add-issue-to-backlog.yml) adds issues to a project board. No CI for JSON validation, shell linting, Nginx config validation, or test execution.

Suggested implementation:

  • Add shellcheck linting for all .sh files
  • Add JSON schema validation for genesis files
  • Add nginx -t validation for Nginx configs
  • Add pytest execution for rpc_test.py
  • Add version consistency checks across README and scripts

2. No Monitoring, Alerting, or Health Check Automation

Health checking exists only as manual ad-hoc scripts. No automated monitoring, alerting, or dashboards.

Files: api/health.health.sh, api/metrics.sh, rpc/rpc_test.py

Suggested implementation:

  • Add Prometheus/Grafana config for avalanchego metrics
  • Create scheduled health check workflow
  • Add validator expiry monitoring
  • Integrate with uptime monitoring (UptimeRobot, Slack/Discord alerts)

3. No Error Handling in Critical Validator Update Scripts

chains/update-validator-mainnet.sh and chains/update-validator-testnet.sh download and install binaries without error checking. No set -e, no download verification, no rollback.

Suggested fix: Add set -euo pipefail, verify downloads, backup current binaries before replacement, add --dry-run mode.

4. Massive Duplication Between Testnet and Mainnet Scripts

update-validator-mainnet.sh and update-validator-testnet.sh are 95%+ identical. Both contain the typo download_sunbet_evm. install-subnet-cli.sh exists as identical copies in two directories.

Suggested fix: Create single parameterized scripts, fix the typo, remove duplicates.

5. Hardcoded Node IDs Scattered Across Files

Node IDs, Subnet IDs, VM IDs, and fee recipient addresses are hardcoded in individual scripts instead of a central config.

Suggested fix: Create config/network.env or config/networks.json as single source of truth.
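One possible shape for such a single source of truth is sketched below. The key names (`node_id`, `subnet_id`, `vm_id`, `fee_recipient`) and the function `load_network` are assumptions for illustration; the issue does not define a schema for config/networks.json.

```python
import json

# Hypothetical schema: one object per network, each with these keys.
REQUIRED_KEYS = ("node_id", "subnet_id", "vm_id", "fee_recipient")


def load_network(config_text: str, network: str) -> dict:
    """Parse the config text and return the validated entry for one network."""
    networks = json.loads(config_text)
    if network not in networks:
        raise KeyError(f"unknown network: {network}")
    entry = networks[network]
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise ValueError(f"{network} config is missing keys: {', '.join(missing)}")
    return entry
```

Scripts would read their IDs through one loader like this instead of carrying their own hardcoded copies, so a missing or misspelled key fails early.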

6. Nginx RPC Proxy Needs Security Hardening

Missing rate limiting, security headers, request filtering, and upstream health checks. Only testnet config exists — no mainnet Nginx config in repo.

7. RPC Tests Use print() Instead of Assertions

rpc_test.py never fails — prints responses without validating. websocket_test.py runs forever with no CI timeout.

Suggested fix: Rewrite with pytest, add assertions, add requirements.txt.
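A test that "runs forever" can be bounded with `asyncio.wait_for`. The sketch below is an assumption about how that could look, since the contents of websocket_test.py are not shown here: `receive` stands in for any zero-argument coroutine function, such as a WebSocket connection's receive method, and `receive_n` is a hypothetical name.

```python
import asyncio


async def receive_n(receive, count: int, timeout: float) -> list:
    """Await receive() count times, failing if the whole run exceeds timeout seconds."""

    async def _collect() -> list:
        messages = []
        for _ in range(count):
            messages.append(await receive())
        return messages

    # wait_for cancels _collect() and raises asyncio.TimeoutError on expiry.
    return await asyncio.wait_for(_collect(), timeout=timeout)
```

With a bound like this inside the test, a stalled subscription fails quickly with a timeout error rather than holding the CI job until the workflow-level limit is reached.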

🤖 Generated with Claude Code</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 28, 2026 12:23
…RPC tests with pytest

Co-authored-by: numbers-official <181934381+numbers-official@users.noreply.github.com>
Co-authored-by: numbers-official <181934381+numbers-official@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add CI/CD pipeline, monitoring, and error handling for validator scripts" to "Add CI pipeline, fix validator script error handling, and rewrite RPC tests with assertions" Feb 28, 2026


Development

Successfully merging this pull request may close these issues.

[Feature][High] Add CI/CD pipeline, monitoring/alerting, and error handling for validator scripts

2 participants