
feat: add guardian-publish-validator — methodology library health-check tool #6069

Open
danielnorkin wants to merge 1 commit into hashgraph:develop from Climission:feature/guardian-publish-validator


@danielnorkin
Collaborator

Summary

Adds a new tool under tools/guardian-publish-validator/ that scans the methodology library and reports per-entity health for every published artifact (policy, tool, module, schema). CLI for CI/CD integration, web UI for browsing, and an example GitHub Actions workflow.

Discovered while supporting a customer who hit "Cannot read properties of null (reading 'type')" on a policy import. The Hedera anchor was alive, but the IPFS payload it referenced was unreachable from every gateway except Pinata. In current Guardian that is a silent failure with no diagnostic. This tool surfaces both Hedera-side and IPFS-side rot, library-wide, before users run into it.
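The per-gateway check behind that diagnosis can be sketched as follows. This is an illustrative sketch, not src/ipfs.js: it assumes Node 18+ (global fetch and AbortSignal.timeout), and the probeGateways name, its options, and the injectable fetchImpl are hypothetical.

```javascript
// Hypothetical sketch: ask several IPFS HTTP gateways in parallel whether they
// serve a CID. Not the validator's real API; names and options are illustrative.

/**
 * @param {string} cid                IPFS content identifier to check
 * @param {string[]} gateways          gateway base URLs, e.g. "https://gateway.example"
 * @param {object} [opts]
 * @param {number} [opts.timeoutMs]    per-gateway timeout in milliseconds
 * @param {Function} [opts.fetchImpl]  fetch-compatible function (defaults to Node 18+ global fetch)
 * @returns {Promise<{gateway: string, ok: boolean}[]>} one result per gateway, in input order
 */
async function probeGateways(cid, gateways, { timeoutMs = 10000, fetchImpl = globalThis.fetch } = {}) {
  const probes = gateways.map(async (gateway) => {
    const url = `${gateway.replace(/\/+$/, '')}/ipfs/${cid}`;
    try {
      // HEAD avoids downloading the whole archive; the timeout signal makes a
      // hung gateway count as unreachable instead of stalling the whole scan.
      const res = await fetchImpl(url, { method: 'HEAD', signal: AbortSignal.timeout(timeoutMs) });
      return { gateway, ok: res.ok };
    } catch {
      // DNS failure, TLS error, timeout, connection reset: the gateway did not serve it.
      return { gateway, ok: false };
    }
  });
  return Promise.all(probes);
}
```

Probing every gateway rather than stopping at the first hit is what distinguishes "reachable only via Pinata" from "reachable via several gateways". Some gateways handle HEAD poorly, so a fuller implementation might fall back to a small GET.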

The four buckets

Per the discussion on the Resilient framing, every entity lands in one of four buckets:

| Bucket | Meaning |
| --- | --- |
| Healthy | Hedera anchor alive + a local Kubo serves the IPFS bytes (operator-controlled). No external dependency. |
| Resilient | Hedera anchor alive + at least two independent public gateways serve it. Ideal state for upstream library content, where there is no single operator-controlled Kubo. |
| Fragile | Reachable from exactly one gateway. Works today, but is a single point of failure. |
| Broken | No gateway has the bytes, or the Hedera message is missing. Action needed. |
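A minimal sketch of the bucket rule, assuming probe results have already been reduced to three inputs. The classify name, the input shape, and the precedence (Broken first, then a local Kubo hit wins over public-gateway counts) are an interpretation of the definitions above, not the validator's code.

```javascript
// Hypothetical sketch of the four-bucket rule; input shape and precedence are assumptions.

/**
 * @param {object} probe
 * @param {boolean} probe.hederaAlive    mirror node still returns the anchoring message
 * @param {boolean} probe.localOk        an operator-controlled Kubo served the bytes
 * @param {number}  probe.publicOkCount  independent public gateways that served the bytes
 * @returns {'Healthy'|'Resilient'|'Fragile'|'Broken'}
 */
function classify({ hederaAlive, localOk, publicOkCount }) {
  const reachable = publicOkCount + (localOk ? 1 : 0);
  // Missing anchor or zero reachable gateways: nothing to import.
  if (!hederaAlive || reachable === 0) return 'Broken';
  // Operator-controlled copy: no external dependency.
  if (localOk) return 'Healthy';
  // Two or more independent public gateways: no single point of failure.
  if (publicOkCount >= 2) return 'Resilient';
  // Exactly one public gateway serves it.
  return 'Fragile';
}
```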

CI/CD integration

examples/github-actions.yml is a copy-paste workflow. Two flags drive CI behavior:

  • --fail-on broken — fails CI only on Broken entries. Suitable for PR gates that should block newly introduced dead references without rejecting fresh content that has not yet accumulated multiple pins.
  • --fail-on fragile — fails on Fragile or Broken entries, so every entity must be Resilient or Healthy. Useful for scheduled sweeps that catch pin-lineage degradation.
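One way to turn these two modes into a process exit code is sketched below. exitCodeFor and the shape of the counts object are hypothetical; the sketch only illustrates that the fragile mode is a superset of the broken mode.

```javascript
// Hypothetical sketch: map --fail-on plus per-bucket counts to an exit code.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const FAIL_SETS = new Map([
  ['broken', new Set(['Broken'])],
  ['fragile', new Set(['Fragile', 'Broken'])],
]);

/**
 * @param {'broken'|'fragile'|undefined} failOn  value of --fail-on (undefined means report only)
 * @param {Record<string, number>} counts         e.g. { Healthy: 3, Fragile: 1, Broken: 0 }
 * @returns {number} 0 to pass the CI step, 1 to fail it
 */
function exitCodeFor(failOn, counts) {
  if (failOn === undefined) return 0;
  const failing = FAIL_SETS.get(failOn);
  if (!failing) throw new Error(`unknown --fail-on value: ${failOn}`);
  for (const bucket of failing) {
    if ((counts[bucket] ?? 0) > 0) return 1;
  }
  return 0;
}
```

In a CLI entry point this would typically feed process.exitCode rather than calling process.exit(), so pending output is flushed before the process ends.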

--changed-only-from <file> scopes the scan to a list of changed file paths (pairs with git diff --name-only in PR runs) so the CI job does not have to walk the entire library on every PR.
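A sketch of the filtering step, assuming the file holds one repo-relative path per line (as git diff --name-only prints) and that only .policy and .tool archives under a library root need rescanning. selectChangedArchives, readChangedOnlyFrom, and the libraryRoot parameter are hypothetical names.

```javascript
// Hypothetical sketch of --changed-only-from filtering; names and scope rules are assumptions.
const fs = require('node:fs');
const path = require('node:path');

// Archive types the library walker opens.
const SCANNED_EXTENSIONS = new Set(['.policy', '.tool']);

/**
 * @param {string} text         newline-separated paths, one per line
 * @param {string} libraryRoot  repo-relative directory to restrict the scan to
 * @returns {string[]} unique archive paths, in first-seen order
 */
function selectChangedArchives(text, libraryRoot) {
  const prefix = libraryRoot.replace(/\/+$/, '') + '/';
  const picked = new Set();
  for (const raw of text.split(/\r?\n/)) {
    const p = raw.trim();
    // Note: git may quote paths with unusual characters (core.quotePath); not handled here.
    if (!p || !p.startsWith(prefix)) continue;
    if (!SCANNED_EXTENSIONS.has(path.posix.extname(p))) continue;
    picked.add(p);
  }
  return [...picked];
}

function readChangedOnlyFrom(file, libraryRoot) {
  return selectChangedArchives(fs.readFileSync(file, 'utf8'), libraryRoot);
}
```

In a PR workflow the input file could come from something like git diff --name-only origin/develop...HEAD > changed.txt.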

Findings from running it against upstream develop (testnet)

Of 84 unique entities across 167 archives:

  • 18 Resilient
  • 16 Fragile
  • 8 Broken (including AMS-II.G policy itself, plus tools used by Verra VM0003 and VM0048)
  • The remaining 42 are Healthy only when the operator's local Kubo is in the gateway list

The full report JSON is reproducible via npm run validate -- --branch develop.

What is in this PR

  • bin/validator.js — CLI entry point
  • bin/ui.js — Express server for the web dashboard
  • src/library.js — .policy and .tool zip walker
  • src/manifests.js — README markdown table parser
  • src/mirror.js — Hedera mirror node client and message decoder
  • src/ipfs.js — multi-gateway probing with Healthy / Resilient / Fragile / Broken classification
  • src/repo.js — auto-clone hashgraph/guardian into the user cache when no --path is given
  • ui/index.html — single-page dashboard (light and dark, sortable, filterable, CSV export)
  • tests/ — 15 unit tests across the parser, walker, and message decoder
  • examples/github-actions.yml — ready-to-use workflow
  • README.md — install, usage, CI/CD integration, output schema, contribution notes
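The mirror-node lookup and decode step can be sketched as below. It uses the public mirror node REST endpoint /api/v1/topics/messages/{consensusTimestamp}, whose message field is base64-encoded. The function names are hypothetical, the assumption that a Guardian message carries its IPFS reference in a cid field or an ipfs:// uri field is labeled in the code, and chunked (multi-part) messages are ignored.

```javascript
// Hypothetical sketch of a mirror node lookup; not src/mirror.js.
const TESTNET_MIRROR = 'https://testnet.mirrornode.hedera.com';

/** Decode the base64 `message` field of a mirror node topic-message record into JSON. */
function decodeTopicMessage(record) {
  if (!record || typeof record.message !== 'string') return null;
  const text = Buffer.from(record.message, 'base64').toString('utf8');
  try {
    return JSON.parse(text);
  } catch {
    return null; // payload is not JSON
  }
}

/** ASSUMPTION: the IPFS reference lives in `cid` or in an `ipfs://` `uri` field. */
function extractCid(decoded) {
  if (!decoded) return null;
  if (typeof decoded.cid === 'string' && decoded.cid) return decoded.cid;
  if (typeof decoded.uri === 'string' && decoded.uri.startsWith('ipfs://')) {
    return decoded.uri.slice('ipfs://'.length);
  }
  return null;
}

/** Returns { alive, cid }; alive is false when the mirror node no longer has the message. */
async function lookupAnchor(consensusTimestamp, { mirror = TESTNET_MIRROR, fetchImpl = globalThis.fetch } = {}) {
  const res = await fetchImpl(`${mirror}/api/v1/topics/messages/${consensusTimestamp}`);
  if (res.status === 404) return { alive: false, cid: null }; // e.g. wiped by a testnet reset
  if (!res.ok) throw new Error(`mirror node returned HTTP ${res.status}`);
  return { alive: true, cid: extractCid(decodeTopicMessage(await res.json())) };
}
```

Treating only 404 as "anchor missing" keeps transient mirror-node errors (rate limits, 5xx) from being misreported as Broken.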

License

Apache-2.0, matching the rest of the repo. Headers on every source file.

Out of scope (planned follow-ups)

  • Republisher tool — same package, two repair modes (re-pin and full republish). The provenance question (whose DID signs the new Hedera message when republishing community content) needs guidance first.
  • Multi-gateway fallback in worker-service/src/api/ipfs-client-class.ts — separate PR. The validator diagnoses the gateway-availability problem; the worker-service fix is the root-cause remedy, so that Guardian's getFile() no longer fails when a gateway degrades.
  • Mainnet validation — the tool is currently testnet-focused. Mainnet is durable (it is not periodically reset), so the Hedera-side failure mode is rarer there, but the same checks apply.
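The general first-success fallback technique that the worker-service follow-up describes can be sketched in plain JavaScript. This is not Guardian's getFile() or ipfs-client-class.ts; getFileWithFallback, its options, and the injected fetchImpl are hypothetical. The point is that the caller receives either bytes or an explicit error, never a null payload.

```javascript
// Hypothetical sketch: try gateways in order and return the first successful body.

/**
 * @param {string} cid
 * @param {string[]} gateways          ordered by preference, e.g. local Kubo first
 * @param {object} [opts]
 * @param {number} [opts.timeoutMs]
 * @param {Function} [opts.fetchImpl]  fetch-compatible function (defaults to Node 18+ global fetch)
 * @returns {Promise<Buffer>}
 */
async function getFileWithFallback(cid, gateways, { timeoutMs = 15000, fetchImpl = globalThis.fetch } = {}) {
  const errors = [];
  for (const gateway of gateways) {
    const url = `${gateway.replace(/\/+$/, '')}/ipfs/${cid}`;
    try {
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (!res.ok) {
        errors.push(new Error(`${url}: HTTP ${res.status}`));
        continue; // try the next gateway
      }
      return Buffer.from(await res.arrayBuffer());
    } catch (err) {
      errors.push(err); // network error or timeout; try the next gateway
    }
  }
  // Fail loudly with every per-gateway reason instead of returning null.
  throw new AggregateError(errors, `no gateway served ${cid}`);
}
```

Sequential tries keep load on gateways low; racing them in parallel would reduce latency at the cost of duplicate downloads.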

Test plan

  • npm test passes (15 tests)
  • CLI smoke test (--help, --skip-ipfs single-archive run)
  • Full library scan completes in ~5 min at concurrency 6
  • Web UI loads, summary card counts match CLI output
  • --fail-on broken exits non-zero on a synthetic stale fixture
  • --changed-only correctly scopes the scan to listed paths
  • Auto-clone path works (clones hashgraph/guardian to local cache on first run)

Tracked internally at https://github.com/Climission/Managed-Guardian-Service/issues/1018

Adds tools/guardian-publish-validator/ - a CLI and web UI that scans
the methodology library and reports per-entity health based on Hedera
consensus message + IPFS gateway reachability.

Built around two failure modes observed in production:

- Hedera testnet resets wipe consensus messages periodically, leaving
  dangling references in published policies and tools.
- The public IPFS gateway ecosystem is degrading (Cloudflare sunsetted
  their gateway in 2024, web3.storage/Storacha is winding down through
  2026). Content pinned only to one provider becomes unreachable while
  the Hedera anchor still resolves, producing silent null-reference
  errors at import time.

Each entity (policy, tool, module, schema) lands in one of four buckets:

  Healthy    Hedera anchor alive + a local Kubo serves the IPFS bytes
  Resilient  Hedera anchor alive + at least two independent public
             gateways serve it (ideal for upstream library content
             where there is no operator-controlled Kubo)
  Fragile    Reachable from exactly one gateway; single point of failure
  Broken     No gateway has the bytes, or the Hedera message is missing

Features:

- CLI with --fail-on broken|fragile for CI gating
- --changed-only / --changed-only-from for PR-scoped checks
  (pairs with `git diff --name-only` in GitHub Actions)
- Multi-gateway probing with repeatable --local-gateway flag for
  operators with their own Kubo node
- Auto-clones the upstream methodology library when no --path is given,
  caching in ~/.cache/guardian-publish-validator/
- Web UI for browsing reports (light/dark theme, sortable table,
  filter pills, per-entity detail panel with clickable GitHub links,
  CSV export, scan-on-demand)
- Example GitHub Actions workflow in examples/
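A sketch of declaring a subset of these flags with Node's built-in util.parseArgs (Node 18.3+), including the repeatable --local-gateway. The flag names come from this PR; parseCli, the option types, the returned field names, and the validation are assumptions, not bin/validator.js.

```javascript
// Hypothetical sketch of CLI parsing; covers a subset of the flags only.
const { parseArgs } = require('node:util');

function parseCli(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      path: { type: 'string' },
      'fail-on': { type: 'string' },
      'changed-only-from': { type: 'string' },
      'local-gateway': { type: 'string', multiple: true }, // repeatable: collected into an array
      'skip-ipfs': { type: 'boolean' },
      help: { type: 'boolean', short: 'h' },
    },
    strict: true, // unknown flags throw instead of being silently ignored
  });
  const failOn = values['fail-on'];
  if (failOn !== undefined && failOn !== 'broken' && failOn !== 'fragile') {
    throw new Error('--fail-on must be "broken" or "fragile"');
  }
  return {
    path: values.path, // undefined means "auto-clone into the cache"
    failOn,
    changedOnlyFrom: values['changed-only-from'],
    localGateways: values['local-gateway'] ?? [],
    skipIpfs: values['skip-ipfs'] === true,
    help: values.help === true,
  };
}
```

A call would look like parseCli(process.argv.slice(2)).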

Tests: node:test unit coverage for the README markdown parser, library
walker, and Hedera message decoder. Fixture-based, no external deps
required to run.

License: Apache-2.0, matching the rest of hashgraph/guardian.
danielnorkin requested review from a team as code owners on May 13, 2026 14:36
danielnorkin requested review from a team, EMerchant90 and rbarker-dev on May 13, 2026 14:36