
DEVOP-617: org-wide go.mod replace-directive audit #9

Open
srt0422 wants to merge 5 commits into main from scott/devop-617-gomod-replace-audit


@srt0422 srt0422 commented May 25, 2026

Summary

Adds a weekly + manual-dispatch audit that catches the Go-side
Shai-Hulud vector: a compromised go.mod replace directive
redirecting a legitimate import to an attacker fork. Complements
DEVOP-560 (PR #8): DEVOP-560 is the deep daily forensic clone sweep;
this is the lighter no-clone Contents-API pass that runs weekly across
every Go module in the org.

Linear: https://linear.app/alloralabs/issue/DEVOP-617

What this PR adds

  1. .github/workflows/gomod-replace-audit.yml — weekly Mon 05:17 UTC + workflow_dispatch.

    • Enumerates every org Go module via gh search code --owner allora-network 'filename:go.mod' --limit 200.
    • Fetches each go.mod via gh api repos/<r>/contents/<p> (no cloning).
    • Runs the canonical awk extractor from shai-hulud-defense/REFERENCE.md (handles single-line and replace (...) block form).
    • Classifies each RHS:
      • legitimate-allowlisted-host: RHS host is in the canonical trusted-host allowlist (github.com/(allora-network|cosmos|ethereum|fluxcd) | gopkg.in | google.golang.org | go.uber.org | go.opentelemetry.io | k8s.io | sigs.k8s.io).
      • legitimate-version-pin: LHS module path == RHS module path. Structurally cannot redirect to an attacker fork; without this class, the host filter alone would flag it.
      • legitimate-local-relative: RHS is a ./... or ../... workspace replace.
      • investigate-absolute: RHS is an absolute /... path (IOC-grade).
      • SUSPICIOUS: non-allowlisted host AND LHS != RHS (IOC-grade).
    • SUSPICIOUS or investigate-absolute → rolling GitHub Issue (label gomod-replace-audit, distinct from DEVOP-560's shai-hulud-sweep) + Slack page via SLACK_SECURITY_WEBHOOK.
    • Fetch failures → rolling-issue update only.
    • Permissions: contents: read, issues: write. All uses: SHA-pinned (matches DEVOP-560 pins).
    • actionlint clean (incl. shellcheck).
  2. docs/security/gomod-replace-audit-2026-05-25.md — initial point-in-time audit report (executed locally before opening this PR).

  3. docs/plans/2026-05-25-devop-617-gomod-replace-audit.md — short execution plan.
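
The classification rules listed under item 1 can be sketched as a small shell function. This is a minimal illustration, not the workflow's actual code: the function name `classify_replace` is hypothetical, the regex is reconstructed from the allowlist quoted above (the real `GO_TRUSTED_HOSTS_RE` may differ), and both arguments are assumed to be module paths with any version suffix already stripped. The check order follows the review bot's architecture diagram: relative, absolute, allowlisted host, same path, then SUSPICIOUS.

```shell
#!/usr/bin/env bash
# ASSUMPTION: regex rebuilt from the allowlist in the PR description.
GO_TRUSTED_HOSTS_RE='^(github\.com/(allora-network|cosmos|ethereum|fluxcd)|gopkg\.in|google\.golang\.org|go\.uber\.org|go\.opentelemetry\.io|k8s\.io|sigs\.k8s\.io)(/|$)'

# classify_replace LHS RHS  (hypothetical helper)
# Prints exactly one classification label for a replace directive.
classify_replace() {
  local lhs="$1" rhs="$2"
  case "$rhs" in
    ./*|../*) echo "legitimate-local-relative"; return ;;
    /*)       echo "investigate-absolute"; return ;;
  esac
  if printf '%s\n' "$rhs" | grep -qE "$GO_TRUSTED_HOSTS_RE"; then
    echo "legitimate-allowlisted-host"
  elif [ "$lhs" = "$rhs" ]; then
    # Same module path on both sides: only the version changes.
    echo "legitimate-version-pin"
  else
    echo "SUSPICIOUS"
  fi
}

classify_replace github.com/gin-gonic/gin github.com/gin-gonic/gin   # legitimate-version-pin
classify_replace github.com/gin-gonic/gin github.com/evil/gin        # SUSPICIOUS
```

Checking the path-shaped cases first keeps a local or absolute RHS from ever reaching the host regex, so the two IOC-grade classes stay mutually exclusive.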

Top-line audit results (from the report)

| Metric | Count |
| --- | --- |
| Go modules scanned | 12 (across 11 repos) |
| Modules with zero replace directives | 9 |
| Total replace directives found | 6 |
| Allowlisted-host RHS | 2 |
| Same-path version-pin RHS (non-allowlisted host but LHS == RHS) | 4 |
| Local relative / absolute / SUSPICIOUS | 0 / 0 / 0 |
| Escalated to incident response | 0 |

No SUSPICIOUS findings. No escalation. Every current replace directive in the org is either:

  • on an allowlisted host (cosmos), OR
  • a same-path version pin (LHS module path == RHS module path — structurally cannot redirect).

Specifically the four non-allowlisted entries are all same-path version pins:

  • allora-chain/go.mod: gin-gonic/gin v1.9.1 and syndtr/goleveldb v1.0.1-... (canonical Cosmos SDK simapp pins; adjacent comments reference cosmos/cosmos-sdk#10409).
  • allora-sdk-go/go.mod + forge-v2/backend/go.mod: cometbft/cometbft v0.38.17 (canonical Cosmos BFT consensus engine).

See the full audit report for the row-by-row classification + recommendations (adding github.com/cometbft to the canonical allowlist is the obvious follow-up).

Coordination with PR #8 (DEVOP-560)

  • Different branch (scott/devop-617-gomod-replace-audit), different files (workflow + plan + report all new).
  • Different rolling-issue label (gomod-replace-audit vs shai-hulud-sweep) so the two pipelines don't collide.
  • Cron offset (Mon 05:17 UTC vs daily 04:07 UTC) so they don't compete for the same scheduler slot.
  • This workflow does NOT touch scripts/shai-hulud-ioc-sweep.sh or any of PR #8's files (DEVOP-560: add org-wide daily Shai-Hulud IOC sweep workflow).

Test plan

  • actionlint .github/workflows/gomod-replace-audit.yml (incl. shellcheck) — clean.
  • Local end-to-end audit produced 6 findings, 0 SUSPICIOUS (matches the report).
  • Run workflow_dispatch after merge to validate that the workflow run produces an artifact and (with current org state) leaves the rolling issue untouched.
  • Inject a synthetic suspicious replace in a sandbox repo and re-run to confirm the rolling-issue + Slack path fires.

Made with Cursor


Summary by cubic

Adds a weekly and manual org audit that checks every go.mod replace for attacker redirects and alerts on anything outside the trusted-host allowlist. Meets DEVOP-617 acceptance criteria; improves parsing, scope checks (including partial private-coverage detection), and alerting to avoid false-clean runs and noisy pages.

  • New Features

    • Adds .github/workflows/gomod-replace-audit.yml (Mon 05:17 UTC + workflow_dispatch).
    • Scans via gh search code + gh api (no cloning); classifies replaces; SUSPICIOUS or absolute → rolling issue + Slack; fetch failures → issue only.
    • Uploads audit.tsv and summary.md; SHA-pinned actions; minimal perms (contents: read, issues: write).
    • Token-scope probe: fail loudly if the token can’t see private repos; require secrets.GH_ORG_READ_TOKEN or ack public-only runs via vars.ACCEPT_PUBLIC_ONLY_AUDIT=true.
    • Docs added (plan + initial report). Initial audit: 12 modules, 6 replaces, 0 suspicious.
  • Bug Fixes

    • AWK extractor: skip full-line // comments and strip trailing // ... on real directives to avoid false SUSPICIOUS.
    • Contents API: normalize JSON null to empty; accept spec-valid go.mod with a module directive anywhere (leading comments allowed); otherwise treat as fetch failure.
    • Slack paging: gate on success() and non-empty outputs; final run summary reflects audit failure vs clean.
    • Warn when gh search code hits the 200-result cap.
    • Corrected report: zero-replace modules is 9; added “modules with at least one replace” row.
    • Scope probe: paginate private-repo listing to detect partial coverage (selected-repo tokens); fail unless acknowledged via vars.ACCEPT_PUBLIC_ONLY_AUDIT=true.
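
The 200-result cap warning mentioned in the bug-fix list above could look roughly like the following. It is a sketch under assumptions: `cap_warning` and `SEARCH_LIMIT` are hypothetical names, and emitting a `::warning::` workflow command is one plausible way to surface the message in a GitHub Actions run, not necessarily what the workflow does.

```shell
#!/usr/bin/env bash
# ASSUMPTION: mirrors the --limit 200 passed to `gh search code`.
SEARCH_LIMIT=200

# cap_warning COUNT  (hypothetical helper)
# Prints a GitHub Actions warning annotation when the number of search
# results reaches the limit, which means further go.mod files may exist
# beyond what was returned.
cap_warning() {
  local count="$1"
  if [ "$count" -ge "$SEARCH_LIMIT" ]; then
    echo "::warning::gh search code returned $count results (limit $SEARCH_LIMIT); some go.mod files may be unscanned"
  fi
}

cap_warning 12    # prints nothing
cap_warning 200   # prints the warning annotation
```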

Written for commit be4718d. Summary will update on new commits.

Adds a weekly + manual-dispatch audit that enumerates every Go module
across the allora-network org, fetches each `go.mod` via the Contents API,
extracts every `replace` directive (single-line + `replace (...)` block
form) with the canonical awk extractor from shai-hulud-defense
REFERENCE.md, and classifies the RHS against the same trusted-host
allowlist that scripts/shai-hulud-ioc-sweep.sh uses.

Findings:
- SUSPICIOUS (RHS non-allowlisted host AND LHS module path != RHS) →
  rolling GitHub Issue (label `gomod-replace-audit`) + Slack page via
  SLACK_SECURITY_WEBHOOK.
- legitimate-version-pin (LHS == RHS module path, structurally cannot
  redirect) → no-op.
- Fetch failures → rolling-issue update only (operational, not IOC).

Distinct rolling-issue label from DEVOP-560's `shai-hulud-sweep` so the
two pipelines don't collide. SHA-pinned `uses:`. Permissions are
`contents: read` + `issues: write` only.

Initial point-in-time audit committed at
docs/security/gomod-replace-audit-2026-05-25.md: 12 modules scanned,
6 replace directives, 0 SUSPICIOUS findings. All current replaces are
same-path version pins from the Cosmos SDK simapp pattern
(gin-gonic/gin, syndtr/goleveldb, cosmos/cosmos-sdk, cometbft/cometbft).

Refs: https://linear.app/alloralabs/issue/DEVOP-617
Co-authored-by: Cursor <cursoragent@cursor.com>
srt0422 added the shai-hulud (Shai-Hulud supply-chain defense work) and needs-human-review labels on May 25, 2026

@cubic-dev-ai cubic-dev-ai Bot left a comment


cubic analysis

2 issues found across 3 files

Linked issue analysis

Linked issue: DEVOP-617: Audit all org go.mod files for suspicious replace directives

| Status | Acceptance criteria | Notes |
| --- | --- | --- |
| | Enumerate every org Go repo via gh search (filename:go.mod --limit 200). | The workflow's 'Enumerate org Go modules' step runs gh search code --owner "$ORG" 'filename:go.mod' --limit 200 and writes the repo+path tuples to paths.tsv. |
| | For each repo, fetch go.mod and run the replace extractor (awk) to list replace directives (single-line and replace(...) block form). | The 'Fetch and audit each go.mod' step uses gh api repos/<repo>/contents/<path> to fetch and base64-decode go.mod and runs an awk extractor that handles single-line and block-form replace directives. |
| | Cross-check each replace RHS against the canonical trusted-host allowlist. | The workflow sets GO_TRUSTED_HOSTS_RE to the canonical allowlist regex and classifies RHS using grep -qE against that variable. |
| ⚠️ | For every non-allowlisted match, document repo + go.mod line, target path, reason for the replace (commit history), and disposition (legitimate / remove / investigate). | The workflow records repo, path, line number, LHS/RHS, classification and the original line (audit.tsv and summary.md) and the docs include row-by-row classifications and dispositions, but it does not capture commit history or provenance (reason for the replace) from git metadata. |
| | Produce a final report on the ticket; escalate SUSPICIOUS findings to incident response. | The PR includes an initial point-in-time report file and the workflow uploads summary/artifacts; the workflow appends/creates a rolling issue for non-clean runs and pages Slack for SUSPICIOUS findings (intended escalation). |
Architecture diagram
sequenceDiagram
    participant GC as GitHub Cron (Mon 05:17 UTC)
    participant WFA as Workflow: gomod-replace-audit
    participant GHAPI as gh CLI (GitHub API)
    participant GS as GitHub Search (code)
    participant GCONT as GitHub Contents API
    participant AWR as awk Extractor
    participant CLS as Classifier (allowlist)
    participant ISSUE as Rolling GitHub Issue (gomod-replace-audit)
    participant SLACK as Slack Security Webhook
    participant ART as Workflow Artifacts

    Note over WFA,ART: Weekly org-wide no-clone go.mod replace audit

    alt Scheduled trigger
        GC->>WFA: cron: '17 5 * * 1'
    else Manual trigger
        WFA->>WFA: workflow_dispatch
    end

    WFA->>GHAPI: Checkout .github repo (SHA-pinned actions/checkout)
    WFA->>GHAPI: Verify gh, jq, awk, base64

    Note over WFA,GS: Step: Enumerate org Go modules

    WFA->>GS: gh search code --owner allora-network 'filename:go.mod' --limit 200
    GS-->>WFA: JSON list of {repository.nameWithOwner, path}
    WFA->>WFA: Sort unique <repo,path> tuples → paths.tsv
    WFA->>WFA: Count discovered modules

    loop For each <repo,path> in paths.tsv
        Note over WFA,GCONT: Step: Fetch and audit each go.mod

        WFA->>GCONT: gh api repos/<repo>/contents/<path> --jq '.content'
        GCONT-->>WFA: base64-encoded go.mod content
        WFA->>WFA: base64 -d → gomod file

        alt Fetch failure (missing repo, moved default branch)
            WFA->>WFA: Record in fetch-failures.tsv
            WFA->>WFA: Skip to next module
        else Success
            WFA->>AWR: awk extract replace directives (single-line + block form)
            AWR-->>WFA: Parsed {repo, path, line_no, lhs, rhs, original_line}

            loop For each extracted replace directive
                WFA->>CLS: Classify RHS module path

                alt RHS starts with ./ or ../
                    WFA->>WFA: Class = legitimate-local-relative
                else RHS starts with /
                    WFA->>WFA: Class = investigate-absolute (IOC-grade)
                else RHS matches allowlist regex
                    WFA->>WFA: Class = legitimate-allowlisted-host
                else LHS module path == RHS module path
                    WFA->>WFA: Class = legitimate-version-pin
                else RHS not allowlisted AND LHS != RHS
                    WFA->>WFA: Class = SUSPICIOUS (IOC-grade)
                end

                WFA->>WFA: Append to audit.tsv with classification
            end
        end
    end

    Note over WFA,SLACK: Step: Generate summary & escalate if needed

    WFA->>WFA: Count total directives, SUSPICIOUS, fetch failures
    WFA->>WFA: Generate summary.md (markdown report)

    alt SUSPICIOUS or investigate-absolute count > 0
        WFA->>ISSUE: Update/create rolling issue (label: gomod-replace-audit)
        Note over WFA,SLACK: Append SUSPICIOUS findings to issue body
        WFA->>SLACK: Page Slack via SLACK_SECURITY_WEBHOOK
    else Fetch failures > 0 only
        WFA->>ISSUE: Update rolling issue with fetch failures only
        Note over WFA,SLACK: No Slack page - only issue update
    else Clean run (no findings)
        WFA->>WFA: No issue update, no Slack page
    end

    WFA->>ART: Upload audit.tsv and summary.md as workflow artifacts
    ART-->>WFA: Artifacts stored (downloadable from workflow run)

Comment thread .github/workflows/gomod-replace-audit.yml
Comment thread docs/security/gomod-replace-audit-2026-05-25.md Outdated
Addresses cubic-dev-ai review on PR #9:

- P1 (.github/workflows/gomod-replace-audit.yml:60): Add a 'Probe token
  scope' step that fails loudly when the workflow runs with only the
  default GITHUB_TOKEN against an org that has private repos. Without
  this, the audit could silently return a false-clean result because
  'gh search code' and 'gh api orgs/<org>/repos?type=private' both
  omit private repos under that token. Operators who consciously
  accept a public-only audit can ack via the ACCEPT_PUBLIC_ONLY_AUDIT
  org variable.

- P2 (docs/security/gomod-replace-audit-2026-05-25.md:30): Correct the
  zero-replace count from 8 to 9 to match the list of nine modules in
  the section below (12 scanned − 3 with replace directives = 9).
  Add a companion 'Modules with at least one replace directive' row
  for cross-checking.

Refs: https://linear.app/alloralabs/issue/DEVOP-617
Co-authored-by: Cursor <cursoragent@cursor.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".github/workflows/gomod-replace-audit.yml">

<violation number="1" location=".github/workflows/gomod-replace-audit.yml:123">
P1: This probe treats visibility of one private repo as full private coverage, so selected-repository PAT/App tokens can still leave some private repos unaudited while the workflow reports complete coverage.</violation>
</file>


Comment thread .github/workflows/gomod-replace-audit.yml
ce-correctness-reviewer + cubic-dev-ai found issues that would let the
audit silently false-clean or page Slack with malformed alerts. All
addressed:

- AWK extractor: skip full-line `//` comments and strip trailing
  `// ...` from real directives. Without this, a commented historical
  replace inside a `replace (...)` block (a common dependency-migration
  pattern, e.g. Cosmos SDK simapp) parsed as `lhs="// ...", rhs="..."`
  → lhs != rhs → SUSPICIOUS → false-positive Slack page on every weekly
  run. ce-correctness-reviewer P1 conf 85.
- Contents API: `--jq '.content // ""'` so JSON null (directory,
  submodule, oversized symlink) normalizes to empty string and is
  caught by the existing `[ ! -s ]` fetch-failure guard. Previously the
  literal "null" base64-decoded to 3 garbage bytes and silently
  produced an empty audit row. Plus a sanity check that decoded files
  start with `module ` before classifying. ce-correctness-reviewer P2.
- Slack-page step: gated on `success()` (not `always()`) and on the
  audit step producing non-empty outputs. Prevents a malformed
  incident-grade Slack page with no count and "summary unavailable"
  body when the audit step fails before emitting outputs.
  ce-correctness-reviewer P2.
- Final-summary step: branch on `steps.audit.outcome` so a failed
  audit doesn't render as "Audit clean — 0 replace directives".
- gh search code limit: warn loudly when results hit the 200 cap so a
  future Shai-Hulud-vector go.mod at position 201+ doesn't go silently
  unscanned. ce-correctness-reviewer residual.
- Token scope: pre-flight probe step that detects when the workflow
  has only `GITHUB_TOKEN` and the org has private repos, fails the
  audit loudly (or accepts a documented `ACCEPT_PUBLIC_ONLY_AUDIT`
  override) instead of silently false-cleaning every private Go
  module. cubic-dev-ai P1 conf 9/10.
- Audit report counts: fix off-by-one (8 zero-replace modules → 9; the
  bullet list always had 9 entries). cubic-dev-ai P2 conf 10/10.
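
The comment handling described in the AWK extractor item above can be illustrated with a small self-contained extractor. This is a sketch, not the canonical extractor from shai-hulud-defense/REFERENCE.md: the function name `extract_replaces` and the output layout (line number, LHS path, RHS path, tab-separated) are assumptions made for the example.

```shell
#!/usr/bin/env bash
# extract_replaces [FILE...]  (hypothetical helper; reads stdin if no file)
# Prints "line_no<TAB>lhs<TAB>rhs" for every replace directive, in both
# single-line and replace ( ... ) block form. Versions are dropped.
extract_replaces() {
  awk '
    # Remove a trailing "// ..." comment, then trim whitespace.
    function clean(s) {
      sub(/[ \t]*\/\/.*$/, "", s)
      sub(/^[ \t]+/, "", s)
      sub(/[ \t]+$/, "", s)
      return s
    }
    # Emit one "old [version] => new [version]" entry.
    function emit(entry,   parts, l, r) {
      if (split(entry, parts, /[ \t]*=>[ \t]*/) != 2) return
      split(parts[1], l, /[ \t]+/)
      split(parts[2], r, /[ \t]+/)
      printf "%d\t%s\t%s\n", NR, l[1], r[1]
    }
    { line = clean($0) }
    line == ""                 { next }   # blank or comment-only line
    in_block && line == ")"    { in_block = 0; next }
    in_block                   { emit(line); next }
    line ~ /^replace[ \t]*\($/ { in_block = 1; next }
    line ~ /^replace[ \t]+/    { sub(/^replace[ \t]+/, "", line); emit(line) }
  ' "$@"
}

# A commented-out historical replace inside the block is ignored.
printf 'module example.com/app\n\nreplace (\n\t// a.com/old => b.com/fork v0.1.0\n\tc.com/z v1 => c.com/z v2 // pin\n)\n' | extract_replaces
```

With the sample input, only the directive on line 5 is reported; the commented entry on line 4 never reaches classification, which is the false-positive path the fix closes.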

Cross-pipeline follow-ups (shared regex file, shared awk extractor,
fixture-based parser tests, lookalike-bypass regex corpus) are tracked
in docs/security/gomod-replace-audit-2026-05-25.md under
"Cross-pipeline follow-ups" — they require touching the sibling
DEVOP-560 PR's files and stay out of scope here.

Refs: https://linear.app/alloralabs/issue/DEVOP-617
Co-authored-by: Cursor <cursoragent@cursor.com>

@cubic-dev-ai cubic-dev-ai Bot left a comment


2 issues found across 2 files (changes from recent commits).


Comment thread .github/workflows/gomod-replace-audit.yml Outdated
Comment thread docs/security/gomod-replace-audit-2026-05-25.md
srt0422 and others added 2 commits May 26, 2026 10:15
Cubic P1 (PRRT_kwDOLZ5Xss6EqQn3): the previous probe checked only the
first page of org private repos with per_page=1, so a selected-repository
PAT or GitHub App token granting access to ONE private repo would falsely
report 'full coverage' while leaving every other private repo unaudited.

Paginate the full private-repo list, compare visible vs total_private_repos,
and fail (or warn under ACCEPT_PUBLIC_ONLY_AUDIT) when the counts don't
match. Preserves the existing public-only / org-object-permission-denied
escape hatches.
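
The visible-versus-total comparison described above reduces to a small decision function. The sketch below is illustrative only: `coverage_status` is a hypothetical name, the `gh api` invocations in the comments are assumptions about how the two counts might be gathered, and the public-only and permission-denied escape hatches are not modelled.

```shell
#!/usr/bin/env bash
# ASSUMPTION: the two inputs could be gathered along these lines:
#   visible=$(gh api --paginate "orgs/$ORG/repos?type=private&per_page=100" \
#               --jq '.[].full_name' | wc -l)
#   total=$(gh api "orgs/$ORG" --jq '.total_private_repos')

# coverage_status VISIBLE TOTAL [ACCEPT]  (hypothetical helper)
#   VISIBLE  private repos the token can list across all pages
#   TOTAL    the org's total_private_repos count
#   ACCEPT   value of ACCEPT_PUBLIC_ONLY_AUDIT ("true" acknowledges gaps)
coverage_status() {
  local visible="$1" total="$2" accept="${3:-false}"
  if [ "$visible" -eq "$total" ]; then
    echo "full"
  elif [ "$accept" = "true" ]; then
    echo "partial-acknowledged"   # warn, but let the audit continue
  else
    echo "partial"                # caller should fail the run
    return 1
  fi
}

coverage_status 7 7          # full
coverage_status 1 7 true     # partial-acknowledged
```

A selected-repository token that sees one of seven private repos now yields `partial` and a non-zero status instead of passing the probe.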

Co-authored-by: Cursor <cursoragent@cursor.com>
…ty check

cubic-dev-ai re-review of ba9ff05 (P2 conf 9): the previous
`head -1 "$gomod" | grep -q '^module '` rejected any go.mod whose
first line is a comment or blank line, even though both are spec-valid.
The check would silently mark those files as fetch failures and skip
the audit entirely — the same false-clean failure mode the sanity
check was added to prevent.

Switch to `grep -qm1 '^[[:space:]]*module '` so the check accepts any
spec-valid go.mod and only flags responses that contain no `module`
directive anywhere (the actual binary/garbage case we want to catch).
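
The difference between the two checks is easy to reproduce on a go.mod that starts with a comment. The wrapper function names below are hypothetical; the two grep expressions are the ones quoted in this commit message (with `head -n 1` standing in for `head -1`).

```shell
#!/usr/bin/env bash
# Old check: the module directive must be on the very first line.
has_module_first_line() { head -n 1 "$1" | grep -q '^module '; }

# New check: a module directive anywhere, leading whitespace allowed.
has_module_anywhere() { grep -qm1 '^[[:space:]]*module ' "$1"; }

gomod="$(mktemp)"
printf '// Copyright header\n\nmodule example.com/app\n\ngo 1.22\n' > "$gomod"

if has_module_first_line "$gomod"; then echo "old: accepted"; else echo "old: rejected"; fi
if has_module_anywhere "$gomod"; then echo "new: accepted"; else echo "new: rejected"; fi
rm -f "$gomod"
# prints:
#   old: rejected
#   new: accepted
```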

Refs: https://linear.app/alloralabs/issue/DEVOP-617
Co-authored-by: Cursor <cursoragent@cursor.com>

srt0422 commented May 28, 2026

Needs-human follow-up #3 verification done.

Re-checked the close-out concern that `aeb5cc0` swept up the 'Cross-pipeline follow-ups (deferred)' section in `docs/security/gomod-replace-audit-2026-05-25.md`:

State now: the deletion was REVERSED by `be4718d` (2026-05-26 13:09 PT, after the close-out at 10:26 PT). The deferred-follow-ups section is back in the doc (lines ~102–129 on the branch HEAD). No action needed in the PR itself.

Linear tracking: the three items in the deferred section had no dedicated Linear coverage prior to this run; they were tracked only in the at-risk doc. Filed three Low-priority follow-up tickets so the tracking is durable beyond the doc:

| Item | Ticket |
| --- | --- |
| Extract `GO_TRUSTED_HOSTS_RE` to a single committed source (so workflow + sweep + REFERENCE.md don't drift) | DEVOP-632 |
| Fixture-based parser tests for go.mod `replace` extraction (8+ edge cases + classification matrix) | DEVOP-633 |
| Regex lookalike-bypass corpus test for the `(/\|$)` boundary anchor in `GO_TRUSTED_HOSTS_RE` | DEVOP-634 |

(Note: the deferred section also lists a fourth item, 'extract the awk replace-directive extractor to scripts/extract-go-replace.awk', which I folded into DEVOP-632's acceptance criteria since extracting the shared regex and extracting the shared extractor are the same atomic 'consolidate the duplicated parsing pipeline' chunk of work. Happy to split it out if anyone disagrees.)

All three tickets link back to this PR and to the specific anchor line in the deferred section.
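
For context on the DEVOP-634 item, a lookalike-bypass corpus test for the boundary anchor might look something like this. It is a hypothetical sketch: the host list is reconstructed from the allowlist in the PR description, the corpus entries are invented examples, and `lookalike_hits` is not a name from the repository.

```shell
#!/usr/bin/env bash
# ASSUMPTION: host alternation rebuilt from the PR description's allowlist.
HOSTS='github\.com/(allora-network|cosmos|ethereum|fluxcd)|gopkg\.in|google\.golang\.org|go\.uber\.org|go\.opentelemetry\.io|k8s\.io|sigs\.k8s\.io'
ANCHORED_RE="^(${HOSTS})(/|\$)"   # trusted prefix must end at "/" or end of string
UNANCHORED_RE="^(${HOSTS})"       # same prefix without the boundary anchor

# lookalike_hits REGEX  (hypothetical helper)
# Counts how many invented lookalike paths the regex wrongly accepts.
lookalike_hits() {
  local re="$1" hits=0 rhs
  for rhs in \
    github.com/cosmos-evil/cosmos-sdk \
    github.com/ethereum2/go-ethereum \
    k8s.io.attacker.example/api \
    gopkg.inx/yaml.v3
  do
    if printf '%s\n' "$rhs" | grep -qE "$re"; then
      hits=$((hits + 1))
    fi
  done
  echo "$hits"
}

echo "anchored:   $(lookalike_hits "$ANCHORED_RE") lookalikes accepted"    # 0
echo "unanchored: $(lookalike_hits "$UNANCHORED_RE") lookalikes accepted"  # 4
```

Without the `(/|$)` suffix, every corpus entry passes as allowlisted because it merely starts with a trusted prefix; with it, the character after the trusted host or org must be a path separator or the end of the string.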
