
Optimize CI for wolfProvider#400

Open
aidangarske wants to merge 37 commits into wolfSSL:master from aidangarske:ci-draft-pause


@aidangarske
Member

@aidangarske aidangarske commented May 23, 2026

Description

  • Trigger OSP projects to run nightly and send a Slack message on failure
  • Dynamically discover the latest wolfSSL and OpenSSL versions
  • (All OSP projects were being tested against OpenSSL 3.0.20 from debian:bookworm, not 3.5.4)
  • Add UBSan and ASan runs for WP specifically
  • Add smoke tests for draft PRs
  • Run the full test suite only on PRs with status "open"; run only smoke tests on drafts
  • No apt-get at job time; use the ghcr container instead
  • Backward compat for 5.8.4
  • 5.9.1 support
  • Nightly runs all tested and passing with the new patches
  • Auto-retry system for flaky nightlies in place and tested

Related PRs need to go in first, in this order, followed by this one:

  1. wolfProvider: 5.9.1 FIPS patches (krb5, hostap, stunnel, libssh2, curl) osp#340
  2. https://github.com/wolfSSL/testing/pull/962
  3. https://github.com/wolfSSL/testing/pull/958

Slack notification system using Claude and a simple expected-regex match to retry flaky jobs

Copilot AI review requested due to automatic review settings May 23, 2026 06:27
@aidangarske aidangarske marked this pull request as draft May 23, 2026 06:30
@aidangarske aidangarske changed the title ci: pause non-smoke workflows on draft PRs, add smoke preflight Optimize CI for wolfProvider May 23, 2026
@aidangarske aidangarske reopened this May 23, 2026
@aidangarske aidangarske self-assigned this May 23, 2026
@aidangarske aidangarske requested review from Copilot and dgarske and removed request for Copilot May 23, 2026 06:43
@aidangarske aidangarske marked this pull request as ready for review May 25, 2026 19:25

This comment was marked as resolved.

aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
…ew fix)

Was: every workflow pulled ghcr.io/wolfssl/wolfprovider-test-deps:bookworm,
which doesn't exist until upstream master runs the publish workflow.
Bootstrap chicken-and-egg.

Now: publish-test-deps-image.yml fires on any branch push (and PRs)
and pushes to ghcr.io/<repo-owner>/wolfprovider-test-deps:bookworm.
Consumer workflows read from the PR head's owner when on a PR, else
the running repo's owner. Result: a fork PR publishes to the fork's
ghcr namespace and pulls from it; master pushes publish to the org's
ghcr namespace and pulls from it.

Also fixes copilot review feedback from
wolfSSL#400 (review)

- Phase B log filename renames broke check-workflow-result.sh's
  hardcoded log paths (curl-test.log, openvpn-test.log, sssd-test.log,
  net-snmp-test.log, nginx-test.log, openssh-test.log, tcpdump-test.log,
  liboauth2-test.log, stunnel-test.log) plus in-step greps in cjose,
  libcryptsetup, libfido2, libhashkit2, libtss2, opensc, python3-ntp,
  qt5network5, tnftp, tpm2-tools. Reverted log names back to
  <app>-test.log; second mode overwrites first.
- libtss2.yml: fix `if $(grep -q ...)` (invalid shell -- command
  substitution of grep used as the if condition expanded to an empty
  command). Use `if grep -q ...; then`.
- opensc.yml: fix `TEST_RESULT=$(((grep ...) && echo 0 || echo 1))`
  (arithmetic expansion `(( ))` can't contain shell commands). Hoist
  to a check_opensc_log() function called from both modes.
- stunnel.yml: `grep -c "failed: 0"` prints a match count of 1 on success,
  but check-workflow-result.sh expects TEST_RESULT==0 for pass.
  Use `if grep -q ...; then TEST_RESULT=0; else TEST_RESULT=1; fi`.
  Also mirror tests/logs/results.log to stunnel-test.log so the
  force-fail check finds the expected file.
- hostap.yml: drop continue-on-error from the normal-mode test step.
  Without it the step's exit code was swallowed and normal-mode test
  failures didn't fail the job.
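
The shell fixes in the list above share one pattern: use grep's exit status directly as the `if` condition and map it to a 0/1 result. The following is a minimal sketch of that pattern, not the code from the workflows. The helper name `check_log` and the marker string are hypothetical.

```shell
# Hypothetical helper (not taken from the PR): sets TEST_RESULT to 0 when
# the log contains the success marker and to 1 otherwise.
check_log() {
    log_file=$1
    marker=$2
    # grep -q prints nothing and reports a match only through its exit
    # status, so it can be used directly as the `if` condition. -F treats
    # the marker as a fixed string rather than a regular expression.
    if grep -qF -- "$marker" "$log_file"; then
        TEST_RESULT=0
    else
        TEST_RESULT=1
    fi
}
```

A missing log file also ends up as `TEST_RESULT=1`, because grep exits non-zero when it cannot read its input.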

One-time setup: after this lands, the owner of each fork that opens a
PR has to make their ghcr.io/<owner>/wolfprovider-test-deps package
public (GitHub UI: Packages -> Package settings -> Change visibility).
GitHub's Actions runners can only pull public packages from another
namespace.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
…vate)

Earlier commits tried to make fork CI work by:
  - having publish-test-deps-image.yml push to a per-owner ghcr namespace
    (ghcr.io/<owner>/wolfprovider-test-deps)
  - having consumer workflows pull from the PR head's owner
  - auto-PATCHing the test-deps package to visibility=public
  - dropping the `github.repository == 'wolfSSL/wolfProvider'` guard on
    the wolfprov-debs ORAS pull in build-wolfprovider.yml

That path only works if the packages can be public, which they can't
(some of the .debs contain commercially-licensed bits). Revert to the
canonical-only behavior:

publish-test-deps-image.yml
  - fires only on push to master/main (was '**')
  - guards the publish on github.repository == 'wolfSSL/wolfProvider'
  - drops the per-owner namespace; always pushes to
    ghcr.io/wolfssl/wolfprovider-test-deps
  - removes the Mark-package-public step

build-wolfprovider.yml
  - restores the github.repository == 'wolfSSL/wolfProvider' guard on
    the Login, Download .debs, and Download WIC steps

39 consumer workflows
  - container.image reverted from the per-owner expression back to the
    literal ghcr.io/wolfssl/wolfprovider-test-deps:bookworm

Practical effect: PR CI and nightly only run on the canonical repo
(or once PR wolfSSL#400 merges, on wolfSSL/wolfProvider's runners). Fork
pushes will skip the wolfprov-deb pull and any container-using job
will fail loud at the image pull -- which is the right signal: those
runs need to happen on the canonical repo.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
…idation)

Add pull_request trigger to nightly-osp.yml so PR wolfSSL#400's reviewers
can see the dispatcher actually fan all 41 reusable workflows out
and the notify job hit Slack.

Marked temporary in the file header -- revert this trigger before
merging if you don't want the full nightly job set firing on every
PR. (For everyday CI, scheduled + workflow_dispatch is the intended
shape.)

Note: PR runs from forks will still hit the private-package issue
for the wolfprov-debs pull (the wolfSSL/wolfProvider repo guard
short-circuits the ORAS step on non-canonical repos). The plumbing
itself -- dispatch, discover-versions, notify, Slack -- runs
regardless and is what this PR-trigger lets you verify end-to-end.
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 25, 2026
Adds aidangarske/wolfProvider to the publish workflow's repository
allowlist so PR wolfSSL#400's working branch can bootstrap a test-deps
image on the fork's ghcr namespace. Pushed image lands at
ghcr.io/aidangarske/wolfprovider-test-deps:bookworm.

Also adds 'ci-draft-pause' to the branches list (alongside master/
main) so a push to that branch triggers the workflow without needing
a separate workflow_dispatch.

Consumer workflows continue to pull from ghcr.io/wolfssl/... so this
fork-side push is purely for the fork owner to verify the
build/push pipeline works end to end before PR merges. After merge,
the canonical wolfSSL/wolfProvider master push will publish the
authoritative image and consumers will find it.

Note: the 'ci-draft-pause' branch entry is TEMPORARY for PR wolfSSL#400.
Drop it (and remove aidangarske from the allowlist if desired)
once the PR merges.
dgarske pushed a commit that referenced this pull request May 26, 2026
)

Bootstrap PR: introduces the test-deps container image that PR #400's
nightly OSP workflows consume. This is a minimal subset of PR #400
intended to merge first, so the publish workflow fires once on master
and the test-deps image lands at
ghcr.io/wolfssl/wolfprovider-test-deps:bookworm before the rest of PR #400
merges. Without this, PR #400's
OSP container jobs all fail with "manifest unknown" because the image
they pull doesn't exist anywhere yet.

Two files only:
  docker/wolfprovider-test-deps/Dockerfile
    Single Debian-bookworm image with every apt dep that the OSP
    integration tests used to install at job time. One apt-get update
    at build time, zero at job time -- eliminates Debian mirror flake.

  .github/workflows/publish-test-deps-image.yml
    Builds the Dockerfile and pushes to
    ghcr.io/wolfssl/wolfprovider-test-deps:bookworm on push to
    master/main (path-filtered to docker/wolfprovider-test-deps/**)
    or workflow_dispatch. Guarded with
    github.repository == 'wolfSSL/wolfProvider' so forks don't try
    to push to wolfSSL's namespace.

The OSP workflows themselves, the discover-versions resolver, the
ASan/UBSan workflow, and all the matrix/force-fail consolidation
land via PR #400 once this is in place.
dgarske added a commit that referenced this pull request May 26, 2026
ci: bootstrap test-deps Docker image (prep for PR #400)
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 26, 2026
PR wolfSSL#402 published ghcr.io/wolfssl/wolfprovider-test-deps:bookworm.
This empty commit bumps the head SHA so PR wolfSSL#400's checks rerun
against the now-existing image.
@aidangarske aidangarske force-pushed the ci-draft-pause branch 3 times, most recently from 5ce6df6 to 91f2549 Compare May 27, 2026 04:50
@aidangarske aidangarske requested review from ColtonWilley and padelsbach and removed request for dgarske May 27, 2026 04:54
@aidangarske aidangarske force-pushed the ci-draft-pause branch 2 times, most recently from 82d537b to e5226fb Compare May 27, 2026 05:21
Update resolve-osp-patch.sh to the final naming convention:
  - universal name (no -wolfssl-X.Y.Z- infix) = LATEST content (current
    wolfSSL master / current latest stable)
  - -wolfssl-X.Y.Z- infix = pinned snapshot for an older wolfSSL line
Resolution: v5.8.X tries -wolfssl-5.8.4- then universal; v5.9.X tries
-wolfssl-5.9.1- then universal; master uses universal directly.

Convert every PR-time and nightly OSP workflow that previously did
'patch -p1 < .../osp/wolfProvider/<proj>/<proj>-<ref>-wolfprov[-fips].patch'
to call the helper instead:

  PATCH=$($GITHUB_WORKSPACE/scripts/resolve-osp-patch.sh \
            $GITHUB_WORKSPACE/osp <proj> <projver> ${{ matrix.wolfssl_ref }} \
            [${{ matrix.fips_ref == 'FIPS' && '--fips' || '' }}])
  patch -p1 < "$PATCH"

26 workflows updated. krb5, socat, curl strip the leading <project>-
from their matrix ref so the helper builds the right stem. libssh2 and
openldap append -debian to the projver because their patches use that
variant suffix in the OSP repo.
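
The version-aware lookup described above (pinned snapshot first, then the universal name) can be sketched as follows. This is an illustration under stated assumptions, not the contents of scripts/resolve-osp-patch.sh. The function name `resolve_patch` is hypothetical, FIPS handling is left out, and the position of the `-wolfssl-X.Y.Z-` infix in the file name is a guess.

```shell
# Sketch only; the real scripts/resolve-osp-patch.sh may differ.
# Assumed layout (hypothetical):
#   <osp>/wolfProvider/<proj>/<proj>-<projver>[-wolfssl-X.Y.Z]-wolfprov.patch
resolve_patch() {
    osp_dir=$1 proj=$2 projver=$3 wolfssl_ref=$4
    dir="$osp_dir/wolfProvider/$proj"
    stem="$proj-$projver"

    # Map the wolfSSL ref to its pinned snapshot line, if it has one.
    case "$wolfssl_ref" in
        v5.8.*) pin=5.8.4 ;;
        v5.9.*) pin=5.9.1 ;;
        *)      pin= ;;   # master and anything else: universal name only
    esac

    # Prefer the pinned snapshot, then fall back to the universal patch.
    if [ -n "$pin" ] && [ -f "$dir/$stem-wolfssl-$pin-wolfprov.patch" ]; then
        echo "$dir/$stem-wolfssl-$pin-wolfprov.patch"
    elif [ -f "$dir/$stem-wolfprov.patch" ]; then
        echo "$dir/$stem-wolfprov.patch"
    else
        echo "no patch found for $stem ($wolfssl_ref)" >&2
        return 1
    fi
}
```

Printing the chosen path on stdout keeps the call site identical to the `PATCH=$(...)` followed by `patch -p1 < "$PATCH"` usage shown in the commit message.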
Adds optional 'wolfssl_refs_json' input (JSON array string) to every
PR-time and nightly OSP workflow. When set, it overrides the matrix's
wolfssl_ref dimension; when empty (default), behavior is unchanged
(matrix uses discover_versions output).

Rewrites nightly-osp.yml as two sequential waves:
  Wave 1: every OSP project pinned to v5.9.1-stable
  wave1-done: fan-in job with 'if: always()'
  Wave 2: every OSP project pinned to v5.8.4-stable, needs wave1-done

Wave 2 only starts after every Wave 1 job has finished. 'if: always()'
on Wave 2 jobs means a single Wave 1 flake doesn't skip the 5.8.4
coverage. The Slack notify job lists both wave 1 and wave 2 job
results.

static-analysis and multi-compiler stay outside the waves
(static-analysis isn't wolfssl-version sensitive; multi-compiler
iterates its own wolfssl matrix and now includes representative
v5.8.4-stable rows for gcc-12 and clang-14).
Wave 1 no longer hardcodes wolfssl_refs_json. Each child workflow's
default matrix already picks up latest discovered wolfssl stable from
_discover-versions.yml, so Wave 1 auto-tracks 5.9.1 today and 5.9.2
(or whatever) tomorrow with no edits here. Wave 2 stays pinned to
v5.8.4-stable because that's the explicit back-compat line.
Brings the v5.8.4 backwards-compat plan execution into PR wolfSSL#400:
  - scripts/resolve-osp-patch.sh for wolfssl-version-aware patch lookup
  - 26 OSP workflows routed through the helper
  - wolfssl_refs_json input on all 42 nightly OSP workflows
  - nightly-osp.yml split into Wave 1 (dynamic latest stable) and
    Wave 2 (v5.8.4-stable pinned) with wave1-done fan-in
  - nightly-multi-compiler.yml gains representative v5.8.4-stable rows

Depends on wolfssl/osp PR wolfSSL#340 + follow-up commit that adds
-wolfssl-5.8.4- snapshot patches for libssh2, krb5, stunnel.
OSP PR wolfSSL#340 review removed the duplicate stunnel-WPFF-5.67-wolfprov-fips.patch
(it was identical to the non-FIPS patch). stunnel.yml no longer passes
--fips so the resolver picks the single stunnel-WPFF-5.67-wolfprov.patch
for both FIPS and non-FIPS rows.
…xists

Every workflow (stunnel included) passes --fips uniformly; the resolver
decides. If a project ships no FIPS-specific patch it uses the common
non-FIPS one, and adding a FIPS patch in OSP later is picked up
automatically with no workflow change.
OSP patch names are now one convention (<project>-<projver>-wolfprov
[-fips].patch), so the resolver drops the -FIPS- infix handling and the
opensc -wolfprovider special case. FIPS resolution is bidirectional:
--fips prefers -wolfprov-fips.patch then falls back to -wolfprov.patch,
and non-FIPS prefers -wolfprov.patch then -wolfprov-fips.patch, so a
project that ships only one variant works for both modes.
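
The bidirectional FIPS fallback can be sketched in a few lines of shell. This is a hedged illustration, not the resolver's actual code. The function name `pick_fips_variant` and its argument order are hypothetical; only the `-wolfprov.patch` and `-wolfprov-fips.patch` suffixes come from the commit message.

```shell
# Sketch only: pick_fips_variant <dir> <stem> [--fips]
# Prints the preferred patch variant, falling back to the other one.
pick_fips_variant() {
    dir=$1 stem=$2 mode=${3:-}
    if [ "$mode" = "--fips" ]; then
        first="$stem-wolfprov-fips.patch" second="$stem-wolfprov.patch"
    else
        first="$stem-wolfprov.patch" second="$stem-wolfprov-fips.patch"
    fi
    # The first candidate that exists wins, so a project that ships only
    # one variant resolves to it in both modes.
    for candidate in "$first" "$second"; do
        if [ -f "$dir/$candidate" ]; then
            echo "$dir/$candidate"
            return 0
        fi
    done
    return 1
}
```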

Every OSP workflow that has a fips_ref matrix dimension now passes the
same '${{ matrix.fips_ref == 'FIPS' && '--fips' || '' }}' to the
resolver - no more mix of hardcoded --fips (grpc, python3-ntp, tcpdump)
and silently-omitted --fips (liboauth2 had a FIPS patch it never used,
plus curl, libnice, opensc, openvpn, openldap, libssh2, libcryptsetup,
qt5network5, socat, x11vnc, openssh). libtss2 and sssd are unchanged -
they have no FIPS dimension.
Previously always pulled the rolling :fips/:nonfips tag regardless of
the wolfssl_ref matrix value, so every job tested the same deb. Now a
vX.Y.Z-stable ref pulls debs:fips-<ref>/nonfips-<ref> (the
version-pinned tag debian-export publishes), so nightly Wave 1
(latest) and Wave 2 (v5.8.4-stable) actually exercise different
wolfSSL versions. Non-stable refs (master) fall back to rolling.
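
The tag selection described above comes down to one pattern match on the ref. A minimal sketch follows, assuming the tag shapes `fips-<ref>` / `nonfips-<ref>` and the rolling `fips` / `nonfips` tags named in the commit message. The function name `deb_tag` is hypothetical.

```shell
# Sketch only: deb_tag <flavor> <wolfssl_ref>
# flavor is "fips" or "nonfips". Prints the image tag to pull.
deb_tag() {
    flavor=$1
    ref=$2
    case "$ref" in
        # vX.Y.Z-stable refs get the version-pinned tag.
        v[0-9]*.[0-9]*.[0-9]*-stable) echo "$flavor-$ref" ;;
        # Everything else (for example master) uses the rolling tag.
        *) echo "$flavor" ;;
    esac
}
```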
…P patches

Temporary test scaffolding so the nightly OSP CI run can exercise the
renamed/snapshot patches from osp PR wolfSSL#340 before they merge to osp
master. REVERT before merging PR wolfSSL#400 - the OSP checkout must go back
to wolfssl/osp (master) once osp wolfSSL#340 lands.

All 26 OSP workflow checkouts repointed from wolfssl/osp to
aidangarske/osp ref 5.9.1-wolfprov-patches.
Run the full Wave 1 (5.9.1) + Wave 2 (5.8.4) OSP suite in CI on push
to the test branch instead of waiting for the 6 AM schedule. Only
meaningful on canonical wolfSSL/wolfProvider where the private deb
pull works. REVERT before merge - keep only schedule + workflow_dispatch.
GitHub Actions expressions require single quotes for string literals;
the matrix override used double quotes (!= ""), which fails workflow
validation - nightly-osp startup_failed with 0 jobs. Replace != ""
with != '' across all 40 OSP workflow matrix lines.
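
A bulk fix of this kind can be done with a single sed substitution. The sketch below is one way to do it, not necessarily how the commit was produced. The function name `fix_expr_quotes` is hypothetical, and the expression used in the usage comment is illustrative.

```shell
# Sketch only: rewrites the comparison `!= ""` to `!= ''` on stdin, since
# GitHub Actions expressions accept only single-quoted string literals.
fix_expr_quotes() {
    sed "s/!= \"\"/!= ''/g"
}

# Illustrative usage (the expression is made up for the example):
#   echo 'if: inputs.wolfssl_refs_json != ""' | fix_expr_quotes
```

Reviewing the resulting diff before committing is advisable, because the substitution is purely textual and would also touch any `!= ""` inside shell `run:` steps, where double quotes are valid.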
Retry-outcome classification: rerun every non-passing job once; cleared =
flake, failed-twice = real. Claude validates the survivors and writes the
short root-cause notes; the script renders one clean Slack health report.

TEMP push trigger reports against finished run 26594154670 for testing.
…ing suite

TEMP paths-ignore so report-file-only pushes don't kick the nightly suite.
Drop header emoji and the 'AI:' note prefix. Exclude infra-setup flakes from
the AI input so the headline can't conflate them with real failures. Color by
pass-rate (mostly-green run with a few real failures is a warning, not red).
Tighten the prompt for specific per-job notes grounded only in that job's log.
Move triage script to .github/scripts/ (one workflow yml). Reproducing test
failures are real with a P0-P3 severity, never demoted to flake (flake = infra
only). Reconcile the counts, add per-suite tiers, failing-job log links, and
AI symptom/cause/next lines.
Named severity (Critical/High/Medium/Low) with a meter tally. Restore the
colored breakdown line, reconciled by jobs. Add a pass-rate sparkline vs prior
nightlies. Recovered-on-retry jobs count as passed, never as flakes.
Its matrix reads needs.discover_versions.outputs.* but the job never declared
the dependency, so the matrix expanded empty — no build jobs, test_xmlsec
skipped, workflow failed on both wolfSSL versions. Matches every other OSP
workflow.
…l it

As a reusable workflow its concurrency group keyed off the caller
(github.workflow = 'Nightly OSP Suite'), so a new nightly run cancelled the
prior run's static analysis even though nightly-osp uses cancel-in-progress:
false. Add github.run_id to the group.
Remove all temporary testing scaffolding so everything points at the canonical
repos and runs on the nightly schedule:
- OSP checkout back to wolfssl/osp (drop the fork-branch override)
- nightly-osp: schedule + workflow_dispatch only (drop the ci-draft-pause push
  trigger and paths-ignore)
- nightly-osp: drop the old flat Slack notify; the new osp-report
  (on: workflow_run) handles notification + auto-retry
- osp-report: workflow_run + workflow_dispatch only (drop push trigger and the
  hardcoded fixture run id)
aidangarske added a commit to aidangarske/wolfProvider that referenced this pull request May 29, 2026
…or testing

pull_request_target so the ghcr push has canonical scope; OWNER/MEMBER gate +
PR-head checkout, mirroring publish-test-deps-image. Lets PR wolfSSL#400 populate the
dep-cache so the --build-ci consumers can be validated end-to-end.
Empty commit to re-fire PR wolfSSL#400 CI so we can observe:
  - PRB launches immediately on this push, in parallel with smoke
    test on GitHub Actions (preflight no longer polls smoke).
  - On the Jenkins agents that pick up the matrix, the test-wp-cs
    step runs via prb-cached-test.sh: first time builds + caches,
    later times pull from $HOME/.cache/wolfprov-prb-deps/.
Previous PRB run failed because the Jenkinsfile referenced
${WORKSPACE}/jenkins-scripts/stable/PRB/prb-cached-test.sh, but the
setup stage's cleanWs() wipes the testing-repo checkout before
stashing only wolfProvider/, so the helper wasn't on the parallel
agents. testing PR #958 b89d0b49 inlines the cache logic directly
into the test-wp-cs sh block.
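
The build-once-then-reuse behavior described for the test-wp-cs step can be sketched as follows. This is a rough illustration, not the logic inlined by testing PR #958. Only the cache directory `$HOME/.cache/wolfprov-prb-deps/` comes from the commit message; the function name `cached_build`, the cache key, and the convention that the build command receives the output directory as its last argument are all assumptions.

```shell
# Sketch only. CACHE_ROOT defaults to the directory named in the commit
# message; everything else here is hypothetical.
CACHE_ROOT="${CACHE_ROOT:-$HOME/.cache/wolfprov-prb-deps}"

# cached_build <key> <build command...>
# Runs the build command with an output directory appended as its last
# argument, unless an entry for <key> is already cached.
cached_build() {
    key=$1; shift
    dest="$CACHE_ROOT/$key"
    if [ -d "$dest" ]; then
        echo "cache hit: $dest"
        return 0
    fi
    # Build into a temporary directory and move it into place only on
    # success, so a failed build never leaves a half-populated entry.
    mkdir -p "$CACHE_ROOT"
    tmp=$(mktemp -d "$CACHE_ROOT/.tmp.XXXXXX") || return 1
    if "$@" "$tmp"; then
        mv "$tmp" "$dest"
        echo "cache miss: built $dest"
    else
        rm -rf "$tmp"
        return 1
    fi
}
```

Keying the entry on something that changes with the dependency versions (for example an OpenSSL and wolfSSL version pair) would keep stale builds from being reused.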
- _discover-versions.yml: extends wolfssl_latest_ref_array, so Simple,
  Cmdline, Sanitizers and SEED-SRC pick it up automatically.
- smoke-test.yml: third smoke build (5.8.4 + openssl-latest).
- multi-compiler.yml: gcc-12 + v5.8.4-stable + master openssl, alongside
  the existing v5.8.0-stable entry.
@aidangarske
Member Author

Jenkins retest this please
