CI cleanup: ECT subsets, Node.js 24, container version tracking #18

Merged
cenamiller merged 48 commits into master from feature-ci-cleanup
Mar 27, 2026
cenamiller commented Mar 27, 2026

Summary

Comprehensive CI cleanup: consolidates duplicate workflows into reusable templates, switches validation from bit-for-bit log comparison to Ensemble Consistency Testing (ECT), and resolves Node.js 20 deprecation warnings.

Architecture changes

  • 8 subset workflows (6 CPU + 2 GPU) replace the old monolithic workflows. Each is a thin caller to _test-compiler.yml or _test-gpu.yml.
  • ECT validation: all subsets build in double precision, run 3 perturbed ensemble members in parallel (4 MPI ranks), and validate with PyCECT.
  • MPICH subsets run automatically on push/PR. OpenMPI and GPU subsets are dispatch-only.
  • ci-config.env is the single source of truth for container images, compiler mappings, and MPI flags.
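
A minimal sketch of how a single config file with a per-compiler override could resolve to a full image name. The variable names below (IMAGE_REPO, IMAGE_OS, IMAGE_VERSION, IMAGE_VERSION_oneapi, CONTAINER_COMPILER_gcc) and the resolve_image helper are hypothetical stand-ins, not the actual contents of ci-config.env or the resolve-container action. Only the registry path, the gcc=gcc14 mapping, and the 26.02/25.09 versions come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ci-config.env; the real variable names may differ.
IMAGE_REPO="docker.io/ncarcisl/hpcdev-x86_64"
IMAGE_OS="almalinux9"
IMAGE_VERSION="26.02"            # default container release
IMAGE_VERSION_oneapi="25.09"     # per-compiler override (the Intel pin)
CONTAINER_COMPILER_gcc="gcc14"   # CI compiler name -> name used in the image tag

# Build the image reference for a compiler/MPI pair, preferring per-compiler
# settings and falling back to the shared defaults.
resolve_image() {
  local compiler="$1" mpi="$2"
  local version_var="IMAGE_VERSION_${compiler}"
  local name_var="CONTAINER_COMPILER_${compiler}"
  local version="${!version_var:-$IMAGE_VERSION}"
  local tag_compiler="${!name_var:-$compiler}"
  echo "${IMAGE_REPO}:${IMAGE_OS}-${tag_compiler}-${mpi}-${version}"
}
```

With these assumed values, `resolve_image gcc mpich` prints docker.io/ncarcisl/hpcdev-x86_64:almalinux9-gcc14-mpich-26.02, while `resolve_image oneapi mpich` picks up the 25.09 override. Bumping an image then only requires editing the values at the top.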

Notable fixes

  • Node.js 24 migration (fixes Node.js deprecation warnings #15): checkout v5, upload-artifact v6, download-artifact v7, setup-python v6, cache v5, codecov v6.
  • Intel IFX 2025.3 fpp regression: pinned to hpcdev 25.09 container (IFX 2025.2.1) via per-compiler image override in ci-config.env.
  • Third-party action removal: replaced geekyeggo/delete-artifact with gh api calls.
  • Python dependency resilience: run-perturb-mpas activates conda base env, then uses import check → pip → conda fallback chain.
  • build-mpas installs cpp if missing (needed by older Leap containers).
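
The import check → pip → conda chain from the list above could look roughly like the sketch below. The function name ensure_python_deps is hypothetical, the exact install flags used by run-perturb-mpas are not shown in this PR, and the sketch assumes the conda base environment has already been activated so that python3 and conda refer to the same installation.

```shell
#!/usr/bin/env bash
# Try the cheapest option first and report which stage satisfied the dependencies.
ensure_python_deps() {
  if python3 -c 'import netCDF4, numpy' >/dev/null 2>&1; then
    echo "import"   # already shipped by the container
  elif python3 -m pip install --quiet netCDF4 numpy >/dev/null 2>&1; then
    echo "pip"
  elif conda install --yes --quiet netcdf4 numpy >/dev/null 2>&1; then
    echo "conda"
  else
    echo "none"
    return 1
  fi
}
```

Calling `ensure_python_deps` in the same step that later runs the perturbation script keeps the check and the consumer in one shell.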

Deleted workflows

  • test-ga-nogpu.yml, test-cirrus-nvhpc.yml, test-gcc.yml, test-nvhpc.yml, test-intel.yml, test-gpu.yml, test-hpcdev-containers.yml, debug-hpcdev-mpi.yml, fortran-linting.yml

Documentation

  • README badge table shows container image tags and MPI rank counts.
  • copilot-instructions.md added with MPAS-specific Fortran coding standards.
  • ci-config.env comments cleaned up with Docker Hub link for tag reference.

Test results

Test Result
GNU+MPICH (CPU)
GNU+OpenMPI (CPU)
Intel+MPICH (CPU)
Intel+OpenMPI (CPU)
NVHPC+MPICH (CPU)
NVHPC+OpenMPI (CPU) ❌ SIGABRT (exit 134) — MPI runtime issue, dispatch-only
Unit Tests

… via resolve-container action

Made-with: Cursor
- Phase 1: Consolidate test-gcc/nvhpc/intel into reusable _test-compiler.yml
  (3 x 145-line workflows to 3 x 18-line wrappers + 1 shared workflow)
- Phase 4: Extract cross-repo checkout into checkout-mpas-source composite action
  (replaces 6 instances of the 3-step checkout+overlay pattern across 5 workflows)
- Phase 5: Remove dead interrogate_env.sh reference from build-mpas, unused
  variables from validate-logs, gate run-perturb-mpas diagnostics behind
  verbose input
- Phase 7: Gitignore fortitude-results.log, remove it from tracking
- Phase 8: Rename config.env RESOLUTION to DATA_RESOLUTION to eliminate
  variable collision workaround in run-mpas

Net: ~490 fewer lines of YAML.
Made-with: Cursor
checkout-mpas-source cannot be used as the first workflow step because
local actions need the repo already checked out on the runner. Restoring
the inline 3-step checkout+overlay pattern in all 5 workflows.

Made-with: Cursor
The hpcdev container's MPI libraries have Fortran interface
incompatibilities with Intel/OneAPI (argument count mismatches
in mpas_dmpar.F) that persist even with MPAS_MPI_F08=0.

Add per-compiler container image override support to
resolve-container and ci-config.env so Intel uses the
cisldev image (matching original master behavior) while
gcc/nvhpc continue to use hpcdev.

Made-with: Cursor
IFX 2025.3 (hpcdev 26.02) changed how nested macros inside
function-like macro arguments are expanded, breaking the
COMMA/#define DMPAR_DEBUG_WRITE(M) pattern used throughout
mpas_dmpar.F. IFX 2025.2 (cisldev) is unaffected.

The fix rewrites DMPAR_DEBUG_WRITE as a C99 variadic macro
via sed before compilation, analogous to the existing NVHPC
target-architecture sed workaround.

Also reverts the per-compiler container override (cisldev for
Intel) since the real issue was the preprocessor, not MPI
libraries.

Made-with: Cursor
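
As an illustration of the sed rewrite described in the commit above, the sketch below applies it to a sample file. The macro body shown is an assumption made for the example. Only the names COMMA and DMPAR_DEBUG_WRITE(M) come from the commit message, and the actual sed expression in build-mpas may differ.

```shell
#!/usr/bin/env bash
# Sample file standing in for mpas_dmpar.F (the macro body is assumed).
src=$(mktemp)
cat > "$src" <<'EOF'
#define COMMA ,
#define DMPAR_DEBUG_WRITE(M) ! write(0,*) M
EOF

# Rewrite the single-parameter definition as a C99 variadic macro.
sed -i.bak \
  's/^#define DMPAR_DEBUG_WRITE(M) \(.*\) M$/#define DMPAR_DEBUG_WRITE(...) \1 __VA_ARGS__/' \
  "$src"

cat "$src"
```

The `-i.bak` form is used because it is accepted by both GNU and BSD sed.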
The _test-compiler reusable workflow now builds in double precision and
validates using the Ensemble Consistency Test (3 perturbed members +
PyCECT) instead of bit-for-bit log comparison against a reference.

The validate-logs action is preserved for use in other workflows.

Made-with: Cursor
Each MPI variant now runs 3 members as separate parallel jobs instead of
sequentially in one job. The validate step collects all member artifacts
via pattern + merge-multiple. Wall time drops from ~45 min to ~17 min.

Made-with: Cursor
Remove the third-party action dependency. Artifact cleanup now uses the
gh CLI to call the GitHub REST API directly, matching the same name
patterns as before.

Made-with: Cursor
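
A rough sketch of artifact cleanup through the gh CLI, assuming cleanup is scoped to a single workflow run and matches artifacts by name prefix. The function name and its arguments are hypothetical. The two REST endpoints (list artifacts for a run, delete an artifact by ID) are the standard GitHub Actions artifact endpoints, but the workflows in this PR may call them differently.

```shell
#!/usr/bin/env bash
# Delete every artifact of one workflow run whose name starts with a prefix.
# Arguments: <owner/repo> <run_id> <name_prefix>
delete_artifacts_by_prefix() {
  local repo="$1" run_id="$2" prefix="$3" id
  gh api "repos/${repo}/actions/runs/${run_id}/artifacts" --paginate \
    --jq ".artifacts[] | select(.name | startswith(\"${prefix}\")) | .id" |
  while read -r id; do
    # stdin is redirected so the delete call cannot consume the remaining IDs
    gh api --method DELETE "repos/${repo}/actions/artifacts/${id}" </dev/null
    echo "deleted artifact ${id}"
  done
}
```

Inside a workflow step this would typically be called with the repository and run ID that GitHub Actions exposes as environment variables, with a token available to gh.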
Incorporate GPU/OpenACC patterns, commit message format, branch naming,
module visibility (private/public), and anti-patterns from the MPAS-A
contributor's guide at contributors-mpas-a.readthedocs.io.

Made-with: Cursor
Copilot AI review requested due to automatic review settings March 27, 2026 16:38
Copilot AI left a comment

Pull request overview

This PR overhauls the repository’s CI. It centralizes configuration (containers, compiler→make-target mapping, MPI runtime flags) and refactors workflows into reusable building blocks. Subset testing for the CPU compiler/MPI subsets now uses PyCECT (ECT), and the README gains per-compiler+MPI status badges.

Changes:

  • Added .github/ci-config.env plus resolve-container to make container/image + compiler/MPI mapping a single source of truth.
  • Replaced monolithic subset workflows with reusable _test-compiler.yml / _test-gpu.yml and small caller workflows per compiler+MPI.
  • Updated multiple workflows/actions to use the centralized config (incl. MPI runtime flags) and removed third-party artifact deletion action in favor of gh api.

Reviewed changes

Copilot reviewed 33 out of 35 changed files in this pull request and generated 4 comments.

File Description
README.md Updates CI badge table to per-compiler+MPI badges.
.gitignore Broadens tarball ignore rule; adds fortitude output ignore.
.github/workflows/test-nvhpc.yml Removes old NVHPC subset workflow.
.github/workflows/test-nvhpc-openmpi.yml New NVHPC+OpenMPI subset caller workflow.
.github/workflows/test-nvhpc-mpich.yml New NVHPC+MPICH subset caller workflow.
.github/workflows/test-intel.yml Removes old Intel subset workflow.
.github/workflows/test-intel-openmpi.yml New Intel+OpenMPI subset caller workflow.
.github/workflows/test-intel-mpich.yml New Intel+MPICH subset caller workflow.
.github/workflows/test-gpu.yml Removes old GPU subset workflow.
.github/workflows/test-gpu-openmpi.yml New GPU OpenMPI caller workflow (workflow_dispatch).
.github/workflows/test-gpu-mpich.yml New GPU MPICH caller workflow (workflow_dispatch).
.github/workflows/test-gcc.yml Removes old GCC subset workflow.
.github/workflows/test-gcc-openmpi.yml New GNU+OpenMPI subset caller workflow.
.github/workflows/test-gcc-mpich.yml New GNU+MPICH subset caller workflow.
.github/workflows/test-ga-nogpu.yml Uses resolved container config; removes mpich3; replaces artifact cleanup with gh api.
.github/workflows/test-cirrus-nvhpc.yml Uses resolved container config; removes mpich3; replaces artifact cleanup with gh api.
.github/workflows/fortran-linting.yml Removes Fortran linting workflow.
.github/workflows/ect-test.yml Uses resolved container config; switches compiler input to gcc; replaces artifact cleanup with gh api.
.github/workflows/ect-ensemble-gen.yml Uses resolved container config; removes mpich3 option; fixes resolution env var name; replaces artifact cleanup with gh api.
.github/workflows/coverage.yml Uses resolved container config and removes mpich3 reference.
.github/workflows/_test-gpu.yml Adds reusable GPU-vs-CPU workflow used by GPU subset callers.
.github/workflows/_test-compiler.yml Adds reusable ECT-based compiler+MPI subset workflow for CPU.
.github/test-cases/ect-120km/config.env Renames RESOLUTION → DATA_RESOLUTION; updates summary prefix var usage.
.github/test-cases/240km/config.env Renames RESOLUTION → DATA_RESOLUTION.
.github/test-cases/120km/config.env Renames RESOLUTION → DATA_RESOLUTION.
.github/copilot-instructions.md Adds MPAS-specific Fortran coding standards for AI assistants.
.github/ci-config.env Adds centralized CI configuration for containers, make targets, MPI flags, and workarounds.
.github/actions/validate-logs/action.yml Simplifies log validation action (removes extra listing/arg-building steps).
.github/actions/run-perturb-mpas/action.yml Adds verbose toggle; sources ci-config.env for MPI flags/env vars.
.github/actions/run-mpas/action.yml Sources ci-config.env for MPI flags/env vars; removes old RESOLUTION clobber workaround.
.github/actions/resolve-container/action.yml Adds composite action to resolve container images + mapping JSON from ci-config.env.
.github/actions/download-testdata/action.yml Uses DATA_RESOLUTION to select archive name.
.github/actions/build-mpas/action.yml Reads ci-config.env for make-target mapping and compiler/MPI workarounds; applies oneAPI fpp workaround.
.github/AGENT_GUIDE.md Updates documentation to reflect DATA_RESOLUTION naming and config/env behavior.

Comment thread .github/workflows/_test-gpu.yml Outdated
Comment thread .github/ci-config.env Outdated
Comment thread README.md Outdated
Comment thread .github/actions/resolve-container/action.yml Outdated
- actions/checkout v4 → v5
- actions/upload-artifact v4 → v6
- actions/download-artifact v4 → v6
- actions/setup-python v5 → v6
- actions/cache v4 → v5
- codecov/codecov-action v4 → v6

Eliminates Node.js 20 deprecation warnings that will become errors
after June 2, 2026.

Made-with: Cursor
- Bump download-artifact v6 → v7 (v6 still uses Node.js 20 runtime)
- Source /container/config_env.sh before installing Python deps so
  miniforge Python/conda is on PATH in older containers (e.g. Leap 25.09)

Made-with: Cursor
Each composite action step runs in its own shell, so conda/pip
packages installed in the "Install Python dependencies" step were
invisible to the "Run perturbed members" step. Merging them into
a single step ensures the environment (PATH, conda env) persists.

Made-with: Cursor
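
The step-isolation behaviour described in the commit above can be reproduced outside GitHub Actions by treating each `bash -c` call as one step. DEMO_CONDA_ENV is a hypothetical variable standing in for whatever an environment activation exports.

```shell
#!/usr/bin/env bash
unset DEMO_CONDA_ENV   # make sure the parent shell does not leak a value

# Two steps, two shells: the export made in the first is gone by the second.
bash -c 'export DEMO_CONDA_ENV=base'
separate_steps=$(bash -c 'echo "${DEMO_CONDA_ENV:-unset}"')

# One merged step: activation and use share a single shell.
merged_step=$(bash -c 'export DEMO_CONDA_ENV=base; echo "${DEMO_CONDA_ENV:-unset}"')

echo "separate steps: ${separate_steps}"
echo "merged step:    ${merged_step}"
```

Files installed on disk do survive between steps. What is lost is shell state such as PATH changes and activated environments, which is why merging the install and run steps resolves the problem.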
Check if netCDF4 and numpy are already importable (many containers
ship them via miniforge) before attempting pip or conda install.
This is the approach most resilient to container environment changes.

Made-with: Cursor
The root cause: conda install put packages into miniforge's Python
but python3 still resolved to the system Python (which has no
netCDF4). Activating the conda base environment early ensures
python3 points to the same Python where packages are installed.

Made-with: Cursor
MPICH is the default on Derecho, so only MPICH subsets run
automatically on push/PR. OpenMPI subsets remain available
via manual workflow_dispatch.

Made-with: Cursor
Badge table now shows the container image tag and MPI rank count
for each test. Container tags encode compiler + library versions,
so updating ci-config.env is the only change needed when images
are bumped. Added note about Intel 25.09 pin and links to
ci-config.env and Docker Hub.

Made-with: Cursor
- Remove CONTAINER_COMPILER entries for nvhpc and oneapi (they mapped
  to themselves and served no purpose)
- Add Docker Hub link so maintainers can check image tag names
- Add concrete example explaining why gcc=gcc14 mapping is needed
- Trim casual language and redundant comments throughout

Made-with: Cursor
cenamiller changed the title from "CI cleanup: DRY refactoring, ECT subsets, per-compiler+MPI badges" to "CI cleanup: ECT subsets, Node.js 24, container version tracking" Mar 27, 2026
- Update branch structure (feature-ci-cleanup is active, old branches removed)
- Replace deleted workflow descriptions with new subset/reusable architecture
- Update container section from cisldev to hpcdev
- Remove outdated MPI compatibility matrix and development history
- Add ci-config.env, resolve-container, and checkout-mpas-source docs
- Trim verbose sections while keeping actionable reference material

Made-with: Cursor
Copilot AI left a comment

Pull request overview

Copilot reviewed 35 out of 37 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (2)

README.md:1

  • The README’s container strings (e.g., hpcdev:almalinux9-gcc14-mpich-26.02) don’t match the actual fully-qualified images used by workflows (docker.io/ncarcisl/hpcdev-x86_64:... as assembled from .github/ci-config.env). This can confuse users trying to reproduce CI locally. Consider updating the table to either (a) show the exact resolved image names (including docker.io/ncarcisl/hpcdev-x86_64:), or (b) explicitly call out that the table shows only the tag portion and that the full image is derived from ci-config.env.

Comment thread .github/workflows/ect-test.yml Outdated
Comment thread .github/workflows/ect-ensemble-gen.yml Outdated
Comment thread .github/workflows/ect-ensemble-gen.yml Outdated
Comment thread .github/workflows/_test-compiler.yml Outdated
Comment thread .github/workflows/_test-gpu.yml Outdated
Comment thread .github/actions/validate-logs/action.yml
Comment thread .github/actions/run-mpas/action.yml Outdated
Comment thread .github/actions/run-perturb-mpas/action.yml
- Append || true to artifact cleanup pipelines so empty results
  don't fail the step under bash pipefail (4 workflow files)
- Add default fallback for OPENMPI_RUN_FLAGS in run-mpas and
  run-perturb-mpas in case ci-config.env isn't sourced

Made-with: Cursor
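
Both fixes in the commit above can be sketched in a few lines. The artifact names and the grep pattern are hypothetical, and the default flag value is an assumption for illustration rather than the repository's actual OPENMPI_RUN_FLAGS setting.

```shell
#!/usr/bin/env bash
set -eo pipefail   # the strict mode GitHub Actions applies to "shell: bash" steps

# Stand-in for the artifact names an API call would return (hypothetical names).
artifact_names=$'build-log\nunit-test-report'

# Without a guard: grep finds nothing, exits 1, and pipefail propagates the failure.
unguarded_status=0
( printf '%s\n' "$artifact_names" | grep '^ect-member-' >/dev/null ) || unguarded_status=$?

# With the guard: an empty match simply means there is nothing to delete.
matches=$(printf '%s\n' "$artifact_names" | grep '^ect-member-' || true)

# Default used only when ci-config.env was not sourced (assumed value).
OPENMPI_RUN_FLAGS="${OPENMPI_RUN_FLAGS:---oversubscribe}"

echo "unguarded exit status: ${unguarded_status}"
echo "guarded matches: '${matches}'"
echo "OPENMPI_RUN_FLAGS: ${OPENMPI_RUN_FLAGS}"
```

The `${VAR:-default}` form leaves any value from ci-config.env untouched and only fills in the default when the variable is unset or empty.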
cenamiller merged commit 47e2cb3 into master Mar 27, 2026
24 checks passed
cenamiller deleted the feature-ci-cleanup branch March 27, 2026 19:54