CI cleanup: ECT subsets, Node.js 24, container version tracking #18
cenamiller merged 48 commits into master
Made-with: Cursor
…tralized config Made-with: Cursor
… via resolve-container action Made-with: Cursor
- Phase 1: Consolidate test-gcc/nvhpc/intel into reusable _test-compiler.yml (3 x 145-line workflows to 3 x 18-line wrappers + 1 shared workflow)
- Phase 4: Extract cross-repo checkout into checkout-mpas-source composite action (replaces 6 instances of the 3-step checkout+overlay pattern across 5 workflows)
- Phase 5: Remove dead interrogate_env.sh reference from build-mpas, unused variables from validate-logs, gate run-perturb-mpas diagnostics behind verbose input
- Phase 7: Gitignore fortitude-results.log, remove it from tracking
- Phase 8: Rename config.env RESOLUTION to DATA_RESOLUTION to eliminate variable collision workaround in run-mpas

Net: ~490 fewer lines of YAML.
Made-with: Cursor
checkout-mpas-source cannot be used as the first workflow step because local actions need the repo already checked out on the runner. Restoring the inline 3-step checkout+overlay pattern in all 5 workflows. Made-with: Cursor
The hpcdev container's MPI libraries have Fortran interface incompatibilities with Intel/OneAPI (argument count mismatches in mpas_dmpar.F) that persist even with MPAS_MPI_F08=0. Add per-compiler container image override support to resolve-container and ci-config.env so Intel uses the cisldev image (matching original master behavior) while gcc/nvhpc continue to use hpcdev. Made-with: Cursor
… and cisldev Made-with: Cursor
IFX 2025.3 (hpcdev 26.02) changed how nested macros inside function-like macro arguments are expanded, breaking the COMMA/#define DMPAR_DEBUG_WRITE(M) pattern used throughout mpas_dmpar.F. IFX 2025.2 (cisldev) is unaffected. The fix rewrites DMPAR_DEBUG_WRITE as a C99 variadic macro via sed before compilation, analogous to the existing NVHPC target-architecture sed workaround. Also reverts the per-compiler container override (cisldev for Intel) since the real issue was the preprocessor, not MPI libraries. Made-with: Cursor
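A minimal sketch of the kind of sed rewrite this commit describes. The macro body below is a hypothetical stand-in, since the actual text of mpas_dmpar.F and the exact sed expression used in build-mpas are not shown in this PR; only the idea of turning DMPAR_DEBUG_WRITE(M) into a variadic macro comes from the commit message.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for the macro definitions; the real lines in
# mpas_dmpar.F may differ.
src="$(mktemp)"
cat > "$src" <<'EOF'
#define COMMA ,
#define DMPAR_DEBUG_WRITE(M) call mpas_log_write(M)
EOF

# Rewrite the one-parameter macro as a C99 variadic macro, so it accepts
# however many comma-separated arguments the preprocessor ends up seeing
# and forwards them unchanged through __VA_ARGS__.
rewrite_debug_macro() {
    sed -E 's/^(#define[[:space:]]+DMPAR_DEBUG_WRITE)\(M\)(.*)\(M\)/\1(...)\2(__VA_ARGS__)/' "$1"
}

rewritten="$(rewrite_debug_macro "$src")"
printf '%s\n' "$rewritten"
rm -f "$src"
```

The expression is anchored on the `#define` line only, so call sites and the COMMA definition are left untouched.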
The _test-compiler reusable workflow now builds in double precision and validates using the Ensemble Consistency Test (3 perturbed members + PyCECT) instead of bit-for-bit log comparison against a reference. The validate-logs action is preserved for use in other workflows. Made-with: Cursor
Each MPI variant now runs 3 members as separate parallel jobs instead of sequentially in one job. The validate step collects all member artifacts via pattern + merge-multiple. Wall time drops from ~45min to ~17min. Made-with: Cursor
Remove the third-party action dependency. Artifact cleanup now uses the gh CLI to call the GitHub REST API directly, matching the same name patterns as before. Made-with: Cursor
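One way such a cleanup could look, as a sketch only: list artifacts through the REST API, keep the IDs whose names start with a prefix, and delete them. The function names, the `mpas-logs-` prefix, and the DRY_RUN switch are illustrative assumptions, not taken from the workflows; the endpoints are the standard GitHub Actions artifact routes.

```bash
#!/usr/bin/env bash
set -eu

# Print the IDs of artifacts whose names start with a given prefix.
# Input: one "<id> <name>" pair per line on stdin (names without spaces).
select_artifact_ids() {
    local prefix="$1"
    awk -v p="$prefix" 'index($2, p) == 1 { print $1 }'
}

# Delete every artifact in repo ("owner/name") that matches the prefix.
# With DRY_RUN=1 (the default in this sketch) deletions are only printed.
cleanup_artifacts() {
    local repo="$1" prefix="$2" id
    gh api --paginate "repos/${repo}/actions/artifacts" \
        --jq '.artifacts[] | "\(.id) \(.name)"' |
    select_artifact_ids "$prefix" |
    while read -r id; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would delete artifact ${id}"
        else
            gh api -X DELETE "repos/${repo}/actions/artifacts/${id}"
        fi
    done
}
```

Inside a workflow step this might be called as `DRY_RUN=0 cleanup_artifacts "$GITHUB_REPOSITORY" "mpas-logs-"`, with a token exported as `GH_TOKEN` so that `gh` can authenticate.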
Incorporate GPU/OpenACC patterns, commit message format, branch naming, module visibility (private/public), and anti-patterns from the MPAS-A contributor's guide at contributors-mpas-a.readthedocs.io. Made-with: Cursor
…-bit Made-with: Cursor
Pull request overview
This PR overhauls the repository’s CI by centralizing configuration (containers, compiler→make-target mapping, MPI runtime flags) and refactoring workflows into reusable building blocks, while updating subset testing to use PyCECT (ECT) for CPU compiler/MPI subsets and introducing per-compiler+MPI status badges.
Changes:
- Added `.github/ci-config.env` plus `resolve-container` to make container/image + compiler/MPI mapping a single source of truth.
- Replaced monolithic subset workflows with reusable `_test-compiler.yml`/`_test-gpu.yml` and small caller workflows per compiler+MPI.
- Updated multiple workflows/actions to use the centralized config (incl. MPI runtime flags) and removed third-party artifact deletion action in favor of `gh api`.
Reviewed changes
Copilot reviewed 33 out of 35 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| README.md | Updates CI badge table to per-compiler+MPI badges. |
| .gitignore | Broadens tarball ignore rule; adds fortitude output ignore. |
| .github/workflows/test-nvhpc.yml | Removes old NVHPC subset workflow. |
| .github/workflows/test-nvhpc-openmpi.yml | New NVHPC+OpenMPI subset caller workflow. |
| .github/workflows/test-nvhpc-mpich.yml | New NVHPC+MPICH subset caller workflow. |
| .github/workflows/test-intel.yml | Removes old Intel subset workflow. |
| .github/workflows/test-intel-openmpi.yml | New Intel+OpenMPI subset caller workflow. |
| .github/workflows/test-intel-mpich.yml | New Intel+MPICH subset caller workflow. |
| .github/workflows/test-gpu.yml | Removes old GPU subset workflow. |
| .github/workflows/test-gpu-openmpi.yml | New GPU OpenMPI caller workflow (workflow_dispatch). |
| .github/workflows/test-gpu-mpich.yml | New GPU MPICH caller workflow (workflow_dispatch). |
| .github/workflows/test-gcc.yml | Removes old GCC subset workflow. |
| .github/workflows/test-gcc-openmpi.yml | New GNU+OpenMPI subset caller workflow. |
| .github/workflows/test-gcc-mpich.yml | New GNU+MPICH subset caller workflow. |
| .github/workflows/test-ga-nogpu.yml | Uses resolved container config; removes mpich3; replaces artifact cleanup with gh api. |
| .github/workflows/test-cirrus-nvhpc.yml | Uses resolved container config; removes mpich3; replaces artifact cleanup with gh api. |
| .github/workflows/fortran-linting.yml | Removes Fortran linting workflow. |
| .github/workflows/ect-test.yml | Uses resolved container config; switches compiler input to gcc; replaces artifact cleanup with gh api. |
| .github/workflows/ect-ensemble-gen.yml | Uses resolved container config; removes mpich3 option; fixes resolution env var name; replaces artifact cleanup with gh api. |
| .github/workflows/coverage.yml | Uses resolved container config and removes mpich3 reference. |
| .github/workflows/_test-gpu.yml | Adds reusable GPU-vs-CPU workflow used by GPU subset callers. |
| .github/workflows/_test-compiler.yml | Adds reusable ECT-based compiler+MPI subset workflow for CPU. |
| .github/test-cases/ect-120km/config.env | Renames RESOLUTION → DATA_RESOLUTION; updates summary prefix var usage. |
| .github/test-cases/240km/config.env | Renames RESOLUTION → DATA_RESOLUTION. |
| .github/test-cases/120km/config.env | Renames RESOLUTION → DATA_RESOLUTION. |
| .github/copilot-instructions.md | Adds MPAS-specific Fortran coding standards for AI assistants. |
| .github/ci-config.env | Adds centralized CI configuration for containers, make targets, MPI flags, and workarounds. |
| .github/actions/validate-logs/action.yml | Simplifies log validation action (removes extra listing/arg-building steps). |
| .github/actions/run-perturb-mpas/action.yml | Adds verbose toggle; sources ci-config.env for MPI flags/env vars. |
| .github/actions/run-mpas/action.yml | Sources ci-config.env for MPI flags/env vars; removes old RESOLUTION clobber workaround. |
| .github/actions/resolve-container/action.yml | Adds composite action to resolve container images + mapping JSON from ci-config.env. |
| .github/actions/download-testdata/action.yml | Uses DATA_RESOLUTION to select archive name. |
| .github/actions/build-mpas/action.yml | Reads ci-config.env for make-target mapping and compiler/MPI workarounds; applies oneAPI fpp workaround. |
| .github/AGENT_GUIDE.md | Updates documentation to reflect DATA_RESOLUTION naming and config/env behavior. |
…ge output Made-with: Cursor
- actions/checkout v4 → v5
- actions/upload-artifact v4 → v6
- actions/download-artifact v4 → v6
- actions/setup-python v5 → v6
- actions/cache v4 → v5
- codecov/codecov-action v4 → v6

Eliminates Node.js 20 deprecation warnings that will become errors after June 2, 2026.
Made-with: Cursor
- Bump download-artifact v6 → v7 (v6 still uses Node.js 20 runtime)
- Source /container/config_env.sh before installing Python deps so miniforge Python/conda is on PATH in older containers (e.g. Leap 25.09)

Made-with: Cursor
Each composite action step runs in its own shell, so conda/pip packages installed in the "Install Python dependencies" step were invisible to the "Run perturbed members" step. Merging them into a single step ensures the environment (PATH, conda env) persists. Made-with: Cursor
Check whether netCDF4 and numpy are already importable (many containers ship them via miniforge) before attempting a pip or conda install. This makes the step more resilient to container environment changes. Made-with: Cursor
The root cause: conda install put packages into miniforge's Python but python3 still resolved to the system Python (which has no netCDF4). Activating the conda base environment early ensures python3 points to the same Python where packages are installed. Made-with: Cursor
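The import check → pip → conda chain described in the last few commits can be sketched roughly as follows. The helper names are hypothetical and the real run-perturb-mpas step may be structured differently; the point is that the import check is repeated after each installer, so a successful install into the wrong Python does not count as success.

```bash
#!/usr/bin/env bash
set -eu

# Run the check first; if it fails, run each installer in order and re-check
# after every one. Succeeds as soon as the check passes.
ensure_with_fallbacks() {
    local check="$1"; shift
    local installer
    "$check" && return 0
    for installer in "$@"; do
        "$installer" && "$check" && return 0
    done
    return 1
}

# Concrete steps for the two modules named in the commits. The check uses
# whichever python3 is first on PATH, which is why activating the conda
# base environment beforehand matters.
check_imports()      { python3 -c 'import netCDF4, numpy' >/dev/null 2>&1; }
install_with_pip()   { python3 -m pip install --quiet netCDF4 numpy; }
install_with_conda() { conda install --yes --quiet netcdf4 numpy; }
```

A step would then run `ensure_with_fallbacks check_imports install_with_pip install_with_conda`, typically after something like `source "$(conda info --base)/etc/profile.d/conda.sh" && conda activate base` so that `python3` resolves to the conda interpreter.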
MPICH is the default on Derecho, so only MPICH subsets run automatically on push/PR. OpenMPI subsets remain available via manual workflow_dispatch. Made-with: Cursor
Badge table now shows the container image tag and MPI rank count for each test. Container tags encode compiler + library versions, so updating ci-config.env is the only change needed when images are bumped. Added note about Intel 25.09 pin and links to ci-config.env and Docker Hub. Made-with: Cursor
- Remove CONTAINER_COMPILER entries for nvhpc and oneapi (they mapped to themselves and served no purpose)
- Add Docker Hub link so maintainers can check image tag names
- Add concrete example explaining why gcc=gcc14 mapping is needed
- Trim casual language and redundant comments throughout

Made-with: Cursor
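To illustrate why a gcc=gcc14 mapping is needed, here is a sketch of how an image reference might be assembled from centralized settings. All variable names except the CONTAINER_COMPILER idea are assumptions, and the real ci-config.env and resolve-container action may be organized differently; the gcc/mpich result matches the image name quoted in the review comments on this PR, while other combinations are not guaranteed to exist on Docker Hub.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical settings standing in for ci-config.env.
CONTAINER_IMAGE="docker.io/ncarcisl/hpcdev-x86_64"
CONTAINER_OS="almalinux9"
CONTAINER_VERSION="26.02"
# Workflow input "gcc" must become tag component "gcc14"; compilers that are
# not listed map to themselves.
CONTAINER_COMPILER_MAP="gcc=gcc14"

# Translate a workflow compiler name into the name used in image tags.
lookup_tag_compiler() {
    local compiler="$1" entry
    for entry in $CONTAINER_COMPILER_MAP; do
        if [ "${entry%%=*}" = "$compiler" ]; then
            echo "${entry#*=}"
            return 0
        fi
    done
    echo "$compiler"
}

# Build the full image reference for a compiler + MPI pair.
resolve_image() {
    local compiler="$1" mpi="$2"
    echo "${CONTAINER_IMAGE}:${CONTAINER_OS}-$(lookup_tag_compiler "$compiler")-${mpi}-${CONTAINER_VERSION}"
}

resolve_image gcc mpich
```

With this layout, bumping an image only means changing the version or the map entry in one place.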
- Update branch structure (feature-ci-cleanup is active, old branches removed)
- Replace deleted workflow descriptions with new subset/reusable architecture
- Update container section from cisldev to hpcdev
- Remove outdated MPI compatibility matrix and development history
- Add ci-config.env, resolve-container, and checkout-mpas-source docs
- Trim verbose sections while keeping actionable reference material

Made-with: Cursor
Pull request overview
Copilot reviewed 35 out of 37 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (2)
README.md:1
- The README’s container strings (e.g., `hpcdev:almalinux9-gcc14-mpich-26.02`) don’t match the actual fully qualified images used by workflows (`docker.io/ncarcisl/hpcdev-x86_64:...` as assembled from `.github/ci-config.env`). This can confuse users trying to reproduce CI locally. Consider updating the table to either (a) show the exact resolved image names (including `docker.io/ncarcisl/hpcdev-x86_64:`), or (b) explicitly call out that the table shows only the tag portion and that the full image is derived from `ci-config.env`.
- Append || true to artifact cleanup pipelines so empty results don't fail the step under bash pipefail (4 workflow files)
- Add default fallback for OPENMPI_RUN_FLAGS in run-mpas and run-perturb-mpas in case ci-config.env isn't sourced

Made-with: Cursor
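Both fixes can be shown in a few lines. This is a sketch under assumptions: the listing format, the artifact name prefixes, and the `--oversubscribe` default are placeholders, since the PR does not state the actual default flags or pipeline contents.

```bash
#!/usr/bin/env bash
set -eo pipefail

# Use a default when ci-config.env was not sourced. The value here is a
# placeholder, not the one used in run-mpas.
: "${OPENMPI_RUN_FLAGS:=--oversubscribe}"

# Print IDs of "<id> <name>" lines whose name starts with the prefix.
# grep exits 1 when nothing matches; under pipefail that becomes the status
# of the whole pipeline, so "|| true" keeps an empty result from failing
# the step.
ids_matching() {
    local listing="$1" prefix="$2"
    printf '%s\n' "$listing" | grep -E "^[0-9]+ ${prefix}" | cut -d' ' -f1 || true
}

listing='11 mpas-logs-gcc
12 coverage-report'

found="$(ids_matching "$listing" 'mpas-logs-')"
empty="$(ids_matching "$listing" 'ect-summary-')"
echo "found=${found} empty=${empty} flags=${OPENMPI_RUN_FLAGS}"
```

Without the trailing `|| true`, the second call would return a non-zero status and `set -e` would abort the script at that assignment, even though "nothing to clean up" is a normal outcome.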
Summary
Comprehensive CI cleanup: consolidates duplicate workflows into reusable templates, switches validation from bit-for-bit log comparison to Ensemble Consistency Testing (ECT), and resolves Node.js 20 deprecation warnings.
Architecture changes
- … `_test-compiler.yml` or `_test-gpu.yml`.

Notable fixes
- … `geekyeggo/delete-artifact` with `gh api` calls.
- `run-perturb-mpas` activates conda base env, then uses import check → pip → conda fallback chain.
- … `cpp` if missing (needed by older Leap containers).

Deleted workflows
- `test-ga-nogpu.yml`, `test-cirrus-nvhpc.yml`, `test-gcc.yml`, `test-nvhpc.yml`, `test-intel.yml`, `test-gpu.yml`, `test-hpcdev-containers.yml`, `debug-hpcdev-mpi.yml`, `fortran-linting.yml`

Documentation
- `copilot-instructions.md` added with MPAS-specific Fortran coding standards.

Test results