Merged
81 changes: 81 additions & 0 deletions .github/actions/start-local-cre-environment/action.yml
@@ -0,0 +1,81 @@
+name: Start local CRE environment
+description: >
+  Runs `go run . env start` from the CRE environment CLI with retries; runs `env
+  stop` between attempts on failure.
+
+inputs:
+  jd-image:
+    description: Sets `CTF_JD_IMAGE` for the local CRE environment (job-distributor
+      image ref).
+    required: true
+  chainlink-image:
+    description: Sets `CTF_CHAINLINK_IMAGE` for the local CRE environment (Chainlink
+      node image ref).
+    required: true
+  ctf-configs:
+    description: Sets `CTF_CONFIGS` for the local CRE environment (environment TOML
+      configuration(s) to use).
+    required: true
+  chip-router-image:
+    description: Sets `CTF_CHIP_ROUTER_IMAGE` for the local CRE environment (local
+      CRE Chip Router image ref).
+    required: true
+  retry-count:
+    description: Maximum attempts to start the environment
+    default: "3"
+  retry-delay-seconds:
+    description: Seconds to sleep after cleanup before the next attempt
+    default: "15"
+  cleanup-on-error:
+    description: Value for `env start --cleanup-on-error` (typically false in CI to
+      preserve logs on final failure)
+    default: "false"
+  env-start-extra-args:
+    description: >
+      Optional extra arguments for `env start` (e.g. `--with-contracts-version
+      v1`). Leave empty for default (v2) behavior.
+    default: ""
+  working-directory:
+    description: Directory containing the CRE environment CLI (relative to the workspace)
+    default: core/scripts/cre/environment
+
+runs:
+  using: composite
+  steps:
+    - name: Start local CRE with retries
+      shell: bash
+      working-directory: ${{ inputs.working-directory }}
+      env:
+        MAX_ATTEMPTS: ${{ inputs.retry-count }}
+        RETRY_DELAY_SECONDS: ${{ inputs.retry-delay-seconds }}
+        CLEANUP_ON_ERROR: ${{ inputs.cleanup-on-error }}
+        ENV_START_EXTRA_ARGS: ${{ inputs.env-start-extra-args }}
+        CTF_JD_IMAGE: ${{ inputs.jd-image }}
+        CTF_CHAINLINK_IMAGE: ${{ inputs.chainlink-image }}
+        CTF_CONFIGS: ${{ inputs.ctf-configs }}
+        CTF_CHIP_ROUTER_IMAGE: ${{ inputs.chip-router-image }}
+      run: |
+        set -u
+        set +e
+        # GitHub invokes bash with errexit (-e); disable it so a failed `env start`
+        # does not abort the script before we can retry.
+        last_exit=1
+        attempt=1
+        while [[ "$attempt" -le "$MAX_ATTEMPTS" ]]; do
+          echo "Starting local CRE (attempt ${attempt}/${MAX_ATTEMPTS})..."
+          # shellcheck disable=SC2086
+          go run . env start ${ENV_START_EXTRA_ARGS} --cleanup-on-error="${CLEANUP_ON_ERROR}"
+          last_exit=$?
+          if [[ "$last_exit" -eq 0 ]]; then
+            exit 0
+          fi
+          echo "env start failed with exit code ${last_exit}"
+          if [[ "$attempt" -lt "$MAX_ATTEMPTS" ]]; then
+            echo "Running env stop before retry..."
+            go run . env stop || true
+            sleep "${RETRY_DELAY_SECONDS}"
+          fi
+          attempt=$((attempt + 1))
+        done
+        echo "Error: failed to start local CRE after ${MAX_ATTEMPTS} attempts (last exit code ${last_exit})"
+        exit "${last_exit}"
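The retry behavior of the composite action above can be tried outside CI. The following is a minimal sketch, not the action's code: `retry_with_cleanup`, `flaky_start`, and `fake_stop` are hypothetical names that do not appear in this PR, and the two stand-in functions take the place of `go run . env start` and `go run . env stop`.

```shell
#!/usr/bin/env bash
set -u

# retry_with_cleanup MAX_ATTEMPTS DELAY_SECONDS START_CMD CLEANUP_CMD
# Hypothetical helper: runs START_CMD up to MAX_ATTEMPTS times. Between failed
# attempts (but not after the last one) it runs CLEANUP_CMD and sleeps.
retry_with_cleanup() {
  local max_attempts="$1" delay="$2" start_cmd="$3" cleanup_cmd="$4"
  local attempt=1 last_exit=1
  while [[ "$attempt" -le "$max_attempts" ]]; do
    # Running the command as an `if` condition keeps errexit from aborting.
    if "$start_cmd"; then
      return 0
    else
      last_exit=$?
    fi
    if [[ "$attempt" -lt "$max_attempts" ]]; then
      "$cleanup_cmd" || true
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return "$last_exit"
}

# Stand-ins (assumptions): the start command fails twice, then succeeds.
starts=0
cleanups=0
flaky_start() { starts=$((starts + 1)); [[ "$starts" -ge 3 ]]; }
fake_stop() { cleanups=$((cleanups + 1)); }

retry_with_cleanup 3 0 flaky_start fake_stop
echo "exit=$? starts=${starts} cleanups=${cleanups}"
```

As in the action, cleanup and the delay run only between attempts, so a final failure leaves the environment in place for log collection.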
38 changes: 13 additions & 25 deletions .github/workflows/cre-regression-system-tests.yaml
@@ -186,7 +186,7 @@
       - name: Set up gotestsum
         shell: bash
         run: |
-          echo "::startgroup::Install gotestsum"
+          echo "::group::Install gotestsum"
           go install gotest.tools/gotestsum@v1.12.3
           echo "::endgroup::"

@@ -206,32 +206,20 @@
           echo "resolved_image=${resolved_image}" >> "${GITHUB_OUTPUT}"

       - name: Start local CRE${{ matrix.tests.cre_version }}
-        shell: bash
         id: start-local-cre
-        env:
-          CTF_JD_IMAGE: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
+        uses: ./.github/actions/start-local-cre-environment
+        with:
+          jd-image: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
             secrets.QA_AWS_REGION }}.amazonaws.com/job-distributor:0.22.1"
-          CTF_CHAINLINK_IMAGE: "${{ steps.resolve-chainlink-image.outputs.resolved_image }}"
-          CTF_CONFIGS: configs/workflow-gateway-capabilities-don.toml
-          CRE_VERSION: ${{ matrix.tests.cre_version }}
-          TEST_NAME: ${{ matrix.tests.test_name }}
-        run: |
-          cd core/scripts/cre/environment
-
-          # Start CRE with the appropriate contracts version (i.e. Workflow/CapabilityRegistry)
-          if [[ "${CRE_VERSION}" == "v1" ]]; then
-            echo "Starting CRE with explicit v1 contracts for test: '${TEST_NAME}' and configs: '${CTF_CONFIGS}'"
-            go run . env start --with-contracts-version v1
-          else
-            echo "Starting CRE with default v2 contracts for test: '${TEST_NAME}' and configs: '${CTF_CONFIGS}'"
-            go run . env start
-          fi
-
-          exit_code=$?
-          if [ $exit_code -ne 0 ]; then
-            echo "Error: failed to start local CRE ${CRE_VERSION}, exit code $exit_code"
-            exit $exit_code
-          fi
+          chainlink-image: "${{ steps.resolve-chainlink-image.outputs.resolved_image }}"
+          chip-router-image: "${{ secrets.QA_AWS_ACCOUNT_NUMBER }}.dkr.ecr.${{
+            secrets.QA_AWS_REGION }}.amazonaws.com/local-cre-chip-router:v1.0.1"
+          ctf-configs: configs/workflow-gateway-capabilities-don.toml
+          retry-count: "3"
+          retry-delay-seconds: "15"
+          cleanup-on-error: "false"
+          env-start-extra-args: "${{ matrix.tests.cre_version == 'v1' && '--with-contracts-version v1' || '' }}"
+          working-directory: core/scripts/cre/environment

       - name: Run CRE${{ matrix.tests.cre_version }} Regression system tests
         id: run-regression-tests
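The `env-start-extra-args` value passed for v1 contains two words, and the composite action expands `${ENV_START_EXTRA_ARGS}` unquoted (hence its SC2086 suppression). A small sketch of why that matters, using a hypothetical `count_args` function as a stand-in for `go run . env start`:

```shell
#!/usr/bin/env bash
set -u

# Hypothetical stand-in for the CLI: prints how many arguments it received.
count_args() { echo "$#"; }

extra="--with-contracts-version v1"
empty=""

# Unquoted: the value is split on whitespace, so the flag and its value arrive
# as two separate arguments, and an empty value contributes no argument at all.
# shellcheck disable=SC2086
unquoted_two=$(count_args $extra)
# shellcheck disable=SC2086
unquoted_none=$(count_args $empty)

# Quoted: the whole value arrives as a single argument, even when it is empty.
quoted_one=$(count_args "$extra")
quoted_empty=$(count_args "$empty")

echo "unquoted: ${unquoted_two} and ${unquoted_none}; quoted: ${quoted_one} and ${quoted_empty}"
# prints "unquoted: 2 and 0; quoted: 1 and 1"
```

Quoting the variable would hand the CLI one argument containing a space (or one empty argument for the v2 default), so the unquoted expansion is deliberate here. The trade-off is that extra arguments containing spaces or glob characters cannot be passed through this input safely.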
@@ -243,7 +231,7 @@
           TEST_TIMEOUT: 7m # let's leave 3 minutes for other steps (the whole job times out after 10 minutes)
           PARALLEL_COUNT: "10"
           CRE_TEST_PARALLEL_ENABLED: "true"
         run: |
           echo "Starting test: '${TEST_NAME}'"
           echo "⚠️⚠️⚠️ Add 'skip-e2e-regression' label to skip this step if necessary ⚠️⚠️⚠️"

Check warning on line 234 in .github/workflows/cre-regression-system-tests.yaml
GitHub Actions / Validate Github Action Workflows
[actionlint] reported by reviewdog 🐶 shellcheck reported issue in this script: SC2086:info:14:41: Double quote to prevent globbing and word splitting [shellcheck]
71 changes: 44 additions & 27 deletions .github/workflows/cre-soak-memory-leak.yml
@@ -34,24 +34,6 @@
     permissions:
       contents: read
       id-token: write
-    env:
-      CTF_CONFIGS: configs/workflow-gateway-capabilities-don.toml
-      CRE_SOAK_DURATION: "2h"
-      CTF_JD_IMAGE: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
-        secrets.QA_AWS_REGION }}.amazonaws.com/job-distributor:0.22.1"
-      CTF_CHAINLINK_IMAGE: "${{ secrets.QA_AWS_ACCOUNT_NUMBER }}.dkr.ecr.${{
-        secrets.QA_AWS_REGION }}.amazonaws.com/${{ inputs.ecr_name ||
-        'chainlink' }}:${{ inputs.chainlink_image_tag }}"
-      CTF_CHIP_INGRESS_IMAGE: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
-        secrets.QA_AWS_REGION
-        }}.amazonaws.com/atlas-chip-ingress:da84cb72d3a160e02896247d46ab4b9806e\
-        bee2f"
-      CTF_CHIP_CONFIG_IMAGE: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
-        secrets.QA_AWS_REGION
-        }}.amazonaws.com/atlas-chip-config:7b4e9ee68fd1c737dd3480b5a3ced0188f29\
-        b969"
-      CTF_CHIP_ROUTER_IMAGE: "${{ secrets.QA_AWS_ACCOUNT_NUMBER }}.dkr.ecr.${{
-        secrets.QA_AWS_REGION }}.amazonaws.com/local-cre-chip-router:v1.0.1"

     steps:
       - name: Enable S3 Cache for Self-Hosted Runners
@@ -92,18 +74,43 @@
       - name: Start observability stack
         shell: bash
         working-directory: core/scripts/cre/environment
+        env:
+          OBS_MAX_ATTEMPTS: "3"
+          OBS_RETRY_DELAY_SECONDS: "15"
         run: |
-          echo "::startgroup::Starting observability stack (required by leak package)"
-          go run . obs up -f
-          echo "::endgroup::"
+          set -u
+          attempt=1
+          while [[ "$attempt" -le "$OBS_MAX_ATTEMPTS" ]]; do
+            echo "::group::Starting observability stack (attempt ${attempt}/${OBS_MAX_ATTEMPTS}, required by leak package)"
+            if go run . obs up -f; then
+              echo "::endgroup::"
+              exit 0
+            fi
+            echo "::endgroup::"
+            if [[ "$attempt" -lt "$OBS_MAX_ATTEMPTS" ]]; then
+              go run . obs down || true
+              sleep "$OBS_RETRY_DELAY_SECONDS"
+            fi
+            attempt=$((attempt + 1))
+          done
+          exit 1

       - name: Start local CRE
-        shell: bash
-        working-directory: core/scripts/cre/environment
-        run: |
-          echo "::startgroup::Starting local CRE"
-          go run . env start --cleanup-on-error=false
-          echo "::endgroup::"
+        uses: ./.github/actions/start-local-cre-environment
+        with:
+          jd-image: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{ secrets.QA_AWS_REGION
+            }}.amazonaws.com/job-distributor:0.22.1"
+          chainlink-image: "${{ secrets.QA_AWS_ACCOUNT_NUMBER }}.dkr.ecr.${{
+            secrets.QA_AWS_REGION }}.amazonaws.com/${{ inputs.ecr_name ||
+            'chainlink' }}:${{ inputs.chainlink_image_tag }}"
+          ctf-configs: configs/workflow-gateway-capabilities-don.toml
+          chip-router-image: "${{ secrets.QA_AWS_ACCOUNT_NUMBER
+            }}.dkr.ecr.${{secrets.QA_AWS_REGION
+            }}.amazonaws.com/local-cre-chip-router:v1.0.1"
+          retry-count: "3"
+          retry-delay-seconds: "15"
+          cleanup-on-error: "false"
+          working-directory: core/scripts/cre/environment

       - name: Install gotestsum
         shell: bash
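GitHub Actions opens a collapsible log group with `::group::{title}` and closes it with `::endgroup::`, which is what the observability step in this workflow emits around `obs up`. A minimal sketch of keeping the two markers paired regardless of the command's outcome, assuming a hypothetical `run_in_group` helper that is not part of this PR:

```shell
#!/usr/bin/env bash
set -u

# run_in_group TITLE CMD [ARGS...]
# Hypothetical helper: wraps CMD in a GitHub Actions log group, always closes
# the group, and returns CMD's exit status to the caller.
run_in_group() {
  local title="$1"
  shift
  local status=0
  echo "::group::${title}"
  "$@" || status=$?
  echo "::endgroup::"
  return "$status"
}

# Demo with shell builtins standing in for the real commands.
run_in_group "Demo: succeeding command" true
echo "first status: $?"
run_in_group "Demo: failing command" false || echo "second status: $?"
```

Because the helper returns the wrapped command's status, it can sit directly in the condition of a retry loop like the one in the observability step.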
@@ -113,6 +120,16 @@
         id: run-soak
         shell: bash
         working-directory: system-tests/tests
+        env:
+          CRE_SOAK_DURATION: "2h"
+          CTF_CHIP_INGRESS_IMAGE: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
+            secrets.QA_AWS_REGION
+            }}.amazonaws.com/atlas-chip-ingress:da84cb72d3a160e02896247d46ab4b9\
+            806ebee2f"
+          CTF_CHIP_CONFIG_IMAGE: "${{ secrets.AWS_ACCOUNT_ID_PROD }}.dkr.ecr.${{
+            secrets.QA_AWS_REGION
+            }}.amazonaws.com/atlas-chip-config:7b4e9ee68fd1c737dd3480b5a3ced018\
+            8f29b969"
         run: |
           gotestsum \
             --jsonfile=/tmp/gotest.log \
@@ -143,7 +160,7 @@
   notify-test-failure:
     name: Notify about test Failure
     #if: failure()
     if: false # TODO: Silence for now
     needs: [ soak ]
     environment:
       name: integration

Check failure on line 163 in .github/workflows/cre-soak-memory-leak.yml
GitHub Actions / Validate Github Action Workflows
[actionlint] reported by reviewdog 🐶 constant expression "false" in condition. remove the if: section [if-cond]