[DX-4082] retry local CRE startup in CI#22342

Open
Tofel wants to merge 2 commits into develop from dx-4082-retry-local-cre-startup

@Tofel Tofel commented May 7, 2026

This should help us avoid startup failures like this one:

Error: failed to setup test environment: failed to start DONs: failed to start DONs: failed to start nodeSet named bootstrap-gateway: create container: Error response from daemon: unauthorized: authentication required
Stack trace: goroutine 1 [running]:

Verification:

Error: failed to start environment: failed to setup test environment: failed to start chip router: create container: Error response from daemon: pull access denied for local-cre-chip-router, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
exit status 1
env start failed with exit code 1

Running env stop before retry...

4:00PM INF Cleaning up docker containers label=framework=ctf
4:00PM WRN failed to remove local CRE state file: remove /home/runner/_work/chainlink/chainlink/core/scripts/cre/environment/state/local_cre.toml: no such file or directory

Local CRE environment stopped successfully
Starting local CRE (attempt 2/3)...
4:01PM INF Removing environment state directory: core/scripts/cre/environment/state

	db       .d88b.   .o88b.  .d8b.  db            .o88b. d8888b. d88888b
	88      .8P  Y8. d8P  Y8 d8' `8b 88           d8P  Y8 88  `8D 88'
	88      88    88 8P      88ooo88 88           8P      88oobY' 88ooooo
	88      88    88 8b      88~~~88 88           8b      88`8b   88~~~~~
	88booo. `8b  d8' Y8b  d8 88   88 88booo.      Y8b  d8 88 `88. 88.
4:01PM INF Loading configuration input Path=configs/capability_defaults.toml

Source: https://github.com/smartcontractkit/chainlink/actions/runs/25506786634/job/74855632140

@Tofel Tofel changed the title from "retry local CRE startup in CI" to "[DX-4082] retry local CRE startup in CI" May 7, 2026

github-actions Bot commented May 7, 2026

✅ No conflicts with other open PRs targeting develop

@Tofel Tofel force-pushed the dx-4082-retry-local-cre-startup branch from 3279fe7 to 0f5fd2d May 7, 2026 15:52
@Tofel Tofel marked this pull request as ready for review May 7, 2026 16:48
@Tofel Tofel requested review from a team as code owners May 7, 2026 16:48
Copilot AI review requested due to automatic review settings May 7, 2026 16:48
@Tofel Tofel requested a review from a team as a code owner May 7, 2026 16:48

Copilot AI left a comment

Pull request overview

Risk Rating: LOW

Improves CI robustness for Local CRE-based workflows by adding retry logic around environment startup (and observability stack startup where required), mitigating transient container pull/auth failures (e.g., ECR “unauthorized”).

Changes:

  • Introduces a reusable composite action to start the Local CRE environment with retries and cleanup between attempts.
  • Updates CRE system/regression workflows to use the composite action and configure retry parameters.
  • Adds retry logic for starting the observability stack in workflows that depend on it.
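The retry-with-cleanup behaviour described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the contents of the composite action: the function name `retry_with_cleanup` and its argument order are hypothetical, and the start and cleanup commands are passed in as strings so the loop can be exercised without a real environment.

```shell
# Hypothetical helper (not taken from the PR): run a start command up to
# max_attempts times, running a cleanup command between failed attempts.
retry_with_cleanup() {
  local max_attempts="$1" start_cmd="$2" cleanup_cmd="$3"
  local attempt=1 status=0
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "Starting (attempt ${attempt}/${max_attempts})..."
    status=0
    sh -c "$start_cmd" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    echo "start failed with exit code ${status}"
    # Only clean up when another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "Running cleanup before retry..."
      # A failing cleanup is reported but does not abort the loop.
      sh -c "$cleanup_cmd" || echo "cleanup failed, continuing"
    fi
    attempt=$((attempt + 1))
  done
  # Propagate the exit code of the last failed attempt.
  return "$status"
}
```

Under these assumptions, a CI step could call it as `retry_with_cleanup 3 "go run . env start" "go run . env stop"`. Returning the last attempt's exit code, rather than a fixed value, keeps the original failure visible to the workflow.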

Scrupulous human review recommended (targeted areas):

  • .github/actions/start-local-cre-environment/action.yml: confirm env stop is the correct teardown between retries for partially-started environments and won’t mask useful intermediate failure signals.
  • .github/workflows/cre-system-tests.yaml: confirm the observability retry/validation logic matches expected service names/ports and failure modes.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/cre-system-tests.yaml | Adds retry logic for observability startup (Beholder suite) and switches Local CRE startup to the new composite action with retries. |
| .github/workflows/cre-soak-memory-leak.yml | Adds retry logic for observability startup and switches Local CRE startup to the new composite action. |
| .github/workflows/cre-regression-system-tests.yaml | Switches Local CRE startup to the new composite action with retries. |
| .github/actions/start-local-cre-environment/action.yml | New composite action implementing retried `go run . env start` with `env stop` between failed attempts. |

set -u
attempt=1
while [[ "$attempt" -le "$OBS_MAX_ATTEMPTS" ]]; do
  echo "::startgroup::Starting observability stack (attempt ${attempt}/${OBS_MAX_ATTEMPTS}, required by leak package)"

while [[ "$attempt" -le "$OBS_MAX_ATTEMPTS" ]]; do
  echo "::group::Starting Observability Stack (attempt ${attempt}/${OBS_MAX_ATTEMPTS})"
  echo "Test requires observability stack for '${TEST_NAME}', starting..."
  go run . obs up -f
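
For reference, GitHub Actions collapses log output between `::group::<title>` and `::endgroup::`; `::startgroup::` is not a documented workflow command. A complete version of the loop excerpted above might look like the sketch below. It is an illustration only: the function name `start_obs_with_retries`, the `OBS_START_CMD` variable, and the default of 3 attempts are assumptions, while `OBS_MAX_ATTEMPTS` and `go run . obs up -f` come from the excerpts.

```shell
# Sketch under stated assumptions: retry the observability stack startup,
# wrapping each attempt in a collapsible GitHub Actions log group.
# OBS_START_CMD is a hypothetical override used here to keep the loop testable.
start_obs_with_retries() {
  local max="${OBS_MAX_ATTEMPTS:-3}"
  local cmd="${OBS_START_CMD:-go run . obs up -f}"
  local attempt=1
  while [ "$attempt" -le "$max" ]; do
    echo "::group::Starting observability stack (attempt ${attempt}/${max})"
    if sh -c "$cmd"; then
      echo "::endgroup::"
      return 0
    fi
    # Close the group on the failure path too, so later output is not nested.
    echo "::endgroup::"
    echo "::warning::observability stack failed to start (attempt ${attempt}/${max})"
    attempt=$((attempt + 1))
  done
  echo "::error::observability stack failed to start after ${max} attempts"
  return 1
}
```

Every `::group::` is paired with an `::endgroup::` on both the success and failure paths, so a failed attempt does not swallow the log lines of the next one.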
@cl-sonarqube-production

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube
