feat(ci): parallelize deployments and readiness checks for showcase and showcase-rbac by gustavolira · Pull Request #4469 · redhat-developer/rhdh

gustavolira · 2026-03-26T13:24:14Z

Summary

Run base and RBAC Helm deployments concurrently via background processes
Overlap Backstage readiness waits (~5 min each) by running health checks in parallel, saving ~3-5 min
Tests run sequentially (parallel Playwright caused OOMKilled — not enough memory for 6 concurrent browsers)
Per-project output paths for test artifacts to avoid file conflicts between runs

Design

Phase	Execution	Why
Helm deployments	Parallel	Independent namespaces, no resource conflict
Readiness checks (HTTP polling)	Parallel	Lightweight, main time savings (~5 min)
Playwright tests	Sequential	Parallel caused OOMKilled (exit 137) — 2 Playwright x 3 workers exceeds CI memory

Changes

Parallel deployments (utils.sh):

initiate_deployments() and initiate_deployments_osd_gcp() run base + RBAC concurrently with & + wait

Parallel readiness + sequential tests (testing.sh):

New testing::parallel_check_and_test() pre-warms readiness checks in parallel, then calls testing::check_and_test sequentially for each deployment
testing::run_tests() uses per-project output paths: test-results-<subdir>/, junit-results-<subdir>.xml, playwright-report-<subdir>/, per-project log files
testing::check_backstage_running() uses per-project log file path to avoid race conditions

Pod log collection (utils.sh):

save_all_pod_logs() passes pod logs directory as function parameter (not exported global)
Uses unique /tmp/pod_logs-<subdir>/ per artifacts subdir

Callers (ocp-pull.sh, ocp-nightly.sh):

Both use testing::parallel_check_and_test()

Test plan

Verify OCP PR job runs deployments and readiness checks in parallel
Verify Playwright tests run sequentially without OOMKilled
Verify nightly subsequent test phases (runtime, sanity, localization) work correctly
Verify OSD-GCP nightly deploys in parallel
Verify artifacts match original layout (index.html, data/, trace/ at showcase/ root)
Verify proper error reporting when one deployment fails

Resolves: RHIDP-12296

🤖 Generated with Claude Code

github-actions · 2026-03-26T13:55:35Z

The container image build workflow finished with status: cancelled.

gustavolira · 2026-03-26T13:58:13Z

/review

rhdh-qodo-merge · 2026-03-26T13:58:51Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶 RHIDP-12296 - Partially compliant Compliant requirements: Run showcase and showcase-rbac deployments concurrently (background + wait) Pod log collection covers both namespaces (parallel-safe temp dirs per subdir) Non-compliant requirements: Proper error handling: if one deployment/test fails, both failures are reported and the overall job reflects failure Requires further human verification: Each Playwright test suite starts when its own deployment is ready Reduce total test-phase time by ~3–5 minutes
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🔒 No security concerns identified
⚡ Recommended focus areas for review Return Status `_run_parallel` returns `1` on failure, but on success it does not explicitly `return 0`. Relying on implicit function exit status can be fragile and may cause unexpected non-zero returns depending on the last executed command; explicitly returning `0` at the end would make success/failure behavior unambiguous. _run_parallel() { local label=$1 local base_func=$2 local base_arg=$3 local rbac_func=$4 local rbac_arg=$5 log::section "Starting parallel ${label} (base + RBAC)" "${base_func}" "${base_arg}" & local base_pid=$! log::info "Base ${label} started in background (PID: ${base_pid})" "${rbac_func}" "${rbac_arg}" & local rbac_pid=$! log::info "RBAC ${label} started in background (PID: ${rbac_pid})" local base_result=0 local rbac_result=0 wait "${base_pid}" \|\| base_result=$? wait "${rbac_pid}" \|\| rbac_result=$? if [[ ${base_result} -ne 0 ]]; then log::error "Base ${label} failed (exit code: ${base_result})" else log::success "Base ${label} completed" fi if [[ ${rbac_result} -ne 0 ]]; then log::error "RBAC ${label} failed (exit code: ${rbac_result})" else log::success "RBAC ${label} completed" fi if [[ ${base_result} -ne 0 \|\| ${rbac_result} -ne 0 ]]; then return 1 fi }
📚 Focus areas based on broader codebase context Exit Codes `testing::parallel_check_and_test()` currently does `wait ... \|\| true` for both background test processes and always returns success, even if one (or both) `testing::check_and_test` runs fail. This can cause CI to report a green job while tests actually failed; capture each wait’s exit code and `return 1`/non-zero when either fails. (Ref 2) testing::parallel_check_and_test() { local base_release=$1 local base_namespace=$2 local base_project=$3 local base_url=$4 local rbac_release=$5 local rbac_namespace=$6 local rbac_project=$7 local rbac_url=$8 Xvfb :99 & local xvfb_pid=$! export DISPLAY=:99 log::section "Starting parallel test execution (base + RBAC)" testing::check_and_test "${base_release}" "${base_namespace}" "${base_project}" "${base_url}" & local base_pid=$! log::info "Base tests started in background (PID: ${base_pid})" testing::check_and_test "${rbac_release}" "${rbac_namespace}" "${rbac_project}" "${rbac_url}" & local rbac_pid=$! log::info "RBAC tests started in background (PID: ${rbac_pid})" wait "${base_pid}" \|\| true wait "${rbac_pid}" \|\| true kill "${xvfb_pid}" 2> /dev/null \|\| true unset DISPLAY } Reference reasoning: The existing flow explicitly preserves and propagates failure via an exit code check and `exit $DEPLOYMENT_EXIT_CODE` when deployment/test steps fail. Align the new parallel helper with that pattern by aggregating child exit codes and returning a failing status when any parallel branch fails.
📄 References redhat-developer/rhdh/e2e-tests/container-init.sh [136-175] redhat-developer/rhdh/e2e-tests/container-init.sh [177-182] redhat-developer/rhdh/e2e-tests/container-init.sh [47-84] redhat-developer/rhdh/e2e-tests/container-init.sh [119-134] redhat-developer/rhdh/e2e-tests/container-init.sh [86-117] redhat-developer/rhdh/e2e-tests/local-run.sh [427-441] redhat-developer/rhdh/e2e-tests/local-run.sh [442-470] redhat-developer/rhdh/e2e-tests/local-run.sh [403-426]

github-actions · 2026-03-26T13:59:41Z

The container image build workflow finished with status: cancelled.

github-actions · 2026-03-26T14:32:24Z

The container image build workflow finished with status: cancelled.

Run base and RBAC deployments concurrently using background processes with proper error reporting for both. Parallelize Playwright test execution with a shared Xvfb display and per-project output directories to avoid race conditions on test-results, junit XML, and HTML reports. Reduces PR feedback and nightly test-phase time by ~3-5 min by overlapping both Backstage readiness waits and test suites. RHIDP-12296 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The --previous log path was still hardcoded to pod_logs/ instead of using the parallel-safe _POD_LOGS_DIR variable, causing previous logs to silently fail when save_all_pod_logs uses a /tmp/ directory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fixes SonarCloud S7682 — functions should end with explicit return. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-26T14:37:33Z

The container image build workflow finished with status: cancelled.

github-actions · 2026-03-26T15:16:52Z

Image was built and published successfully. It is available at:

Parallel Playwright execution caused OOMKilled in CI (exit code 137). Refactor testing::parallel_check_and_test to: - Phase 1: run readiness checks in parallel (lightweight HTTP polling) - Phase 2: run test suites sequentially (avoids concurrent Playwright OOM) This still saves ~5 min from overlapping readiness waits while staying within the CI container memory limits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-26T18:37:18Z

Image was built and published successfully. It is available at:

The playwright-report files (index.html, data/, trace/) were being saved inside a playwright-report/ subdirectory instead of at the root of the artifact dir. Remove the third arg from save_artifact so the per-project report directory contents are copied directly into the artifact root, matching the original layout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-26T21:30:23Z

Image was built and published successfully. It is available at:

subhashkhileri

Reviewed the parallelization changes. The overall design is sound — parallel readiness checks are a good win, and per-project artifact paths fix a real overwrite bug. A few suggestions below.

subhashkhileri · 2026-03-27T16:40:58Z

.ci/pipelines/lib/testing.sh

+  # Phase 1: Wait for both deployments to be ready in parallel (lightweight HTTP polling)
+  log::section "Waiting for both deployments to be ready in parallel"
+
+  testing::check_backstage_running "${base_release}" "${base_namespace}" "${base_url}" "${base_artifacts}" &


Bug: $LOGFILE race condition when running in background

Both background readiness checks will use the shared global $LOGFILE (test-log). Inside check_backstage_running() (lines 279, 289), on crash-detection or timeout it writes:

common::save_artifact "${artifacts_subdir}" "/tmp/${LOGFILE}" || true

Since both background processes share the same $LOGFILE, they'll clobber each other's log if both hit the failure/timeout path.

run_tests() was correctly updated to use per-project logfiles — but check_backstage_running() was missed.

Suggest updating those lines to use a per-project path:

common::save_artifact "${artifacts_subdir}" "/tmp/${LOGFILE}-${artifacts_subdir}" || true

subhashkhileri · 2026-03-27T16:40:58Z

.ci/pipelines/utils.sh

+_run_parallel() {
+  local label=$1
+  local base_func=$2
+  local base_arg=$3


Suggestion: Inline instead of abstracting

This helper is called exactly twice (initiate_deployments and initiate_deployments_osd_gcp) and only supports single-argument functions. The generic naming/signature suggests reusability, but it's tightly constrained. Consider inlining the &/wait pattern directly in the two callers — it's ~6 lines and immediately readable:

initiate_deployments() { local base_artifacts_subdir=$1 local rbac_artifacts_subdir=$2 cd "${DIR}" base_deployment "${base_artifacts_subdir}" & local base_pid=$! rbac_deployment "${rbac_artifacts_subdir}" & local rbac_pid=$! local failed=0 wait "${base_pid}" || { log::error "Base deployment failed"; failed=1; } wait "${rbac_pid}" || { log::error "RBAC deployment failed"; failed=1; } return "${failed}" }

No indirection, no need to look up what _run_parallel does.

subhashkhileri · 2026-03-27T16:40:58Z

.ci/pipelines/lib/testing.sh

+# Args:
+#   $1-$5: release_name, namespace, playwright_project, url, artifacts_subdir
+#   $6: readiness check exit code (0 = ready)
+_run_check_and_test_phase() {


Duplication with testing::check_and_test()

This function duplicates the logic already in testing::check_and_test() (lines 307-341) — same pod display, same SKIP_TESTS check, same run_tests/save_all_pod_logs branching, same mark_deploy_failed on failure.

Consider reusing check_and_test instead. After the parallel readiness wait completes, check_backstage_running inside check_and_test will return immediately on the first curl (HTTP 200 in ~2 seconds since the deployment is already up). The re-check cost is negligible, and you avoid maintaining two copies of the same logic:

# In ocp-pull.sh — parallel readiness, then reuse existing check_and_test testing::check_backstage_running "..." "..." "${url}" "..." & local base_pid=$! testing::check_backstage_running "..." "..." "${rbac_url}" "..." & local rbac_pid=$! wait "${base_pid}"; wait "${rbac_pid}" testing::check_and_test "${RELEASE_NAME}" "${NAME_SPACE}" "${PW_PROJECT_SHOWCASE}" "${url}" testing::check_and_test "${RELEASE_NAME_RBAC}" "${NAME_SPACE_RBAC}" "${PW_PROJECT_SHOWCASE_RBAC}" "${rbac_url}"

No new functions needed.

subhashkhileri · 2026-03-27T16:40:58Z

.ci/pipelines/utils.sh

-  rm -rf pod_logs && mkdir -p pod_logs
+
+  # Use a unique temp directory per artifacts_subdir to allow parallel pod log collection
+  export _POD_LOGS_DIR="/tmp/pod_logs-${artifacts_subdir}"


Code smell: exported global to pass a directory path

Using export _POD_LOGS_DIR + unset to communicate with retrieve_pod_logs is fragile. If save_all_pod_logs is ever called concurrently (which this PR is moving toward), the global gets clobbered between calls.

Consider passing it as a function parameter instead:

retrieve_pod_logs "$pod_name" "$container" "$namespace" "$pod_logs_dir"

This eliminates the global state, the unset cleanup, and the ${_POD_LOGS_DIR:-pod_logs} fallback pattern in retrieve_pod_logs.

subhashkhileri · 2026-03-27T16:40:58Z

.ci/pipelines/lib/testing.sh

+# Args: pairs of (release_name namespace playwright_project url) for base and RBAC
+#   $1-$4: base args
+#   $5-$8: RBAC args
+testing::parallel_check_and_test() {


Nit: Hardcoded for exactly 2 deployments

This function takes 8 positional args (4 per deployment) and is hardcoded to the base+RBAC pair. This is fine for now, but a brief comment like # Hardcoded for the base + RBAC deployment pair would clarify that the rigid 8-arg signature is intentional and not meant to be extended to N deployments.

…ation - Inline _run_parallel() into both callers (review #2) - Remove _run_check_and_test_phase, reuse check_and_test directly after parallel readiness pre-warm — re-check is instant if already up (review redhat-developer#3) - Pass pod_logs_dir as function parameter instead of exported global (review redhat-developer#4) - Fix $LOGFILE race in check_backstage_running for parallel execution (review #1) - Add comment clarifying hardcoded base+RBAC pair (review redhat-developer#5) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sonarqubecloud · 2026-03-27T18:26:49Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

openshift-ci · 2026-03-27T18:58:33Z

@gustavolira: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-ocp-helm	`91a69f1`	link	true	`/test e2e-ocp-helm`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

github-actions · 2026-03-27T19:00:48Z

Image was built and published successfully. It is available at:

openshift-ci bot requested review from rostalan and subhashkhileri March 26, 2026 13:24

rhdh-qodo-merge bot added the Review effort 3/5 label Mar 26, 2026

gustavolira and others added 4 commits March 26, 2026 11:36

style: add explicit return to parallel_check_and_test

102b07c

Fixes SonarCloud S7682 — functions should end with explicit return. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: add explicit return 0 to _run_parallel

401898e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

gustavolira force-pushed the RHIDP-12296 branch from 3abbca0 to 401898e Compare March 26, 2026 14:36

gustavolira changed the title ~~feat(ci): parallelize showcase and showcase-rbac deployment and testing~~ feat(ci): parallelize deployments and readiness checks for showcase and showcase-rbac Mar 26, 2026

subhashkhileri reviewed Mar 27, 2026

View reviewed changes

Conversation

gustavolira commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design

Changes

Test plan

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

gustavolira commented Mar 26, 2026

Uh oh!

rhdh-qodo-merge bot commented Mar 26, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

github-actions bot commented Mar 26, 2026

Uh oh!

subhashkhileri left a comment

Choose a reason for hiding this comment

Uh oh!

subhashkhileri Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

subhashkhileri Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

subhashkhileri Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

subhashkhileri Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

subhashkhileri Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Mar 27, 2026

Quality Gate passed

Uh oh!

openshift-ci bot commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gustavolira commented Mar 26, 2026 •

edited

Loading