feat(ci): parallelize deployments and readiness checks for showcase and showcase-rbac #4469
gustavolira wants to merge 7 commits into redhat-developer:main
Conversation
The container image build workflow finished with status:
Run base and RBAC deployments concurrently using background processes, with proper error reporting for both. Parallelize Playwright test execution with a shared Xvfb display and per-project output directories to avoid race conditions on test-results, JUnit XML, and HTML reports. Reduces PR feedback and nightly test-phase time by ~3-5 min by overlapping both Backstage readiness waits and the test suites.

RHIDP-12296

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --previous log path was still hardcoded to pod_logs/ instead of using the parallel-safe _POD_LOGS_DIR variable, causing previous-log collection to silently fail when save_all_pod_logs uses a /tmp/ directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes SonarCloud S7682: functions should end with an explicit return.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 3abbca0 to 401898e
Parallel Playwright execution caused OOMKilled in CI (exit code 137). Refactor testing::parallel_check_and_test to:
- Phase 1: run readiness checks in parallel (lightweight HTTP polling)
- Phase 2: run test suites sequentially (avoids concurrent Playwright OOM)

This still saves ~5 min from overlapping readiness waits while staying within the CI container memory limits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
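The two-phase shape this commit describes can be sketched as a minimal, self-contained script. Here `check_ready` and `run_suite` are hypothetical stubs standing in for the real readiness polling and Playwright invocations; only the control flow is the point.

```shell
#!/usr/bin/env bash
# Minimal sketch of the two-phase pattern; check_ready and run_suite are
# hypothetical stubs for the real readiness polling and Playwright runs.
set -euo pipefail

check_ready() {
  # Lightweight HTTP-style polling is cheap, so it is safe to run in parallel.
  echo "ready: $1"
}

run_suite() {
  # Playwright is memory-heavy; running suites sequentially avoids OOM (137).
  echo "tests: $1"
}

# Phase 1: readiness checks in parallel
check_ready showcase &
base_pid=$!
check_ready showcase-rbac &
rbac_pid=$!
wait "${base_pid}"
wait "${rbac_pid}"

# Phase 2: test suites sequentially
run_suite showcase
run_suite showcase-rbac
```

The time saved comes entirely from Phase 1: both readiness waits overlap, while the expensive test runs still execute one at a time.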
The playwright-report files (index.html, data/, trace/) were being saved inside a playwright-report/ subdirectory instead of at the root of the artifact dir. Remove the third arg from save_artifact so the per-project report directory contents are copied directly into the artifact root, matching the original layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
subhashkhileri left a comment:
Reviewed the parallelization changes. The overall design is sound — parallel readiness checks are a good win, and per-project artifact paths fix a real overwrite bug. A few suggestions below.
.ci/pipelines/lib/testing.sh (outdated)

# Phase 1: Wait for both deployments to be ready in parallel (lightweight HTTP polling)
log::section "Waiting for both deployments to be ready in parallel"

testing::check_backstage_running "${base_release}" "${base_namespace}" "${base_url}" "${base_artifacts}" &
Bug: $LOGFILE race condition when running in background

Both background readiness checks will use the shared global $LOGFILE (test-log). Inside check_backstage_running() (lines 279, 289), on crash detection or timeout it writes:

common::save_artifact "${artifacts_subdir}" "/tmp/${LOGFILE}" || true

Since both background processes share the same $LOGFILE, they'll clobber each other's log if both hit the failure/timeout path. run_tests() was correctly updated to use per-project logfiles, but check_backstage_running() was missed.

Suggest updating those lines to use a per-project path:

common::save_artifact "${artifacts_subdir}" "/tmp/${LOGFILE}-${artifacts_subdir}" || true
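The suffixing fix can be demonstrated in a self-contained sketch. Here `save_log` is a hypothetical stand-in for common::save_artifact: with the per-project suffix, two parallel background writers produce distinct files instead of clobbering one shared path.

```shell
#!/usr/bin/env bash
# Sketch of the per-project log path fix; save_log is a hypothetical
# stand-in for common::save_artifact, writing under a temp dir.
set -euo pipefail

LOGFILE="test-log"
tmpdir=$(mktemp -d)

save_log() {
  local artifacts_subdir=$1
  # Per-project suffix: parallel callers write distinct files instead of
  # both racing on "${tmpdir}/${LOGFILE}".
  printf 'log for %s\n' "${artifacts_subdir}" > "${tmpdir}/${LOGFILE}-${artifacts_subdir}"
}

save_log showcase &
save_log showcase-rbac &
wait
```

After `wait`, both `test-log-showcase` and `test-log-showcase-rbac` exist with their own contents, which is exactly what the shared-path version cannot guarantee.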
.ci/pipelines/utils.sh (outdated)

_run_parallel() {
  local label=$1
  local base_func=$2
  local base_arg=$3
Suggestion: Inline instead of abstracting

This helper is called exactly twice (initiate_deployments and initiate_deployments_osd_gcp) and only supports single-argument functions. The generic naming/signature suggests reusability, but it's tightly constrained. Consider inlining the &/wait pattern directly in the two callers; it's ~6 lines and immediately readable:

initiate_deployments() {
  local base_artifacts_subdir=$1
  local rbac_artifacts_subdir=$2
  cd "${DIR}"
  base_deployment "${base_artifacts_subdir}" &
  local base_pid=$!
  rbac_deployment "${rbac_artifacts_subdir}" &
  local rbac_pid=$!
  local failed=0
  wait "${base_pid}" || { log::error "Base deployment failed"; failed=1; }
  wait "${rbac_pid}" || { log::error "RBAC deployment failed"; failed=1; }
  return "${failed}"
}

No indirection, no need to look up what _run_parallel does.
.ci/pipelines/lib/testing.sh (outdated)

# Args:
#   $1-$5: release_name, namespace, playwright_project, url, artifacts_subdir
#   $6: readiness check exit code (0 = ready)
_run_check_and_test_phase() {
Duplication with testing::check_and_test()

This function duplicates the logic already in testing::check_and_test() (lines 307-341): same pod display, same SKIP_TESTS check, same run_tests/save_all_pod_logs branching, same mark_deploy_failed on failure.

Consider reusing check_and_test instead. After the parallel readiness wait completes, check_backstage_running inside check_and_test will return immediately on the first curl (HTTP 200 in ~2 seconds, since the deployment is already up). The re-check cost is negligible, and you avoid maintaining two copies of the same logic:

# In ocp-pull.sh: parallel readiness, then reuse existing check_and_test
testing::check_backstage_running "..." "..." "${url}" "..." &
local base_pid=$!
testing::check_backstage_running "..." "..." "${rbac_url}" "..." &
local rbac_pid=$!
wait "${base_pid}"; wait "${rbac_pid}"
testing::check_and_test "${RELEASE_NAME}" "${NAME_SPACE}" "${PW_PROJECT_SHOWCASE}" "${url}"
testing::check_and_test "${RELEASE_NAME_RBAC}" "${NAME_SPACE_RBAC}" "${PW_PROJECT_SHOWCASE_RBAC}" "${rbac_url}"

No new functions needed.
.ci/pipelines/utils.sh (outdated)

rm -rf pod_logs && mkdir -p pod_logs

# Use a unique temp directory per artifacts_subdir to allow parallel pod log collection
export _POD_LOGS_DIR="/tmp/pod_logs-${artifacts_subdir}"
Code smell: exported global to pass a directory path

Using export _POD_LOGS_DIR + unset to communicate with retrieve_pod_logs is fragile. If save_all_pod_logs is ever called concurrently (which this PR is moving toward), the global gets clobbered between calls.

Consider passing it as a function parameter instead:

retrieve_pod_logs "$pod_name" "$container" "$namespace" "$pod_logs_dir"

This eliminates the global state, the unset cleanup, and the ${_POD_LOGS_DIR:-pod_logs} fallback pattern in retrieve_pod_logs.
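A minimal sketch of the parameter-passing shape, under stated assumptions: retrieve_pod_logs here is a hypothetical stub that creates a marker file where the real function would call `kubectl logs`, and the pod/container names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: pass the pod logs directory as a parameter instead of an
# exported global. retrieve_pod_logs is a hypothetical stub creating a
# marker file in place of a real `kubectl logs` call.
set -euo pipefail

base_tmp=$(mktemp -d)

retrieve_pod_logs() {
  local pod_name=$1 container=$2 namespace=$3 pod_logs_dir=$4
  mkdir -p "${pod_logs_dir}"
  : > "${pod_logs_dir}/${pod_name}-${container}.log"
}

save_all_pod_logs() {
  local namespace=$1 artifacts_subdir=$2
  # Each caller computes its own directory; nothing is exported, so
  # concurrent calls cannot clobber each other's path.
  local pod_logs_dir="${base_tmp}/pod_logs-${artifacts_subdir}"
  retrieve_pod_logs "backstage-0" "backstage-backend" "${namespace}" "${pod_logs_dir}"
}

save_all_pod_logs showcase-ns showcase &
save_all_pod_logs rbac-ns showcase-rbac &
wait
```

Because the directory lives in a local variable rather than the environment, the two background calls each end up with their own `pod_logs-<subdir>` tree and no unset cleanup is needed.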
# Args: pairs of (release_name namespace playwright_project url) for base and RBAC
#   $1-$4: base args
#   $5-$8: RBAC args
testing::parallel_check_and_test() {
Nit: Hardcoded for exactly 2 deployments
This function takes 8 positional args (4 per deployment) and is hardcoded to the base+RBAC pair. This is fine for now, but a brief comment like # Hardcoded for the base + RBAC deployment pair would clarify that the rigid 8-arg signature is intentional and not meant to be extended to N deployments.
…ation
- Inline _run_parallel() into both callers (review #2)
- Remove _run_check_and_test_phase, reuse check_and_test directly after parallel readiness pre-warm; re-check is instant if already up (review redhat-developer#3)
- Pass pod_logs_dir as function parameter instead of exported global (review redhat-developer#4)
- Fix $LOGFILE race in check_backstage_running for parallel execution (review #1)
- Add comment clarifying hardcoded base+RBAC pair (review redhat-developer#5)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gustavolira: The following test failed.
Summary

Design

Changes

Parallel deployments (utils.sh):
- initiate_deployments() and initiate_deployments_osd_gcp() run base + RBAC concurrently with & + wait

Parallel readiness + sequential tests (testing.sh):
- testing::parallel_check_and_test() pre-warms readiness checks in parallel, then calls testing::check_and_test sequentially for each deployment
- testing::run_tests() uses per-project output paths: test-results-<subdir>/, junit-results-<subdir>.xml, playwright-report-<subdir>/, and per-project log files
- testing::check_backstage_running() uses a per-project log file path to avoid race conditions

Pod log collection (utils.sh):
- save_all_pod_logs() passes the pod logs directory as a function parameter (not an exported global)
- uses /tmp/pod_logs-<subdir>/ per artifacts subdir

Callers (ocp-pull.sh, ocp-nightly.sh):
- use testing::parallel_check_and_test()

Test plan

Resolves: RHIDP-12296

🤖 Generated with Claude Code