OSAC-793: add nightly full-installation vmaas E2E with Slack notification#79378
OSAC-793: add nightly full-installation vmaas E2E with Slack notification#79378omer-vishlitzky wants to merge 1 commit into
Conversation
|
@omer-vishlitzky: This pull request references OSAC-867 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (21)
💤 Files with no reviewable changes (5)
✅ Files skipped from review due to trivial changes (4)
🚧 Files skipped from review as they are similar to previous changes (11)
WalkthroughAdds a nightly OSAC baremetal workflow and supporting step-registry entries: a new all-latest installer step, a remote baremetal test-all step that collects artifacts and component versions, a Slack notify step, OWNER updates, and test infra config with base image mappings and a cron-scheduled nightly job. ChangesNightly E2E Testing Workflow
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: omer-vishlitzky The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/osac-project/baremetal/test-all/osac-project-baremetal-test-all-ref.yaml`:
- Around line 21-23: The env var documentation for E2E_CLUSTER_TEMPLATE
currently references “CaaS tests” but this step is for vmaas tests; update the
documentation string for E2E_CLUSTER_TEMPLATE (and the other similar entries
around it) to reference “vmaas tests” (or otherwise match the step routing) so
the env var description aligns with the step’s test-type; locate occurrences of
E2E_CLUSTER_TEMPLATE and the adjacent variable docs and replace “CaaS tests”
with “vmaas tests” (or a consistent vmaas-specific phrase).
In
`@ci-operator/step-registry/osac-project/installer/all-latest/osac-project-installer-all-latest-commands.sh`:
- Around line 40-47: The podman run invocation that starts the installer
container does not export required environment variables (FULFILLMENT_IMAGE,
OPERATOR_IMAGE, AAP_IMAGE, and E2E_KUSTOMIZE_OVERLAY) into the container,
causing failures when the in-container script (run with set -u) references them;
update the podman run commands (the block that constructs the container
invocation and the similar block around lines 53-63) to pass these variables via
-e FULFILLMENT_IMAGE=${FULFILLMENT_IMAGE} -e OPERATOR_IMAGE=${OPERATOR_IMAGE} -e
AAP_IMAGE=${AAP_IMAGE} and ensure E2E_KUSTOMIZE_OVERLAY is also exported with -e
E2E_KUSTOMIZE_OVERLAY=${E2E_KUSTOMIZE_OVERLAY} so the installer script can read
them inside the container.
- Around line 21-39: The remote SSH heredoc invoked by the ssh command (ssh -F
"${SHARED_DIR}/ssh_config" ci_machine bash - << EOF|& sed ...) does not enable
strict mode; insert a single line with set -euo pipefail as the first command
inside that heredoc (immediately after the heredoc start and before export
KUBECONFIG) so that commands like export KUBECONFIG, oc annotate sc lvms-vg1, oc
wait --for=condition=Available and the subsequent cat <<NADEOF block will fail
fast on errors, unset variables, or pipeline failures.
In
`@ci-operator/step-registry/osac-project/notify/osac-project-notify-commands.sh`:
- Around line 38-40: The curl POST currently uses "curl -s" which neither fails
on non-2xx nor has timeouts; update the curl invocation that posts "${MESSAGE}"
to "${WEBHOOK_URL}" to use error-aware and bounded options (e.g. add --fail and
--show-error and sensible timeouts like --connect-timeout and --max-time) and
ensure the script checks curl's exit status and exits non‑zero on failure so the
CI step fails fast on notification errors.
- Around line 3-4: The script currently enables nounset and pipefail via the
lines "set -o nounset" and "set -o pipefail" but omits -e; update the top of the
step script so it uses full strict mode by replacing those with a single strict
set invocation (enable -e, -u and pipefail) so command failures cannot be
ignored; ensure this change is placed at the beginning of the
osac-project-notify-commands.sh script where "set -o nounset" and "set -o
pipefail" currently appear.
- Around line 33-40: The current code builds MESSAGE via shell string
interpolation and embeds it directly into the curl --data JSON string, which can
break if MESSAGE contains quotes or other JSON-sensitive characters; update the
notify step to JSON-encode the payload instead of manual interpolation by
replacing the inline --data "{\"text\":\"${MESSAGE}\"}" pattern with a safe
encoder (e.g., use jq, python -c 'import json,sys; print(json.dumps({...}))', or
printf + jq) to produce a properly escaped JSON object and then pass that
encoded payload to curl (keeping the existing MESSAGE, WEBHOOK_URL and curl
invocation but feeding curl the encoded JSON via --data `@-` or a temp file).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d1c7d8e7-65de-4640-9cf4-99eeb069d3d1
⛔ Files ignored due to path filters (1)
ci-operator/jobs/osac-project/osac-test-infra/osac-project-osac-test-infra-main-periodics.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (21)
ci-operator/config/osac-project/osac-test-infra/osac-project-osac-test-infra-main.yamlci-operator/step-registry/osac-project/baremetal/test-all/OWNERSci-operator/step-registry/osac-project/baremetal/test-all/osac-project-baremetal-test-all-commands.shci-operator/step-registry/osac-project/baremetal/test-all/osac-project-baremetal-test-all-ref.metadata.jsonci-operator/step-registry/osac-project/baremetal/test-all/osac-project-baremetal-test-all-ref.yamlci-operator/step-registry/osac-project/installer/all-latest/OWNERSci-operator/step-registry/osac-project/installer/all-latest/osac-project-installer-all-latest-commands.shci-operator/step-registry/osac-project/installer/all-latest/osac-project-installer-all-latest-ref.metadata.jsonci-operator/step-registry/osac-project/installer/all-latest/osac-project-installer-all-latest-ref.yamlci-operator/step-registry/osac-project/installer/component/OWNERSci-operator/step-registry/osac-project/installer/component/osac-project-installer-component-commands.shci-operator/step-registry/osac-project/notify/OWNERSci-operator/step-registry/osac-project/notify/osac-project-notify-commands.shci-operator/step-registry/osac-project/notify/osac-project-notify-ref.metadata.jsonci-operator/step-registry/osac-project/notify/osac-project-notify-ref.yamlci-operator/step-registry/osac-project/ofcir/baremetal/component/OWNERSci-operator/step-registry/osac-project/ofcir/baremetal/component/osac-project-ofcir-baremetal-component-workflow.metadata.jsonci-operator/step-registry/osac-project/ofcir/baremetal/component/osac-project-ofcir-baremetal-component-workflow.yamlci-operator/step-registry/osac-project/ofcir/baremetal/nightly/OWNERSci-operator/step-registry/osac-project/ofcir/baremetal/nightly/osac-project-ofcir-baremetal-nightly-workflow.metadata.jsonci-operator/step-registry/osac-project/ofcir/baremetal/nightly/osac-project-ofcir-baremetal-nightly-workflow.yaml
💤 Files with no reviewable changes (5)
- ci-operator/step-registry/osac-project/ofcir/baremetal/component/osac-project-ofcir-baremetal-component-workflow.metadata.json
- ci-operator/step-registry/osac-project/installer/component/OWNERS
- ci-operator/step-registry/osac-project/ofcir/baremetal/component/OWNERS
- ci-operator/step-registry/osac-project/ofcir/baremetal/component/osac-project-ofcir-baremetal-component-workflow.yaml
- ci-operator/step-registry/osac-project/installer/component/osac-project-installer-component-commands.sh
| - name: E2E_CLUSTER_TEMPLATE | ||
| default: "osac.templates.ocp_4_17_small" | ||
| documentation: The cluster template to use for CaaS tests |
There was a problem hiding this comment.
Fix test-type mismatch in env var documentation.
Line 23 says E2E_CLUSTER_TEMPLATE is for “CaaS tests”, but this step is documented/routed as vmaas tests. Please align wording to avoid operator confusion.
Also applies to: 25-26
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/osac-project/baremetal/test-all/osac-project-baremetal-test-all-ref.yaml`
around lines 21 - 23, The env var documentation for E2E_CLUSTER_TEMPLATE
currently references “CaaS tests” but this step is for vmaas tests; update the
documentation string for E2E_CLUSTER_TEMPLATE (and the other similar entries
around it) to reference “vmaas tests” (or otherwise match the step routing) so
the env var description aligns with the step’s test-type; locate occurrences of
E2E_CLUSTER_TEMPLATE and the adjacent variable docs and replace “CaaS tests”
with “vmaas tests” (or a consistent vmaas-specific phrase).
| podman run --authfile /root/pull-secret --rm --network=host \ | ||
| -v \${KUBECONFIG}:/root/.kube/config:z \ | ||
| -v /root/pull-secret:/installer/overlays/${E2E_KUSTOMIZE_OVERLAY}/files/quay-pull-secret.json:z \ | ||
| -v /tmp/license.zip:/installer/overlays/${E2E_KUSTOMIZE_OVERLAY}/files/license.zip:z \ | ||
| -e INSTALLER_NAMESPACE=${E2E_NAMESPACE} \ | ||
| -e INSTALLER_KUSTOMIZE_OVERLAY=${E2E_KUSTOMIZE_OVERLAY} \ | ||
| -e INSTALLER_VM_TEMPLATE=${E2E_VM_TEMPLATE} \ | ||
| ${OSAC_INSTALLER_IMAGE} sh -c ' |
There was a problem hiding this comment.
Pass all referenced override vars into the installer container.
The in-container script uses FULFILLMENT_IMAGE, OPERATOR_IMAGE, AAP_IMAGE, and E2E_KUSTOMIZE_OVERLAY, but they are not passed via podman run -e. With set -u enabled inside the container, this will fail before setup completes.
Proposed fix
podman run --authfile /root/pull-secret --rm --network=host \
-v ${KUBECONFIG}:/root/.kube/config:z \
-v /root/pull-secret:/installer/overlays/${E2E_KUSTOMIZE_OVERLAY}/files/quay-pull-secret.json:z \
-v /tmp/license.zip:/installer/overlays/${E2E_KUSTOMIZE_OVERLAY}/files/license.zip:z \
-e INSTALLER_NAMESPACE=${E2E_NAMESPACE} \
-e INSTALLER_KUSTOMIZE_OVERLAY=${E2E_KUSTOMIZE_OVERLAY} \
-e INSTALLER_VM_TEMPLATE=${E2E_VM_TEMPLATE} \
+-e FULFILLMENT_IMAGE=${FULFILLMENT_IMAGE} \
+-e OPERATOR_IMAGE=${OPERATOR_IMAGE} \
+-e AAP_IMAGE=${AAP_IMAGE} \
+-e E2E_KUSTOMIZE_OVERLAY=${E2E_KUSTOMIZE_OVERLAY} \
${OSAC_INSTALLER_IMAGE} sh -c 'Also applies to: 53-63
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/osac-project/installer/all-latest/osac-project-installer-all-latest-commands.sh`
around lines 40 - 47, The podman run invocation that starts the installer
container does not export required environment variables (FULFILLMENT_IMAGE,
OPERATOR_IMAGE, AAP_IMAGE, and E2E_KUSTOMIZE_OVERLAY) into the container,
causing failures when the in-container script (run with set -u) references them;
update the podman run commands (the block that constructs the container
invocation and the similar block around lines 53-63) to pass these variables via
-e FULFILLMENT_IMAGE=${FULFILLMENT_IMAGE} -e OPERATOR_IMAGE=${OPERATOR_IMAGE} -e
AAP_IMAGE=${AAP_IMAGE} and ensure E2E_KUSTOMIZE_OVERLAY is also exported with -e
E2E_KUSTOMIZE_OVERLAY=${E2E_KUSTOMIZE_OVERLAY} so the installer script can read
them inside the container.
| MESSAGE="${EMOJI} *${JOB_NAME}* — ${RESULT}\n<${JOB_URL}|View logs>" | ||
| if [[ -n "${VERSIONS}" ]]; then | ||
| MESSAGE="${MESSAGE}\n\n*Versions:*\n\`\`\`${VERSIONS}\n\`\`\`" | ||
| fi | ||
|
|
||
| curl -s -X POST -H 'Content-type: application/json' \ | ||
| --data "{\"text\":\"${MESSAGE}\"}" \ | ||
| "${WEBHOOK_URL}" |
There was a problem hiding this comment.
Avoid manual JSON interpolation for Slack payload text.
MESSAGE content can include characters that break JSON encoding; build payload with a JSON encoder instead of string interpolation.
Suggested fix
-MESSAGE="${EMOJI} *${JOB_NAME}* — ${RESULT}\n<${JOB_URL}|View logs>"
+MESSAGE="${EMOJI} *${JOB_NAME}* — ${RESULT}\n<${JOB_URL}|View logs>"
if [[ -n "${VERSIONS}" ]]; then
MESSAGE="${MESSAGE}\n\n*Versions:*\n\`\`\`${VERSIONS}\n\`\`\`"
fi
-curl -s -X POST -H 'Content-type: application/json' \
- --data "{\"text\":\"${MESSAGE}\"}" \
- "${WEBHOOK_URL}"
+payload="$(jq -n --arg text "${MESSAGE}" '{text: $text}')"
+curl --fail --show-error --silent \
+ --connect-timeout 10 --max-time 30 \
+ -X POST -H 'Content-type: application/json' \
+ --data "${payload}" \
+ "${WEBHOOK_URL}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/osac-project/notify/osac-project-notify-commands.sh`
around lines 33 - 40, The current code builds MESSAGE via shell string
interpolation and embeds it directly into the curl --data JSON string, which can
break if MESSAGE contains quotes or other JSON-sensitive characters; update the
notify step to JSON-encode the payload instead of manual interpolation by
replacing the inline --data "{\"text\":\"${MESSAGE}\"}" pattern with a safe
encoder (e.g., use jq, python -c 'import json,sys; print(json.dumps({...}))', or
printf + jq) to produce a properly escaped JSON object and then pass that
encoded payload to curl (keeping the existing MESSAGE, WEBHOOK_URL and curl
invocation but feeding curl the encoded JSON via --data `@-` or a temp file).
f2407dd to
7e33c12
Compare
|
@omer-vishlitzky: This pull request references OSAC-793 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira-refresh |
…tion Add a nightly periodic job that installs OSAC with ALL component images overridden to their latest CI-built versions, runs all vmaas tests, and sends results to Slack with pass/fail status and component versions. New step-registry components: - osac-project-installer-all-latest: installs OSAC with fulfillment-service, osac-operator, and osac-aap images all swapped to latest - osac-project-baremetal-test-all: runs make test-vmaas (all tests) - osac-project-ofcir-baremetal-nightly: dedicated nightly workflow - osac-project-notify: Slack webhook notification (from PR openshift#79364) Removes unused dead code: - osac-project-installer-component: single-image swap, replaced by all-latest - osac-project-ofcir-baremetal-component: workflow using installer-component
7e33c12 to
a21d2cb
Compare
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-nightly-e2e-vmaas |
|
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-osac-project-osac-test-infra-main-nightly-e2e-vmaas |
|
@omer-vishlitzky: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@omer-vishlitzky: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Adds a nightly periodic job that installs OSAC with all component images
overridden to their latest CI-built versions, runs all vmaas E2E tests, and
sends results to Slack.
Unlike the existing 8 periodic jobs (which run one test each with pinned
installer versions), this job:
CI images before installation
make test-vmaasNew step-registry components
osac-project-installer-all-latestosac-project-baremetal-test-allmake test-vmaas(all tests), reports versionsosac-project-ofcir-baremetal-nightlyosac-project-notifyRemoved dead code
osac-project-installer-component— single-image swap, replaced by all-latestosac-project-ofcir-baremetal-component— workflow that used it, unused by any configFlow
Cluster specs
SNO with 64 GB RAM, 2x200 GB disks, 24 vCPUs, CNV + LVM, OCP 4.20.
Runs nightly at 3am UTC.
Test plan
/test nightly-e2e-vmaasSummary
This PR updates OpenShift CI configuration in openshift/release to add a nightly full-installation vmaas E2E job for the OSAC project. The job installs OSAC with multiple components overridden to their latest CI-built images, runs the full vmaas test suite, gathers artifacts, and posts a Slack notification containing pass/fail, deployed component versions, and a Prow logs link.
What changed (practical terms)
New CI step-registry components
Removed / consolidated items
Cluster and runtime configuration
Notifications & reporting
Verification / test plan
Bot feedback / notes