CRT Config Monitor for Ship Status #79397
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: thiagoalessio
Walkthrough
This PR adds monitoring and dashboard configuration for Release Controller and CRT services. Two YAML configuration files are updated with HTTP health check monitors, dashboard components, ownership metadata, and service account associations to enable ship-status visibility and alerting.
Changes
Ship-status Monitoring and Dashboard Configuration
@thiagoalessio: all tests passed!
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@core-services/ship-status/component-monitor-config.yaml`:
- Around line 558-625: Add a missing monitor entry for the release-controller
sub_component_slug "dpcr-openshift-release": create a block matching the other
release-controller entries (component_slug "release-controller",
sub_component_slug "dpcr-openshift-release") with an http_monitor that points to
the dpcr-openshift-release service (e.g.,
https://dpcr-openshift-release.apps.ci.l2s4.p1.openshiftapps.com), set the
expected code to 200 and retry_after to 5s, and insert it alongside the other
Release Controller entries just before the "END: Release Controller entries"
marker.
- Line 564: Check and confirm whether the shorter retry interval is intentional:
review the Release Controller entries and CRT Services entries that set
"retry_after: 5s" and either (a) change them to match existing Prow monitors'
"retry_after: 4m" if they should follow the same backoff policy, or (b) keep
"retry_after: 5s" but add an inline comment above those entries explaining the
rationale and risk tradeoffs for the 5s interval; update the Release Controller
and CRT Services monitor blocks that currently contain "retry_after: 5s"
accordingly so the intent is explicit.
In `@core-services/ship-status/dashboard-config.yaml`:
- Around line 543-545: Update the incorrect namespace on the component-monitor
service account references: replace occurrences of
"system:serviceaccount:crt-argocd:component-monitor" with
"system:serviceaccount:ship-status:component-monitor" (the entries under the
owners list where the service account is specified) so they point to the actual
service account defined in the ship-status namespace and match other components
in this file.
📒 Files selected for processing (2)
core-services/ship-status/component-monitor-config.yaml
core-services/ship-status/dashboard-config.yaml
# BEGIN: Release Controller entries
- component_slug: "release-controller"
  sub_component_slug: "openshift-release"
  http_monitor:
    url: "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "origin-release"
  http_monitor:
    url: "https://origin-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-arm64"
  http_monitor:
    url: "https://openshift-release-arm64.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-multi"
  http_monitor:
    url: "https://openshift-release-multi.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-ppc64le"
  http_monitor:
    url: "https://openshift-release-ppc64le.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-s390x"
  http_monitor:
    url: "https://openshift-release-s390x.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-priv"
  http_monitor:
    url: "https://openshift-release-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-arm64-priv"
  http_monitor:
    url: "https://openshift-release-arm64-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-multi-priv"
  http_monitor:
    url: "https://openshift-release-multi-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-ppc64le-priv"
  http_monitor:
    url: "https://openshift-release-ppc64le-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-s390x-priv"
  http_monitor:
    url: "https://openshift-release-s390x-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
# END: Release Controller entries
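Each entry above pairs a URL with an expected status code: 200 for the public release controllers and 403 for the `-priv` ones, which are expected to reject unauthenticated requests. The following is a minimal Python sketch of what such a probe could look like. It is not the component-monitor implementation; `http_status`, `check_endpoint`, the `attempts` parameter, and the reading of `retry_after` as "wait this long before probing again after a failure" are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET on url, or -1 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the value to compare.
        return err.code
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return -1


def check_endpoint(
    url: str,
    expected_code: int,
    fetch_status: Callable[[str], int] = http_status,
    attempts: int = 2,            # hypothetical knob, not a field in the config
    retry_after_s: float = 5.0,   # assumed meaning of retry_after: 5s
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe url up to `attempts` times; healthy if any probe returns expected_code."""
    for attempt in range(attempts):
        if fetch_status(url) == expected_code:
            return True
        if attempt < attempts - 1:
            sleep(retry_after_s)
    return False


if __name__ == "__main__":
    # Stubbed fetcher so the demo makes no network call.
    stub = lambda url: 403
    priv_url = "https://openshift-release-priv.apps.ci.l2s4.p1.openshiftapps.com"
    print(check_endpoint(priv_url, expected_code=403, fetch_status=stub))
```

Passing the fetcher and the sleep function as parameters keeps the retry logic testable without touching the real endpoints.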
Missing monitor configuration for dpcr-openshift-release sub-component.
The dashboard configuration (dashboard-config.yaml lines 535-542) includes a dpcr-openshift-release sub-component, but there is no corresponding monitor entry in this file. This will cause the dashboard to display a component without any health monitoring.
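A mismatch like this can be caught mechanically by comparing the sub-component slugs on both sides. The sketch below is one hedged way to do that in Python with only the standard library. The helper `monitored_slugs`, the inline sample text, and the hard-coded `dashboard_slugs` set are illustrative assumptions; the real dashboard-config.yaml schema is not shown in this review, so the dashboard side is supplied by hand here.

```python
import re

# Matches a component_slug line immediately followed by its sub_component_slug line.
# \b keeps the first token from matching inside "sub_component_slug".
_ENTRY = re.compile(
    r'\bcomponent_slug:\s*"(?P<comp>[^"]+)"\s*\n\s*sub_component_slug:\s*"(?P<sub>[^"]+)"'
)


def monitored_slugs(monitor_yaml: str, component: str) -> set:
    """Collect sub_component_slug values declared for one component_slug."""
    return {m["sub"] for m in _ENTRY.finditer(monitor_yaml) if m["comp"] == component}


# Abbreviated sample in the shape of component-monitor-config.yaml.
MONITOR_SAMPLE = '''
- component_slug: "release-controller"
  sub_component_slug: "openshift-release"
  http_monitor:
    url: "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "origin-release"
  http_monitor:
    url: "https://origin-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
'''

# Hypothetical subset of the sub-components listed on the dashboard side.
dashboard_slugs = {"openshift-release", "origin-release", "dpcr-openshift-release"}

missing = sorted(dashboard_slugs - monitored_slugs(MONITOR_SAMPLE, "release-controller"))
print(missing)
```

A regex is used instead of a YAML parser only to keep the example dependency-free; a real check would more likely load both files with a YAML library.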
📊 Proposed fix: Add monitor for dpcr-openshift-release
Add the following entry after line 624 (before the "END: Release Controller entries" comment):
      code: 403
      retry_after: 5s
+ - component_slug: "release-controller"
+   sub_component_slug: "dpcr-openshift-release"
+   http_monitor:
+     url: "https://openshift-release.apps.cr.j7t7.p1.openshiftapps.com"
+     code: 200
+     retry_after: 5s
  # END: Release Controller entries
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# BEGIN: Release Controller entries
- component_slug: "release-controller"
  sub_component_slug: "openshift-release"
  http_monitor:
    url: "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "origin-release"
  http_monitor:
    url: "https://origin-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-arm64"
  http_monitor:
    url: "https://openshift-release-arm64.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-multi"
  http_monitor:
    url: "https://openshift-release-multi.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-ppc64le"
  http_monitor:
    url: "https://openshift-release-ppc64le.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-s390x"
  http_monitor:
    url: "https://openshift-release-s390x.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-priv"
  http_monitor:
    url: "https://openshift-release-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-arm64-priv"
  http_monitor:
    url: "https://openshift-release-arm64-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-multi-priv"
  http_monitor:
    url: "https://openshift-release-multi-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-ppc64le-priv"
  http_monitor:
    url: "https://openshift-release-ppc64le-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "openshift-release-s390x-priv"
  http_monitor:
    url: "https://openshift-release-s390x-priv.apps.ci.l2s4.p1.openshiftapps.com"
    code: 403
    retry_after: 5s
- component_slug: "release-controller"
  sub_component_slug: "dpcr-openshift-release"
  http_monitor:
    url: "https://openshift-release.apps.cr.j7t7.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
# END: Release Controller entries
  http_monitor:
    url: "https://openshift-release.apps.ci.l2s4.p1.openshiftapps.com"
    code: 200
    retry_after: 5s
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check retry_after values across all HTTP monitors
rg -n 'retry_after:' core-services/ship-status/component-monitor-config.yaml
Repository: openshift/release
Length of output: 449
🏁 Script executed:
# Check the context around the new retry_after values to see if there are explanatory comments
head -n 650 core-services/ship-status/component-monitor-config.yaml | tail -n 100
Repository: openshift/release
Length of output: 3537
🏁 Script executed:
# Check git history for when these lines were added
git log --oneline -p core-services/ship-status/component-monitor-config.yaml | head -n 200
Repository: openshift/release
Length of output: 6442
Confirm that retry_after: 5s is intentional for Release Controller and CRT services.
The Release Controller entries (lines 564–624) and CRT Services entries (lines 633–639) all use retry_after: 5s, while existing Prow monitors use retry_after: 4m (lines 8, 109). This 48× difference in retry frequency is consistent across all new entries, suggesting intentional design, but no comments explain the rationale for the difference.
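The 48× figure follows from converting both intervals to seconds: 4m is 240s, and 240 / 5 = 48. A small Python sketch of that arithmetic is below. The `parse_duration` helper is hypothetical and only handles single-unit values such as `5s` or `4m`; compound values such as `1m30s`, which the real config parser may accept, are out of scope here.

```python
import re

# Seconds per unit for the simple "<number><unit>" durations seen in this review.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a single-unit duration such as '5s' or '4m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


prow_interval = parse_duration("4m")   # existing Prow monitors
new_interval = parse_duration("5s")    # Release Controller and CRT entries
ratio = prow_interval // new_interval
print(f"{prow_interval}s vs {new_interval}s: {ratio}x more frequent retries")
```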
owners:
  - rover_group: "continuous-release-team"
  - service_account: "system:serviceaccount:crt-argocd:component-monitor"
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the crt-argocd service account exists and compare with referenced component-monitor
# Look for any references to crt-argocd namespace in the codebase
rg -n 'crt-argocd' --type yaml
# Look for component-monitor service account definitions
fd -e yaml -e yml | xargs rg -l 'kind:\s*ServiceAccount' | xargs rg -A5 'name:\s*component-monitor'
Repository: openshift/release
Length of output: 3189
🏁 Script executed:
# Search for all service accounts in crt-argocd namespace
rg -A3 'namespace:\s*crt-argocd' --type yaml | rg -B3 'kind:\s*ServiceAccount'
# Also search for any ServiceAccount definitions with crt-argocd context
fd -e yaml -e yml | xargs rg -B5 -A5 'namespace:\s*crt-argocd' | grep -A10 'ServiceAccount'
# Check if crt-argocd namespace is defined anywhere
rg 'namespace.*crt-argocd' --type yaml -B2 -A2
Repository: openshift/release
Length of output: 43
🏁 Script executed:
# Search for all references to crt-argocd in YAML files
rg 'crt-argocd' --type yaml -B2 -A2
# Search for all ServiceAccount definitions in the codebase
fd -e yaml -e yml -exec grep -l 'kind:\s*ServiceAccount' {} \; | head -20
# Look for Release Controller and CRT related configs
rg -i 'release.*controller|continuous.*release' --type yaml -B1 -A1 | head -50
Repository: openshift/release
Length of output: 4665
🏁 Script executed:
# Search for all ServiceAccount definitions
rg 'kind:\s*ServiceAccount' --type yaml -A2
# Look for all namespaces in serviceaccount.yaml to understand the full context
cat clusters/build-clusters/common/ship-status/serviceaccount.yaml | head -100
Repository: openshift/release
Length of output: 50374
🏁 Script executed:
cat clusters/build-clusters/common/ship-status/serviceaccount.yaml
Repository: openshift/release
Length of output: 1048
Fix namespace reference for component-monitor service account.
The service account referenced at lines 545 and 568 uses system:serviceaccount:crt-argocd:component-monitor, but this account does not exist in the crt-argocd namespace. The component-monitor service account is defined only in the ship-status namespace (clusters/build-clusters/common/ship-status/serviceaccount.yaml). Update the references to use system:serviceaccount:ship-status:component-monitor to match the actual service account location, consistent with other components in this file.
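Kubernetes identifies a service account by a username of the form `system:serviceaccount:<namespace>:<name>`, so the namespace is the third colon-separated field. The Python sketch below shows how a reference could be split and compared against a list of accounts known to exist. `KNOWN_ACCOUNTS` is a hypothetical inventory seeded only with the one account this review found in the repository; it says nothing about accounts defined elsewhere, for example in the related continuous-release-jobs repository.

```python
from typing import NamedTuple


class ServiceAccountRef(NamedTuple):
    namespace: str
    name: str


def parse_service_account(subject: str) -> ServiceAccountRef:
    """Split 'system:serviceaccount:<namespace>:<name>' into its parts."""
    parts = subject.split(":")
    if len(parts) != 4 or parts[:2] != ["system", "serviceaccount"]:
        raise ValueError(f"not a service account username: {subject!r}")
    return ServiceAccountRef(namespace=parts[2], name=parts[3])


# Hypothetical inventory: accounts confirmed to be defined in this repository.
KNOWN_ACCOUNTS = {("ship-status", "component-monitor")}


def is_defined(subject: str, known=KNOWN_ACCOUNTS) -> bool:
    """True if the referenced (namespace, name) pair is in the inventory."""
    return tuple(parse_service_account(subject)) in known


for subject in (
    "system:serviceaccount:crt-argocd:component-monitor",
    "system:serviceaccount:ship-status:component-monitor",
):
    print(subject, "->", "defined" if is_defined(subject) else "not found in inventory")
```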
Related PR: https://github.com/openshift/continuous-release-jobs/pull/1792
This PR adds monitoring and dashboard configuration for CRT (Continuous Release Team) services in the OpenShift CI infrastructure's Ship Status system.
Changes
Monitor Configuration (component-monitor-config.yaml):
Dashboard Configuration (dashboard-config.yaml):
These changes enable the Ship Status dashboard to display the health and status of CRT services alongside existing infrastructure components, providing visibility into the continuous release infrastructure's operational state.