-
Notifications
You must be signed in to change notification settings - Fork 428
OCPBUGS-69866: skip HCP router LB when routes use apps domain #7416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release-4.19
Are you sure you want to change the base?
OCPBUGS-69866: skip HCP router LB when routes use apps domain #7416
Conversation
|
@celebdor: This pull request references Jira Issue OCPBUGS-69866, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis change introduces domain-aware DNS routing logic for hosted control planes. It adds utility functions to validate subdomain relationships and determine external DNS usage, enabling the router service creation to trigger only for hostnames outside the management cluster's default ingress domain rather than all public routes. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)
1947-1985: Align external-router infra status with new IsPublicWithExternalDNS gating
reconcileHCPRouterServicesnow only creates the publicrouterLoadBalancer service whenutil.IsPublicWithExternalDNS(hcp, r.DefaultIngressDomain)is true, butreconcileExternalRouterServiceStatusstill gates onutil.IsPublicWithDNS(hcp). For public HCPs whose Route hostnames are all under the management apps domain:
IsPublicWithDNS(hcp)remains true, soreconcileExternalRouterServiceStatusreportsneeded == trueeven though no public router service is ever created.reconcileRouterServiceStatusthen runs against a non-existent Service, returningneeded == truewith an empty host.If
infra.InfrastructureStatus.IsReady()depends onNeedExternalRouterimplying an actual external router host, this divergence can keep infrastructure perpetually “not ready” even though we intentionally skipped creating a public router LB for apps-domain routes, undermining the goal of this PR.Consider gating the external router status on the same predicate used for LB creation:
Proposed change for external-router status gating
func (r *HostedControlPlaneReconciler) reconcileExternalRouterServiceStatus(ctx context.Context, hcp *hyperv1.HostedControlPlane) (host string, needed bool, message string, err error) { - if !util.IsPublicWithDNS(hcp) || sharedingress.UseSharedIngress() || hcp.Spec.Platform.Type == hyperv1.IBMCloudPlatform { + if !util.IsPublicWithExternalDNS(hcp, r.DefaultIngressDomain) || sharedingress.UseSharedIngress() || hcp.Spec.Platform.Type == hyperv1.IBMCloudPlatform { return } return r.reconcileRouterServiceStatus(ctx, manifests.RouterPublicService(hcp.Namespace), events.NewMessageCollector(ctx, r.Client)) }This keeps infra status expectations consistent with when the public router LB is actually reconciled.
Also applies to: 2087-2092
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (7)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go(2 hunks)control-plane-operator/controllers/hostedcontrolplane/v2/router/component.go(1 hunks)support/controlplane-component/controlplane-component.go(3 hunks)support/util/expose.go(3 hunks)support/util/expose_test.go(1 hunks)support/util/visibility.go(1 hunks)support/util/visibility_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
support/controlplane-component/controlplane-component.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/router/component.gosupport/util/expose_test.gosupport/util/expose.gosupport/util/visibility.gosupport/util/visibility_test.gocontrol-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
🧬 Code graph analysis (6)
control-plane-operator/controllers/hostedcontrolplane/v2/router/component.go (3)
support/controlplane-component/controlplane-component.go (1)
WorkloadContext(68-87)hypershift-operator/controllers/sharedingress/sharedingress_controller.go (1)
UseSharedIngress(227-230)support/util/visibility.go (2)
IsPrivateHCP(9-14)IsPublicWithExternalDNS(51-56)
support/util/expose_test.go (3)
support/util/expose.go (2)
IsSubdomain(60-78)UseDedicatedDNSWithDomain(84-91)api/hypershift/v1beta1/hostedcluster_types.go (5)
ServiceType(845-845)OAuthServer(855-855)Route(836-836)APIServer(849-849)LoadBalancer(832-832)hypershift-operator/controllers/sharedingress/sharedingress_controller.go (1)
Hostname(232-238)
support/util/expose.go (2)
api/hypershift/v1beta1/hostedcluster_types.go (2)
ServiceType(845-845)Route(836-836)hypershift-operator/controllers/sharedingress/sharedingress_controller.go (1)
Hostname(232-238)
support/util/visibility.go (2)
support/util/expose.go (1)
UseDedicatedDNSWithDomain(84-91)api/hypershift/v1beta1/hostedcluster_types.go (4)
APIServer(849-849)OAuthServer(855-855)Konnectivity(852-852)Ignition(858-858)
support/util/visibility_test.go (2)
hypershift-operator/controllers/sharedingress/sharedingress_controller.go (1)
Hostname(232-238)support/util/visibility.go (1)
IsPublicWithExternalDNS(51-56)
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)
support/util/visibility.go (1)
IsPublicWithExternalDNS(51-56)
🔇 Additional comments (10)
support/controlplane-component/controlplane-component.go (1)
58-66: DefaultIngressDomain propagation into workload context looks correctAdding
DefaultIngressDomainto bothControlPlaneContextandWorkloadContextand wiring it throughworkloadContext()cleanly exposes the management cluster ingress domain to v2 components without affecting existing behavior when the field is unset.Also applies to: 82-87, 89-103
control-plane-operator/controllers/hostedcontrolplane/v2/router/component.go (1)
49-62: v2 router predicate correctly narrows public LB usage to external-DNS hostnamesUsing
util.IsPrivateHCP(cpContext.HCP) || util.IsPublicWithExternalDNS(cpContext.HCP, cpContext.DefaultIngressDomain)aligns the router component with the new “apps-domain vs external-domain” semantics while still honoring shared-ingress and private-HCP behavior.support/util/visibility_test.go (1)
89-238: IsPublicWithExternalDNS tests cover key domain and platform scenariosThe table-driven
TestIsPublicWithExternalDNSexercises external vs apps-domain hostnames, mixed cases, private AWS, and no-hostname routes, matching the intended semantics ofIsPublicWithExternalDNSand giving solid regression coverage.control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go (1)
1047-1060: Passing DefaultIngressDomain into CPO v2 components is appropriatePropagating
r.DefaultIngressDomainintocomponent.ControlPlaneContextensures v2 components (like the router) can make domain-aware decisions without altering existing callers. IfDefaultIngressDomaincan legitimately be empty on some management clusters, please confirmIsPublicWithExternalDNS/UseDedicatedDNSWithDomainhandle that case as intended (e.g., whether an empty domain should cause all hostnames to be treated as “external” or not).support/util/visibility.go (1)
47-56: IsPublicWithExternalDNS provides a clean domain-aware variant of IsPublicWithDNSThe new
IsPublicWithExternalDNShelper correctly reusesIsPublicHCPandUseDedicatedDNSWithDomainacross all relevant service types, matching the documented intent of only treating hostnames outside the management cluster’s default ingress domain as requiring a dedicated HCP router LoadBalancer.support/util/expose_test.go (2)
9-84: Excellent test coverage forIsSubdomain.The test cases comprehensively cover:
- Valid subdomain relationships (single and multi-level)
- Boundary conditions (exact match, empty inputs, shorter hostname)
- Edge cases (partial label matches, suffix matches, case sensitivity)
The test structure follows Go best practices with clear test names and expected outcomes.
86-209: Thorough test coverage forUseDedicatedDNSWithDomain.The test cases validate the core logic for determining when a dedicated DNS router is needed:
- Routes with hostnames under the apps domain correctly return
false(no dedicated router needed)- Routes with external hostnames correctly return
true(dedicated router needed)- Edge cases (empty hostname, LoadBalancer type, empty domain, missing service) are properly handled
This aligns well with the PR objective to prevent unnecessary router LoadBalancer creation for apps-domain routes.
support/util/expose.go (3)
56-78: Clean implementation of subdomain validation.The function correctly:
- Handles empty inputs by returning
false- Performs case-insensitive comparison using
strings.ToLower- Validates proper subdomain relationship (hostname must have more labels than domain)
- Compares labels from right to left, ensuring full domain suffix match
The documentation clearly explains the expected behavior with an example.
80-91: Correct implementation of domain-aware DNS routing logic.The function properly implements the logic to determine when a dedicated HCP router is needed:
- Returns
falsefor non-Route services, nil Route, or empty hostname (correctly avoiding unnecessary router creation)- Returns
trueonly when the route hostname is external to the management cluster's default ingress domain- Short-circuit evaluation (
||) safely prevents nil pointer accessThis aligns perfectly with the PR objective: preventing unnecessary router LoadBalancer creation when routes use the apps domain.
124-133: Consistent implementation for HostedCluster.The function mirrors
UseDedicatedDNSWithDomainbut operates onHostedClusterinstead ofHostedControlPlane. The implementation follows the same pattern:
- Uses
IsRouteByHCandServicePublishingStrategyByTypeByHC(HC-specific helpers)- Applies the same domain-aware logic via
IsSubdomainThis consistency ensures uniform behavior across both HCP and HC code paths.
c5320f9 to
4963a89
Compare
|
Thanks for the review feedback @coderabbitai! I've addressed your concern about the inconsistency between Changes made:
This keeps infra status expectations consistent with when the public router LB is actually reconciled. AI-assisted response via Claude Code |
|
Tip For best results, initiate chat on the files or code changes. @celebdor: Thank you for addressing the consistency concern! The changes look good: ✅ Both The consistency across these components is important for avoiding the exact issue this PR fixes - ensuring that when we decide not to create a router LoadBalancer service (because routes use the apps domain), we don't report expectations for it in the infrastructure status either. |
| // Otherwise, the Routes use the management cluster Domain and resolve through the default ingress controller. | ||
| // Note: For public HCPs, we only need the HCP router if routes use hostnames external to the management cluster's | ||
| // apps domain. Hostnames under the apps domain are served by the management cluster's default router. | ||
| func useHCPRouter(cpContext component.WorkloadContext) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's export this function and reuse it in hostedcontrolpl;ane_contgroller.go instead of duplicating it twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would need to change the parameters to take hcp and ingressDomain directly,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Created shared util.UseHCPRouter(hcp, defaultIngressDomain, useSharedIngress) function in support/util/visibility.go and updated both v2 router component and v1 controller to use it.
AI-assisted response via Claude Code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The shared function takes hcp, defaultIngressDomain, and useSharedIngress as parameters.
AI-assisted response via Claude Code
| } | ||
|
|
||
| func (r *HostedControlPlaneReconciler) reconcileHCPRouterServices(ctx context.Context, hcp *hyperv1.HostedControlPlane, createOrUpdate upsert.CreateOrUpdateFN) error { | ||
| if sharedingress.UseSharedIngress() || hcp.Spec.Platform.Type == hyperv1.IBMCloudPlatform { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we please replace those inline checks with UseHCPRouter() func for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Updated reconcileHCPRouterServices to use util.UseHCPRouter() for the entry guard check, consolidating the logic.
AI-assisted response via Claude Code
support/util/expose.go
Outdated
| // UseDedicatedDNSByHCWithDomain checks if a service uses external DNS that requires a dedicated HCP router. | ||
| // It returns true only if the service uses a Route with a hostname that is NOT a subdomain of the | ||
| // management cluster's default ingress domain. | ||
| func UseDedicatedDNSByHCWithDomain(hc *hyperv1.HostedCluster, svcType hyperv1.ServiceType, defaultIngressDomain string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func UseDedicatedDNSByHCWithDomain(hc *hyperv1.HostedCluster, svcType hyperv1.ServiceType, defaultIngressDomain string) bool { | |
| func UseDedicatedDNSByHCWithExternalDomain(hc *hyperv1.HostedCluster, svcType hyperv1.ServiceType, defaultIngressDomain string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Renamed to UseDedicatedDNSByHCWithExternalDomain.
AI-assisted response via Claude Code
4963a89 to
ccec1fa
Compare
| // useHCPRouterPredicate is the component predicate wrapper for util.UseHCPRouter. | ||
| // See util.UseHCPRouter for full documentation on when a dedicated HCP router is needed. | ||
| func useHCPRouterPredicate(cpContext component.WorkloadContext) (bool, error) { | ||
| return util.UseHCPRouter(cpContext.HCP, cpContext.DefaultIngressDomain, sharedingress.UseSharedIngress()), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer if we keep the exported function in this package. This should container all router related functionalities
b15e3bc to
b88f9a1
Compare
| @@ -0,0 +1,143 @@ | |||
| # OCPBUGS-69866: HCP cluster upgrade to 4.19 creates unexpected router LoadBalancer service | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file needs to be removed from the PR
support/util/expose.go
Outdated
| // It returns true only if the service uses a Route with a hostname that is NOT a subdomain of the | ||
| // management cluster's default ingress domain. Hostnames under the apps domain are served by the | ||
| // management cluster's default router and don't require a separate HCP router LoadBalancer service. | ||
| func UseDedicatedDNSWithDomain(hcp *hyperv1.HostedControlPlane, svcType hyperv1.ServiceType, defaultIngressDomain string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| func UseDedicatedDNSWithDomain(hcp *hyperv1.HostedControlPlane, svcType hyperv1.ServiceType, defaultIngressDomain string) bool { | |
| func UseDedicatedDNSWithExternalDomain(hcp *hyperv1.HostedControlPlane, svcType hyperv1.ServiceType, defaultIngressDomain string) bool { |
thought the AI would be smart enough to change both :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename done. Thanks!
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: celebdor, muraee The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
When upgrading HCP clusters from 4.18 to 4.19, the router LoadBalancer service was created unexpectedly for configurations where services use Route with hostnames under the management cluster's apps domain. This caused upgrade failures on platforms with limited LoadBalancer IPs. The fix adds DNS label-based subdomain checking to determine if route hostnames are within the management cluster's default ingress domain. When hostnames are subdomains of the apps domain (e.g., oauth.apps.mgmt-cluster.example.com), they can be served by the management cluster's default router via wildcard DNS, so no dedicated HCP router LoadBalancer service is needed. This preserves the OCPBUGS-56914 fix for external DNS users while preventing unnecessary router service creation for apps-domain routes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b88f9a1 to
ee45d1b
Compare
|
@celebdor: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
I’ve posted my test results in the Jira issue. The router LB service was not created in 4.19, and the cluster was successfully upgraded from 4.18 to 4.19. No more LB svc is created any more. |
|
/verified by @heliubj18 |
|
@heliubj18: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
/label backport-risk-assessed |
|
@bertinatto: Can not set label backport-risk-assessed: Must be member in one of these teams: [openshift-patch-managers openshift-staff-engineers openshift-release-oversight openshift-sustaining-engineers] DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it:
Fix regression in 4.19 where HCP router LoadBalancer service was created unexpectedly for configurations using Route hostnames under the management cluster's apps domain.
After upgrading HCP clusters from 4.18 to 4.19, a
routerservice of type LoadBalancer is created unexpectedly. On platforms with limited LoadBalancer IPs (e.g., BareMetal with exhausted MetalLB IPAddressPool), the service stays in<pending>state and blocks the upgrade.The fix for OCPBUGS-56914 (PR #6780) changed the condition for creating the public router service from checking only if KAS uses Route to checking if ANY service uses Route with hostname. This caused the router service to be created for configurations that previously didn't need it.
This PR adds
IsPublicWithExternalDNS()function that checks if any service uses a Route with a hostname that is external to the management cluster's default ingress domain. When hostnames are subdomains of the apps domain (e.g.,oauth.apps.mgmt-cluster.example.com), the management cluster's default router can serve them via wildcard DNS, so no dedicated HCP router LoadBalancer service is needed.This preserves the OCPBUGS-56914 fix for external DNS users while preventing unnecessary router service creation for apps-domain routes.
Which issue(s) this PR fixes:
Fixes OCPBUGS-69866
Checklist
🤖 Generated with Claude Code