RFE-8920: Reject internal .svc S3 endpoints in MigStorage #1459
asadawar wants to merge 5 commits into
Conversation
When users configure a replication repository for cross-cluster migration using an internal .svc hostname (e.g. from an ObjectBucketClaim), the migration fails at runtime with misleading S3 authentication errors and registry CrashLoopBackOff. The actual problem is that .svc hostnames are not resolvable from external clusters. This adds an early validation check that detects .svc hostnames in S3URL and PublicURL fields and rejects them with a clear error message telling the user to use an externally reachable endpoint instead. See: https://issues.redhat.com/browse/RFE-8920
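The validation described above boils down to parsing the configured endpoint URL and rejecting in-cluster service hostnames. A minimal sketch of that check, assuming the short-form `.svc` and FQDN `.svc.cluster.local` suffixes this PR targets (function name is illustrative, not the PR's actual code):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isInternalServiceEndpoint reports whether the endpoint URL points at an
// in-cluster Kubernetes service hostname (short form ".svc" or the
// fully-qualified ".svc.cluster.local"), which is not resolvable from
// other clusters.
func isInternalServiceEndpoint(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := u.Hostname() // strips any :port
	return strings.HasSuffix(host, ".svc") ||
		strings.HasSuffix(host, ".svc.cluster.local")
}

func main() {
	fmt.Println(isInternalServiceEndpoint("https://rgw.openshift-storage.svc"))  // true
	fmt.Println(isInternalServiceEndpoint("https://s3.us-east-1.amazonaws.com")) // false
}
```

Running the check early, when the MigStorage resource is validated, lets the controller surface a clear condition instead of the misleading runtime S3 errors.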
📝 Walkthrough

Adds detection for internal Kubernetes service DNS endpoints (hosts ending with `.svc`) in backup storage configuration and surfaces a dedicated validation condition.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Controller as Migration Controller
    participant Provider as AWS Provider
    participant Validator as Validation Handler
    participant Status as Conditions Manager
    User->>Controller: Create/Update backup storage config
    activate Controller
    Controller->>Provider: Validate(secret)
    activate Provider
    Provider->>Provider: Parse S3URL/PublicURL<br/>Check scheme http/https and host suffix (.svc / .svc.cluster.local)
    Provider-->>Controller: Return validation fields (e.g., `S3URL-InternalEndpoint`, `PublicURL`)
    deactivate Provider
    Controller->>Validator: Process validation fields
    activate Validator
    Validator->>Validator: Partition fields into<br/>internal-endpoint vs other
    alt Internal endpoint fields present
        Validator->>Status: Set `InternalEndpointUsed` condition<br/>Items = internal-endpoint fields
    end
    alt Other invalid fields present
        Validator->>Status: Set `InvalidBSFields` condition<br/>Items = other invalid fields
    end
    deactivate Validator
    Status-->>Controller: Conditions updated
    Controller-->>User: Return validation result
    deactivate Controller
```
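The field-partitioning step in the diagram can be sketched as follows. This is a minimal stand-in: the `-InternalEndpoint` suffix convention follows the field names used in this PR, while the surrounding controller and condition types are omitted for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionFields splits provider validation results into internal-endpoint
// findings (e.g. "S3URL-InternalEndpoint") and all other invalid fields, so
// each group can drive its own status condition (InternalEndpointUsed vs
// InvalidBSFields).
func partitionFields(fields []string) (internal, other []string) {
	for _, f := range fields {
		if strings.HasSuffix(f, "-InternalEndpoint") {
			internal = append(internal, f)
		} else {
			other = append(other, f)
		}
	}
	return internal, other
}

func main() {
	internal, other := partitionFields([]string{"S3URL-InternalEndpoint", "Region"})
	fmt.Println(internal) // [S3URL-InternalEndpoint]
	fmt.Println(other)    // [Region]
}
```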
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (2)
pkg/controller/migstorage/validation.go (1)
150-159: Consider future-proofing the error message. The message explicitly references "S3 endpoint", which is currently accurate since only the AWS provider emits `-InternalEndpoint` fields. If other providers (Azure, GCP) adopt this pattern in the future, you may want to generalize the message or derive it from the field names in `internalEndpointFields`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/migstorage/validation.go` around lines 150-159: the error message for the InternalEndpointUsed condition currently hardcodes "The S3 endpoint...", which ties the message to AWS. Update the logic that sets the migapi.Condition (used in storage.Status.SetCondition with Type InternalEndpointUsed, Reason NotSupported, Category Critical, and Items internalEndpointFields) to use a provider-agnostic message (e.g., "The storage endpoint uses an internal `.svc` hostname..." or construct the message from the internalEndpointFields names) so it reads correctly for non-AWS providers.

pkg/cloudprovider/aws_test.go (1)
17-36: Add a test case for fully-qualified internal hostnames. Consider adding a test case for `.svc.cluster.local` to ensure FQDN internal endpoints are also detected (once the production code is updated to handle this case):

```go
{
    name:        "internal FQDN .svc.cluster.local endpoint",
    s3URL:       "https://rgw.openshift-storage.svc.cluster.local",
    expectS3Svc: true,
},
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/cloudprovider/aws_test.go` around lines 17-36: add a table-driven test case to the cases slice to cover fully-qualified internal hostnames by adding an entry with name "internal FQDN .svc.cluster.local endpoint", s3URL "https://rgw.openshift-storage.svc.cluster.local", and expectS3Svc true. Place it alongside the existing cases that use the s3URL/publicURL fields so the test loop that inspects s3URL and expectS3Svc will exercise FQDN .svc.cluster.local handling.
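The suggested case could slot into a table-driven test along these lines. This is a self-contained sketch, not the PR's actual test file: `hasInternalSvcHost` is a hypothetical stand-in for the production check, and the `name`/`s3URL`/`expectS3Svc` field names follow the review's wording.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hasInternalSvcHost mirrors the check under review: it flags hostnames
// ending in ".svc" and, per the review suggestion, the FQDN form.
func hasInternalSvcHost(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	h := u.Hostname()
	return strings.HasSuffix(h, ".svc") || strings.HasSuffix(h, ".svc.cluster.local")
}

func main() {
	cases := []struct {
		name        string
		s3URL       string
		expectS3Svc bool
	}{
		{"internal .svc endpoint", "https://rgw.openshift-storage.svc", true},
		{"internal FQDN .svc.cluster.local endpoint", "https://rgw.openshift-storage.svc.cluster.local", true},
		{"external endpoint", "https://s3.us-east-1.amazonaws.com", false},
	}
	for _, tc := range cases {
		got := hasInternalSvcHost(tc.s3URL)
		fmt.Printf("%s: got=%v want=%v\n", tc.name, got, tc.expectS3Svc)
	}
}
```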
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/cloudprovider/aws.go`:
- Around line 262-264: The PublicURL hostname check only looks for ".svc" and
misses FQDNs like ".svc.cluster.local"; update the conditional in the PublicURL
handling (where u.Hostname() is tested and "PublicURL-InternalEndpoint" is
appended) to also detect ".svc.cluster.local" (e.g., check
strings.HasSuffix(u.Hostname(), ".svc.cluster.local") in addition to
".svc")—apply the same check pattern used for S3URL to ensure fully qualified
cluster local hostnames are caught.
- Around line 254-256: The hostname check that currently uses
strings.HasSuffix(u.Hostname(), ".svc") misses fully-qualified service names
like "rgw.openshift-storage.svc.cluster.local"; update the conditional in
pkg/cloudprovider/aws.go where u.Hostname() is tested (the branch that appends
"S3URL-InternalEndpoint") to also detect a ".svc" label inside FQDNs (for
example by adding strings.Contains(u.Hostname(), ".svc.") OR by matching the
".svc" segment more robustly), so both short-form (".svc") and FQDNs
(".svc.<domain>") and custom cluster domains are correctly classified.
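The broader match the review suggests here can be sketched as below. The helper name is hypothetical, and note that the PR's later commits ultimately settled on two plain `HasSuffix` checks instead.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// looksLikeClusterLocalHost implements the wider match suggested in the
// review: it catches the short form ("...svc") and any FQDN containing a
// ".svc." label, including custom cluster domains such as
// "rgw.openshift-storage.svc.mycluster.example".
func looksLikeClusterLocalHost(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	h := u.Hostname()
	return strings.HasSuffix(h, ".svc") || strings.Contains(h, ".svc.")
}

func main() {
	for _, s := range []string{
		"https://rgw.openshift-storage.svc",
		"https://rgw.openshift-storage.svc.cluster.local",
		"https://rgw.openshift-storage.svc.mycluster.example",
		"https://s3.us-east-1.amazonaws.com",
	} {
		fmt.Println(s, looksLikeClusterLocalHost(s))
	}
}
```

A trade-off worth noting: `Contains(".svc.")` can also match unrelated external hosts (e.g. `files.svc.example.com`), which is a plausible reason the reviewers later preferred explicit suffix checks.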
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2765a388-9e63-4d66-a839-ce36f4180bee
📒 Files selected for processing (3)
- pkg/cloudprovider/aws.go
- pkg/cloudprovider/aws_test.go
- pkg/controller/migstorage/validation.go
The previous check only caught short-form .svc hostnames. Users can also use the FQDN form (e.g. rgw.openshift-storage.svc.cluster.local) which was not being rejected.
Removed the Contains(".svc.") check and kept only HasSuffix(".svc")
as discussed with reviewers. Removed .svc.cluster.local test cases
accordingly.
Updated based on the Slack discussion with @pranavgaikwad. Simplified the check to use only `HasSuffix(".svc")`.
Cover both short form (.svc) and FQDN (.svc.cluster.local) using HasSuffix for each. Added test cases for both forms.
Summary
Hi, I raised RFE-8920 after working on a customer case where a cross-cluster MTC migration failed with misleading S3 errors. The root cause was that the replication repository was configured with an internal `.svc` S3 endpoint (from an OBC), which isn't reachable from the source cluster.

This PR adds early validation to catch this misconfiguration before the migration starts, instead of letting it fail at runtime with confusing errors.
What this PR does
- `pkg/cloudprovider/aws.go`: after the existing URL scheme validation, checks whether the `S3URL` or `PublicURL` hostname ends with `.svc`. If so, marks the field as `S3URL-InternalEndpoint` or `PublicURL-InternalEndpoint`.
- `pkg/controller/migstorage/validation.go`: separates internal endpoint errors from generic field errors and sets a dedicated `InternalEndpointUsed` condition with a clear message: "The S3 endpoint uses an internal .svc hostname that is not accessible across clusters. Use an externally reachable endpoint (e.g. an Object Gateway route) for cross-cluster migrations."
- Tests covering internal `.svc` endpoints (rejected) and external endpoints (allowed).

Why
The customer was using ODF Ceph RGW, and the OBC automatically generated an internal `.svc` hostname as the S3 endpoint. In a cross-cluster migration, the source cluster can't resolve this. The resulting errors were misleading (S3 authentication failures and registry CrashLoopBackOff), which led to a long troubleshooting cycle before identifying the actual problem.

Test plan
- `go test ./pkg/cloudprovider/`: all existing + new tests pass
- Configure a replication repository with an internal `.svc` S3 endpoint and verify the condition is set

See: https://issues.redhat.com/browse/RFE-8920
Summary by CodeRabbit
New Features
- Detects internal `.svc`/`.svc.cluster.local` endpoints in cloud storage configs and provides clearer guidance to use cross-cluster/public endpoints.

User-facing behavior
Tests