
RFE-8920: Reject internal .svc S3 endpoints in MigStorage#1459

Open
asadawar wants to merge 5 commits into migtools:master from asadawar:rfe-8920-reject-svc-endpoints

Conversation

@asadawar

@asadawar asadawar commented Apr 16, 2026

Summary

Hi, I raised RFE-8920 after working on a customer case where a cross-cluster MTC migration failed with misleading S3 errors. The root cause was that the replication repository was configured with an internal .svc S3 endpoint (from an OBC), which isn't reachable from the source cluster.

This PR adds early validation to catch this misconfiguration before the migration starts, instead of letting it fail at runtime with confusing errors.

What this PR does

  • In pkg/cloudprovider/aws.go: after the existing URL scheme validation, checks if the S3URL or PublicURL hostname ends with .svc. If so, marks the field as S3URL-InternalEndpoint or PublicURL-InternalEndpoint.
  • In pkg/controller/migstorage/validation.go: separates internal endpoint errors from generic field errors and sets a dedicated InternalEndpointUsed condition with a clear message: "The S3 endpoint uses an internal .svc hostname that is not accessible across clusters. Use an externally reachable endpoint (e.g. an Object Gateway route) for cross-cluster migrations."
  • Added unit tests covering internal .svc endpoints (rejected) and external endpoints (allowed).
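The check described in the first bullet can be sketched as follows. This is a minimal illustration of the approach, not the PR's actual code: `isInternalServiceHost` and `validateEndpoint` are hypothetical helper names, and the real validation runs inside the AWS provider's existing URL checks.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isInternalServiceHost reports whether a hostname points at an
// in-cluster Kubernetes Service, either the short form
// ("foo.ns.svc") or the cluster-local FQDN form
// ("foo.ns.svc.cluster.local").
func isInternalServiceHost(host string) bool {
	return strings.HasSuffix(host, ".svc") ||
		strings.HasSuffix(host, ".svc.cluster.local")
}

// validateEndpoint mimics the validation flow: parse the URL, then
// flag the field (e.g. "S3URL-InternalEndpoint") when the host is
// an internal service hostname.
func validateEndpoint(field, rawURL string) []string {
	var invalid []string
	u, err := url.Parse(rawURL)
	if err != nil {
		invalid = append(invalid, field)
		return invalid
	}
	if isInternalServiceHost(u.Hostname()) {
		invalid = append(invalid, field+"-InternalEndpoint")
	}
	return invalid
}

func main() {
	fmt.Println(validateEndpoint("S3URL", "https://rgw.openshift-storage.svc")) // [S3URL-InternalEndpoint]
	fmt.Println(validateEndpoint("S3URL", "https://s3.example.com"))            // []
}
```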

Why

The customer was using ODF Ceph RGW and the OBC automatically generated an internal .svc hostname as the S3 endpoint. In a cross-cluster migration, the source cluster can't resolve this. The resulting errors were misleading — S3 authentication failures and registry CrashLoopBackOff — which led to a long troubleshooting cycle before identifying the actual problem.

Test plan

  • go test ./pkg/cloudprovider/ — all existing + new tests pass
  • Configure MigStorage with a .svc S3 endpoint and verify the condition is set
  • Configure MigStorage with an external route endpoint and verify it passes validation

See: https://issues.redhat.com/browse/RFE-8920

Summary by CodeRabbit

  • New Features

    • Detects and rejects internal .svc/.svc.cluster.local endpoints in cloud storage configs and provides clearer guidance to use cross-cluster/public endpoints.
  • User-facing behavior

    • Validation now surfaces separate, specific messages when internal endpoints are found versus other credential/config errors.
  • Tests

    • Added tests covering detection and reporting of internal endpoint validation.

When users configure a replication repository for cross-cluster migration
using an internal .svc hostname (e.g. from an ObjectBucketClaim), the
migration fails at runtime with misleading S3 authentication errors and
registry CrashLoopBackOff. The actual problem is that .svc hostnames are
not resolvable from external clusters.

This adds an early validation check that detects .svc hostnames in S3URL
and PublicURL fields and rejects them with a clear error message telling
the user to use an externally reachable endpoint instead.

See: https://issues.redhat.com/browse/RFE-8920
@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Warning

Rate limit exceeded

@asadawar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 35 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 53 minutes and 35 seconds.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5bde306-c600-4f1b-8cfe-9288f35f729c

📥 Commits

Reviewing files that changed from the base of the PR and between 58371cd and fb0b98a.

📒 Files selected for processing (1)
  • pkg/cloudprovider/aws_test.go
📝 Walkthrough

Walkthrough

Adds detection for internal Kubernetes service DNS endpoints (hosts ending with .svc or .svc.cluster.local) in AWS backup storage S3URL/PublicURL validation and surfaces them via a new InternalEndpointUsed condition distinct from other invalid backup-storage field errors.

Changes

  • AWS provider validation & tests (pkg/cloudprovider/aws.go, pkg/cloudprovider/aws_test.go): Adds hostname suffix checks for parsed S3URL and PublicURL (.svc, .svc.cluster.local) and returns S3URL-InternalEndpoint/PublicURL-InternalEndpoint validation fields when detected. Adds tests verifying detection for internal vs external endpoints.
  • Migration storage validation logic (pkg/controller/migstorage/validation.go): Introduces exported condition type InternalEndpointUsed. Partitions provider-returned invalid fields into internal-endpoint vs other fields; sets InternalEndpointUsed when internal endpoints are present and preserves InvalidBSFields for other invalid fields.
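The partitioning described for validation.go might look roughly like this. The `-InternalEndpoint` suffix convention comes from the PR description; the function name and the simplified handling here are illustrative assumptions, not the controller's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// partitionFields splits the provider's invalid-field names into
// internal-endpoint findings (marked with the "-InternalEndpoint"
// suffix) and all other invalid fields, so each group can drive a
// separate status condition (InternalEndpointUsed vs InvalidBSFields).
func partitionFields(fields []string) (internal, other []string) {
	for _, f := range fields {
		if strings.HasSuffix(f, "-InternalEndpoint") {
			internal = append(internal, f)
		} else {
			other = append(other, f)
		}
	}
	return internal, other
}

func main() {
	internal, other := partitionFields([]string{"S3URL-InternalEndpoint", "AccessKey"})
	fmt.Println(internal) // [S3URL-InternalEndpoint]
	fmt.Println(other)    // [AccessKey]
}
```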

Sequence Diagram

sequenceDiagram
    actor User
    participant Controller as Migration Controller
    participant Provider as AWS Provider
    participant Validator as Validation Handler
    participant Status as Conditions Manager

    User->>Controller: Create/Update backup storage config
    activate Controller
    Controller->>Provider: Validate(secret)
    activate Provider
    Provider->>Provider: Parse S3URL/PublicURL<br/>Check scheme http/https and host suffix (.svc / .svc.cluster.local)
    Provider-->>Controller: Return validation fields (e.g., `S3URL-InternalEndpoint`, `PublicURL`)
    deactivate Provider

    Controller->>Validator: Process validation fields
    activate Validator
    Validator->>Validator: Partition fields into<br/>internal-endpoint vs other
    alt Internal endpoint fields present
        Validator->>Status: Set `InternalEndpointUsed` condition<br/>Items = internal-endpoint fields
    end
    alt Other invalid fields present
        Validator->>Status: Set `InvalidBSFields` condition<br/>Items = other invalid fields
    end
    deactivate Validator

    Status-->>Controller: Conditions updated
    Controller-->>User: Return validation result
    deactivate Controller

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I sniffed the URLs in the night,

Found hidden .svc out of sight,
I flagged them bold, said "Not for you,"
Split errors true from errors blue,
Hop! cross-cluster guidance comes into light.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): the title clearly and specifically describes the main change, rejecting internal .svc S3 endpoints in MigStorage validation, which is the primary purpose of the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
pkg/controller/migstorage/validation.go (1)

150-159: Consider future-proofing the error message.

The message explicitly references "S3 endpoint" which is currently accurate since only the AWS provider emits -InternalEndpoint fields. If other providers (Azure, GCP) adopt this pattern in the future, you may want to generalize the message or derive it from the field names in internalEndpointFields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/migstorage/validation.go` around lines 150 - 159, The error
message for the InternalEndpointUsed condition currently hardcodes "The S3
endpoint...", which ties the message to AWS; update the logic that sets the
migapi.Condition (used in storage.Status.SetCondition with Type
InternalEndpointUsed, Reason NotSupported, Category Critical and Items
internalEndpointFields) to use a provider-agnostic message (e.g., "The storage
endpoint uses an internal `.svc` hostname..." or construct the message from
internalEndpointFields names) so it reads correctly for non-AWS providers—modify
where internalEndpointFields is checked and the Condition.Message is set to
produce a generic or derived message.
pkg/cloudprovider/aws_test.go (1)

17-36: Add test case for fully-qualified internal hostname.

Consider adding a test case for .svc.cluster.local to ensure FQDN internal endpoints are also detected (once the production code is updated to handle this case):

{
    name:        "internal FQDN .svc.cluster.local endpoint",
    s3URL:       "https://rgw.openshift-storage.svc.cluster.local",
    expectS3Svc: true,
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cloudprovider/aws_test.go` around lines 17 - 36, Add a table-driven test
case to the aws_test.go cases slice to cover fully-qualified internal hostnames
by adding an entry with name "internal FQDN .svc.cluster.local endpoint", s3URL
"https://rgw.openshift-storage.svc.cluster.local", and expectS3Svc true; place
it alongside the existing cases that use s3URL/publicURL fields so the test loop
that inspects s3URL and expectS3Svc will exercise FQDN .svc.cluster.local
handling (refer to the test cases slice and the s3URL, publicURL, expectS3Svc
fields in the file).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/cloudprovider/aws.go`:
- Around line 262-264: The PublicURL hostname check only looks for ".svc" and
misses FQDNs like ".svc.cluster.local"; update the conditional in the PublicURL
handling (where u.Hostname() is tested and "PublicURL-InternalEndpoint" is
appended) to also detect ".svc.cluster.local" (e.g., check
strings.HasSuffix(u.Hostname(), ".svc.cluster.local") in addition to
".svc")—apply the same check pattern used for S3URL to ensure fully qualified
cluster local hostnames are caught.
- Around line 254-256: The hostname check that currently uses
strings.HasSuffix(u.Hostname(), ".svc") misses fully-qualified service names
like "rgw.openshift-storage.svc.cluster.local"; update the conditional in
pkg/cloudprovider/aws.go where u.Hostname() is tested (the branch that appends
"S3URL-InternalEndpoint") to also detect a ".svc" label inside FQDNs (for
example by adding strings.Contains(u.Hostname(), ".svc.") OR by matching the
".svc" segment more robustly), so both short-form (".svc") and FQDNs
(".svc.<domain>") and custom cluster domains are correctly classified.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2765a388-9e63-4d66-a839-ce36f4180bee

📥 Commits

Reviewing files that changed from the base of the PR and between a3ff4c5 and 36273b5.

📒 Files selected for processing (3)
  • pkg/cloudprovider/aws.go
  • pkg/cloudprovider/aws_test.go
  • pkg/controller/migstorage/validation.go

Comment thread pkg/cloudprovider/aws.go Outdated
Comment thread pkg/cloudprovider/aws.go Outdated
Abhijeet Sadawarte added 2 commits April 16, 2026 16:00
The previous check only caught short-form .svc hostnames. Users can
also use the FQDN form (e.g. rgw.openshift-storage.svc.cluster.local)
which was not being rejected.
Removed the Contains(".svc.") check and kept only HasSuffix(".svc")
as discussed with reviewers. Removed .svc.cluster.local test cases
accordingly.
@asadawar
Author

Updated based on the Slack discussion with @pranavgaikwad. Simplified the check to use only HasSuffix(".svc") and dropped the Contains(".svc.") check. Removed the .svc.cluster.local test cases accordingly.

Abhijeet Sadawarte added 2 commits April 20, 2026 20:06
Cover both short form (.svc) and FQDN (.svc.cluster.local) using
HasSuffix for each. Added test cases for both forms.
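The final form of the check and its table-driven tests, as described in these commits, might be sketched like this. The helper name and test-case names are illustrative, not taken from the actual aws.go/aws_test.go:

```go
package main

import (
	"fmt"
	"strings"
)

// isInternalServiceHost uses HasSuffix for each form, as the final
// commits describe: the short service form (.svc) and the
// cluster-local FQDN form (.svc.cluster.local).
func isInternalServiceHost(host string) bool {
	return strings.HasSuffix(host, ".svc") ||
		strings.HasSuffix(host, ".svc.cluster.local")
}

func main() {
	cases := []struct {
		name     string
		host     string
		internal bool
	}{
		{"short form .svc", "rgw.openshift-storage.svc", true},
		{"FQDN .svc.cluster.local", "rgw.openshift-storage.svc.cluster.local", true},
		{"external route", "rgw-openshift-storage.apps.example.com", false},
	}
	for _, c := range cases {
		if got := isInternalServiceHost(c.host); got != c.internal {
			fmt.Printf("FAIL %s: got %v, want %v\n", c.name, got, c.internal)
		} else {
			fmt.Printf("ok   %s\n", c.name)
		}
	}
}
```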
