feat(azure): delegate-domain support — create DNS zone, refactor cert flow#2136
Conversation
… flow
Implements PrepareDomainDelegation for the Azure provider so each project
gets a dedicated public DNS zone in Azure DNS, returning the zone's name
servers for Fabric to delegate. Mirrors the GCP provider's approach.
Headline pieces:
- pkg/clouds/azure/dns: armdns wrapper with EnsureZoneExists (idempotent
get-then-create), matching GCP's EnsureDNSZoneExists.
- pkg/cli/client/byoc/azure/byoc.go: PrepareDomainDelegation creates the
zone in the per-project-stack resource group (ensuring the RG exists
via the existing idempotent CreateResourceGroup), returns the NS records,
and stores the zone name for parity with the GCP provider's
delegateDomainZone field. HasDelegatedSubdomain returns true so destroy
calls fabric.DeleteSubdomainZone. DOMAIN env var is forwarded to the CD
task via the cdCommand struct, mirroring the AWS provider's pattern.
- pkg/clouds/azure/aca/cert.go: moves the existing IssueCert flow from
pkg/cli/client/byoc/azure/cert.go into the cloud-SDK layer so both the
CLI's `defang cert generate` and the CD task (separate repo) can call
it directly without crossing the byoc → cloud-SDK boundary. The
*ByocAzure.IssueCert method is now a thin wrapper. Pure refactor for
the CLI's BYOD path; idempotency of every ARM step is preserved.
- pkg/cli/client/byoc/common.go: when DEFANG_PULUMI_DIR runs CD locally,
forward AZURE_STORAGE_{KEY,ACCOUNT,SAS_TOKEN} and AZURE_TENANT_ID from
the host shell into the subprocess so Pulumi's azblob backend can
authenticate directly. Avoids DefaultAzureCredential falling through
to AzureCLICredential, which times out under nix's slow `az` startup.
No-op for the ACA-job path, which uses managed identity in-cloud.
The companion pulumi-defang change adds the per-service CNAME + asuid TXT
records (Pulumi-managed) and the CD-side hook that calls aca.IssueCert to
register the customDomain + issue ManagedCertificate + bind SniEnabled,
completing the delegate-domain flow.
Verified end-to-end on Azure westus3 with the nextjs sample: fresh deploy
provisions zone → records → cert + binding in ~9 min; HTTPS serves a
valid DigiCert-issued cert for app.<delegate-domain> in ~0.6s response.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds Azure DNS zone management and a shared Container Apps certificate issuance flow; wires delegate-domain through ByocAzure deployment/CD, implements PrepareDomainDelegation to create/ensure delegate zones and return name servers, and refactors ByocAzure.IssueCert to delegate to the new ACA flow. ChangesAzure DNS Delegation and Certificate Provisioning
Sequence Diagram(s)sequenceDiagram
participant CLI
participant ByocAzure
participant DNS
participant ACA
participant AzureARM
CLI->>ByocAzure: PrepareDomainDelegation(domain)
ByocAzure->>DNS: EnsureZoneExists(domain)
DNS-->>ByocAzure: Name servers
ByocAzure->>CLI: return name servers
CLI->>ByocAzure: deploy with DelegateDomain
ByocAzure->>ACA: IssueCert(...) (uses DNS resolver)
ACA->>AzureARM: manage Container App & managed certificate lifecycle
ACA->>DNS: verify CNAME/TXT during issuance
ACA->>AzureARM: bind certificate and enable SNI
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)level=warning msg="The linter 'gomodguard' is deprecated (since v2.12.0) due to: new major version. Replaced by gomodguard_v2." Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
src/pkg/cli/client/byoc/azure/byoc.go (1)
49-51: 💤 Low valueClarify the purpose of
delegateDomainZonefield.This field is assigned at line 755 in
PrepareDomainDelegationbut never read elsewhere in the file. If it's intended for future cleanup logic (similar to tracking other provisioned resources), consider adding a TODO comment. If it's meant to mirror the GCP provider's state tracking, document that. Otherwise, the field appears unused.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pkg/cli/client/byoc/azure/byoc.go` around lines 49 - 51, The delegateDomainZone field is assigned in PrepareDomainDelegation but never read; either remove delegateDomainZone if unused, or document intended purpose by adding a TODO comment and/or a short doc comment above the field explaining it mirrors GCP provider state or will be used for cleanup; if intended for future cleanup, ensure PrepareDomainDelegation and any cleanup method (e.g., the provider's teardown/Unprepare function) reference delegateDomainZone so it is actually used for resource tracking.src/pkg/clouds/azure/dns/dns.go (1)
80-91: ⚡ Quick winPrefer consistent zero-value return: empty slice over nil.
nameServers()returnsnilwhenzone.Propertiesis nil (line 82) but returns an empty slice[]string{}whenNameServersis empty (line 84'smakecreates a non-nil empty slice). This inconsistency can surprise callers who checkresult == nilvslen(result) == 0. Return an empty slice in both cases for predictable zero-value semantics.♻️ Proposed fix
func nameServers(zone armdns.Zone) []string { if zone.Properties == nil { - return nil + return []string{} } servers := make([]string, 0, len(zone.Properties.NameServers)) for _, ns := range zone.Properties.NameServers { if ns != nil { servers = append(servers, *ns) } } return servers }As per coding guidelines: "Treat zero values intentionally."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pkg/clouds/azure/dns/dns.go` around lines 80 - 91, The nameServers function returns nil when zone.Properties is nil but returns a non-nil empty slice when NameServers is empty; make the return value consistent by returning an empty slice instead of nil whenever there are no name servers. Update nameServers (and the zone.Properties nil branch) so it always returns a zero-length []string (e.g., construct servers with make([]string,0) and return that when zone.Properties is nil or when NameServers yields no entries), referencing the nameServers function, zone.Properties, and zone.Properties.NameServers to locate the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pkg/clouds/azure/aca/cert.go`:
- Around line 175-193: The patch is using the stale `current` value loaded
before the DNS wait, which can drop concurrent domain additions; in
addHostnameDisabled, re-fetch the latest ContainerApp resource (call the
ContainerAppsClient get method to retrieve the up-to-date resource) immediately
before building `domains` and calling `client.BeginUpdate`, then recheck
hasCustomDomain against that fresh resource, recompute
`existingCustomDomains(current)` from the refreshed object, and handle any GET
error before proceeding to BeginUpdate to avoid overwriting newer CustomDomains
entries.
- Around line 443-452: The managedCertName function currently trims env and
hostname which can cause collisions for long hostnames; update managedCertName
to append a short deterministic uniqueness suffix (e.g., 6-char hex or base36
derived from a hash of envName+hostname using sha256/sha1) after the sanitized
host segment so names remain deterministic and unique. Keep the existing
sanitize(), 15-char env and 30-char host truncation logic, then compute the
hash(sanitize(envName)+":"+sanitize(hostname)) and append a hyphen + first N
chars of the hex/base36 digest (N small, e.g., 6) to the returned string from
managedCertName to avoid collisions while preserving current length behavior.
Ensure the suffix is included in the final fmt.Sprintf("mc-%s-%s", ...) result
or appended to it (function managedCertName).
---
Nitpick comments:
In `@src/pkg/cli/client/byoc/azure/byoc.go`:
- Around line 49-51: The delegateDomainZone field is assigned in
PrepareDomainDelegation but never read; either remove delegateDomainZone if
unused, or document intended purpose by adding a TODO comment and/or a short doc
comment above the field explaining it mirrors GCP provider state or will be used
for cleanup; if intended for future cleanup, ensure PrepareDomainDelegation and
any cleanup method (e.g., the provider's teardown/Unprepare function) reference
delegateDomainZone so it is actually used for resource tracking.
In `@src/pkg/clouds/azure/dns/dns.go`:
- Around line 80-91: The nameServers function returns nil when zone.Properties
is nil but returns a non-nil empty slice when NameServers is empty; make the
return value consistent by returning an empty slice instead of nil whenever
there are no name servers. Update nameServers (and the zone.Properties nil
branch) so it always returns a zero-length []string (e.g., construct servers
with make([]string,0) and return that when zone.Properties is nil or when
NameServers yields no entries), referencing the nameServers function,
zone.Properties, and zone.Properties.NameServers to locate the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1231358a-11fd-4ab2-ba5b-8374779b8755
⛔ Files ignored due to path filters (1)
src/go.sumis excluded by!**/*.sum
📒 Files selected for processing (10)
src/go.modsrc/pkg/cli/client/byoc/azure/byoc.gosrc/pkg/cli/client/byoc/azure/byoc_test.gosrc/pkg/cli/client/byoc/azure/cert.gosrc/pkg/cli/client/byoc/common.gosrc/pkg/cli/composeUp.gosrc/pkg/clouds/azure/aca/cert.gosrc/pkg/clouds/azure/aca/cert_test.gosrc/pkg/clouds/azure/dns/dns.gosrc/pkg/clouds/azure/dns/dns_test.go
Four changes responding to coderabbit's review on PR #2136: 1. addHostnameDisabled: refetch the ContainerApp right before the merge-patch instead of reusing the snapshot the caller fetched before the DNS wait. waitForBYODdns can block for up to 30 min, and ARM uses JSON Merge Patch that replaces the customDomains array wholesale — a stale slice would silently drop concurrent additions from another deploy. Drops the now- redundant *ContainerApp parameter; updates the IssueCert call site. 2. submitManagedCertTXT: stop swallowing every client.Get failure during the validation-token polling loop. Only the expected 404 (token not yet issued) should be retried. Permission, auth, and transient provider errors are now wrapped and returned instead of being dragged out to a generic 5-minute timeout — matches the project's error-handling rule. 3. managedCertName: append a short sha256 hash of the lowercased hostname so two long hostnames sharing the same first 21 sanitized characters no longer collide on the same managed-cert resource within an env (where the second issuance would overwrite the first). sha256 (not sha1) to satisfy gosec G505/G401; we only take 4 bytes, so output length is similar. Tests describe the contract by recomputing the suffix locally; one new test guards the collision case explicitly. 4. nameServers: return []string{} consistently instead of nil when zone.Properties is missing, so callers that check `== nil` and callers that check `len() == 0` see the same shape. Also expanded the comment on ByocAzure.delegateDomainZone explaining it mirrors the GCP provider's same-named field — set on success, currently informational only, kept as a hook for a future explicit-cleanup path. All affected packages still pass `go test -short` and `make lint`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed
The separate flaky-test fix went out as #2137. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pkg/clouds/azure/aca/cert_test.go (1)
156-219:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winStale comment: suffix uses sha256, not "sha1".
Line 171 documents the appended suffix as "sha1", but
expectedSuffix(and per the PR,managedCertName) derives it fromsha256. The mismatched algorithm name is misleading, especially since sha256 was specifically chosen here for gosec.✏️ Proposed comment fix
// want is the prefix-plus-truncated-segments portion of the name; - // the per-hostname sha1 suffix is appended automatically. + // the per-hostname sha256 suffix is appended automatically. want string🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pkg/clouds/azure/aca/cert_test.go` around lines 156 - 219, The test comment incorrectly says the appended per-hostname suffix is "sha1" while the code (expectedSuffix and managedCertName) uses SHA-256; update the comment to say "sha256" (or a neutral phrasing like "per-hostname hash using sha256") so it matches the implementation, and ensure any surrounding text referencing "sha1" is corrected to "sha256" to avoid confusion when reviewing managedCertName and expectedSuffix.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/pkg/clouds/azure/aca/cert_test.go`:
- Around line 234-244: The test doc comment incorrectly refers to a "sha1
suffix" even though the cert name suffix is derived from sha256; update the
comment on TestManagedCertName_DistinguishesLongHostnames to say
"sha256-derived" (or simply "hash-derived") to match the implementation in
managedCertName and avoid misleading references.
---
Outside diff comments:
In `@src/pkg/clouds/azure/aca/cert_test.go`:
- Around line 156-219: The test comment incorrectly says the appended
per-hostname suffix is "sha1" while the code (expectedSuffix and
managedCertName) uses SHA-256; update the comment to say "sha256" (or a neutral
phrasing like "per-hostname hash using sha256") so it matches the
implementation, and ensure any surrounding text referencing "sha1" is corrected
to "sha256" to avoid confusion when reviewing managedCertName and
expectedSuffix.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9a1d1bef-ff6f-4f8a-871c-e00cdae8c04d
📒 Files selected for processing (4)
src/pkg/cli/client/byoc/azure/byoc.gosrc/pkg/clouds/azure/aca/cert.gosrc/pkg/clouds/azure/aca/cert_test.gosrc/pkg/clouds/azure/dns/dns.go
🚧 Files skipped from review as they are similar to previous changes (2)
- src/pkg/cli/client/byoc/azure/byoc.go
- src/pkg/clouds/azure/aca/cert.go
jordanstephens
left a comment
There was a problem hiding this comment.
The PR description mentions a companion pulumi-defang PR, but it's not clear which PR that is referring to. Can we clarify?
That PR has dependency on this PR as the related cert gen functions used by both byod cert gen in cli and pulumi-defang for delegate domain is imported by pulumi-defang. |
What
Implements
PrepareDomainDelegationfor the Azure provider so each project gets a dedicated public DNS zone in Azure DNS, returning the zone's name servers for Fabric to delegate. Mirrors the GCP provider's approach.Headline pieces:
pkg/clouds/azure/dns— armdns wrapper withEnsureZoneExists(idempotent get-then-create), matching GCP'sEnsureDNSZoneExists.pkg/cli/client/byoc/azure/byoc.go—PrepareDomainDelegationcreates the zone in the per-project-stack resource group (ensuring the RG exists via the existing idempotentCreateResourceGroup), returns the NS records, and stores the zone name for parity with the GCP provider'sdelegateDomainZonefield.HasDelegatedSubdomainreturnstrueso destroy callsfabric.DeleteSubdomainZone.DOMAINenv var is forwarded to the CD task via thecdCommandstruct, mirroring the AWS provider's pattern (seepkg/cli/client/byoc/aws/byoc.go).pkg/clouds/azure/aca/cert.go— moves the existingIssueCertflow frompkg/cli/client/byoc/azure/cert.gointo the cloud-SDK layer so both the CLI'sdefang cert generateand the CD task (separate repo) can call it directly without crossing the byoc → cloud-SDK boundary. The*ByocAzure.IssueCertmethod is now a thin wrapper; pure refactor for the CLI's BYOD path. Each ARM step's idempotency is preserved, and there's an addedalreadyServingTLSshort-circuit so re-deploys skip the cert dance when nothing changed.pkg/cli/client/byoc/common.go— whenDEFANG_PULUMI_DIRruns CD locally, forwardAZURE_STORAGE_{KEY,ACCOUNT,SAS_TOKEN}andAZURE_TENANT_IDfrom the host shell into the subprocess so Pulumi's azblob backend can authenticate directly. AvoidsDefaultAzureCredentialfalling through toAzureCLICredential, which times out under nix's slowazstartup. No-op for the ACA-job path, which uses managed identity in-cloud.Why split this from the pulumi-defang change
The companion pulumi-defang PR adds the per-service CNAME + asuid TXT records (Pulumi-managed) and the CD-side hook that calls
aca.IssueCertto register the customDomain + issue ManagedCertificate + bind SniEnabled, completing the delegate-domain flow.The cd package in pulumi-defang
importsaca.IssueCertfrom this repo, so this PR has to merge first. Until the pulumi-defang side merges, theDOMAINenv var is set but unread, and the newacapackage has only its existing in-repo caller. Both are harmless no-ops, and the AWS provider already follows the same shape (DOMAINset, consumed by the corresponding CD).Verification
cd src && make test— passes including new tests forazuredns.EnsureZoneExists,PrepareDomainDelegationempty-domain / credential-error paths, andHasDelegatedSubdomain.cd src && make lint— 0 issues.azurewestus3, fresh deploy provisions zone → records → cert + binding in ~9 min; HTTPS serves a valid DigiCert-issued cert forapp.<delegate-domain>in ~0.6s response time.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores