Detect active deployments before provisioning #7251

Open
spboyer wants to merge 6 commits into main from fix/deployment-active-conflict

spboyer (Member) commented Mar 23, 2026

Summary

Fixes #7248

Before starting a deployment, azd now checks the target scope for active deployments. If one is in progress, azd warns the user and waits for it to complete, avoiding the DeploymentActive ARM error that otherwise costs the user about 5 minutes.

Telemetry Context

  • 199 DeploymentActive failures in March (~270/month projected)
  • Average wait before failure: 5.3 minutes (P90: 12.2 min)
  • 78% from provision, 19% from up

Changes

Pre-deployment active check (bicep_provider.go)

Added waitForActiveDeployments() between preflight validation and deployment submission:

  • Lists deployments filtered for active provisioning states
  • If found: warns with deployment names, polls at 30s intervals
  • Timeout: 30 minutes (matches typical long deployments)
  • Only ignores ErrDeploymentsNotFound (scope doesn't exist yet); other errors propagate
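
The steps above can be sketched as a small polling loop. This is a minimal illustration and not the code in bicep_provider.go: the `lister` type, `waitForActive`, and `runExample` are hypothetical names, and treating a not-found error as "nothing active" during polling as well as on the first check is an assumption. The interval and timeout are parameters so the sketch runs in milliseconds; the PR describes 30 seconds and 30 minutes.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrDeploymentsNotFound stands in for the sentinel error named in the PR description.
var ErrDeploymentsNotFound = errors.New("deployments not found")

// lister is a hypothetical stand-in for a scope's ListActiveDeployments call;
// it returns the names of deployments that are still active.
type lister func(ctx context.Context) ([]string, error)

// waitForActive polls list until no active deployments remain, the timeout
// elapses, or ctx is cancelled. It returns the number of list calls made.
func waitForActive(ctx context.Context, list lister, interval, timeout time.Duration) (int, error) {
	polls := 0
	check := func() ([]string, error) {
		polls++
		names, err := list(ctx)
		if errors.Is(err, ErrDeploymentsNotFound) {
			return nil, nil // scope does not exist yet, so nothing can be running
		}
		return names, err
	}

	active, err := check()
	if err != nil {
		return polls, fmt.Errorf("checking for active deployments: %w", err)
	}
	if len(active) == 0 {
		return polls, nil
	}
	fmt.Printf("waiting for active deployments: %v\n", active)

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	deadline := time.NewTimer(timeout)
	defer deadline.Stop()

	for {
		select {
		case <-ctx.Done():
			return polls, ctx.Err()
		case <-deadline.C:
			return polls, fmt.Errorf("timed out after %s waiting for: %v", timeout, active)
		case <-ticker.C:
			latest, err := check()
			if err != nil {
				return polls, fmt.Errorf("polling active deployments: %w", err)
			}
			if len(latest) == 0 {
				return polls, nil
			}
			active = latest // keep names fresh for the timeout message
		}
	}
}

// runExample drives the loop with a fake lister that reports one active
// deployment on its first two calls and none afterwards.
func runExample() (int, error) {
	calls := 0
	fake := func(_ context.Context) ([]string, error) {
		calls++
		if calls <= 2 {
			return []string{"demo-deployment"}, nil
		}
		return nil, nil
	}
	return waitForActive(context.Background(), fake, 5*time.Millisecond, time.Second)
}

func main() {
	polls, err := runExample()
	fmt.Println("list calls:", polls, "error:", err)
}
```

Passing the interval and timeout as arguments is one way to keep timeout tests fast without waiting out a real 30-minute window.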

Active state classification (deployments.go)

IsActiveDeploymentState() classifies 11 provisioning states as active, including transitional states (Canceling, Deleting, DeletingResources, UpdatingDenyAssignments) that can still block new deployments.
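
A rough sketch of this kind of classification, assuming a set lookup. The `ProvisioningState` type, `activeStates` map, and `isActive` function are hypothetical names for illustration. Only the four transitional states come from the description above; `Accepted` and `Running` are assumptions, and the PR's full set of 11 states is not reproduced here.

```go
package main

import "fmt"

// ProvisioningState is a hypothetical local type for this sketch.
type ProvisioningState string

// activeStates holds states treated as still in progress. The four
// transitional states are named in the PR description; Accepted and
// Running are assumptions added for illustration only.
var activeStates = map[ProvisioningState]bool{
	"Accepted":                true, // assumption
	"Running":                 true, // assumption
	"Canceling":               true,
	"Deleting":                true,
	"DeletingResources":       true,
	"UpdatingDenyAssignments": true,
}

// isActive reports whether a state is in the active set; any state that is
// not listed (for example a terminal one) is treated as inactive.
func isActive(s ProvisioningState) bool {
	return activeStates[s]
}

func main() {
	for _, s := range []ProvisioningState{"Running", "Canceling", "Succeeded", "Failed"} {
		fmt.Printf("%s active=%t\n", s, isActive(s))
	}
}
```

Treating unlisted states as inactive means an unrecognized state would not block a new deployment in this sketch; whether the PR makes the same choice is not stated.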

Scope interface (scope.go)

Added ListActiveDeployments() to both ResourceGroupScope and SubscriptionScope.
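
To illustrate how both scopes could share one filtering step, here is a hedged sketch. The `Deployment` struct, the method signature on `Scope`, `filterActive`, and `fakeScope` are all assumptions made for this example; the real signatures in scope.go may differ.

```go
package main

import (
	"context"
	"fmt"
)

// Deployment is a hypothetical, trimmed-down deployment record.
type Deployment struct {
	Name  string
	State string
}

// Scope shows only the method this PR adds; the signature is an assumption.
type Scope interface {
	ListActiveDeployments(ctx context.Context) ([]Deployment, error)
}

// filterActive keeps the deployments whose state satisfies isActive.
func filterActive(all []Deployment, isActive func(string) bool) []Deployment {
	var active []Deployment
	for _, d := range all {
		if isActive(d.State) {
			active = append(active, d)
		}
	}
	return active
}

// fakeScope is an in-memory stand-in for a resource group or subscription scope.
type fakeScope struct {
	deployments []Deployment
}

// ListActiveDeployments filters the stored deployments with an illustrative
// two-state predicate (an assumption, not the PR's full state list).
func (s fakeScope) ListActiveDeployments(_ context.Context) ([]Deployment, error) {
	return filterActive(s.deployments, func(state string) bool {
		return state == "Running" || state == "Canceling"
	}), nil
}

// exampleActiveNames lists active deployment names from a fake scope.
func exampleActiveNames() []string {
	var scope Scope = fakeScope{deployments: []Deployment{
		{Name: "web", State: "Succeeded"},
		{Name: "db", State: "Running"},
		{Name: "cache", State: "Canceling"},
	}}
	active, err := scope.ListActiveDeployments(context.Background())
	if err != nil {
		return nil
	}
	names := make([]string, 0, len(active))
	for _, d := range active {
		names = append(names, d.Name)
	}
	return names
}

func main() {
	fmt.Println(exampleActiveNames())
}
```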

Error suggestion (error_suggestions.yaml)

Added DeploymentActive rule with user-friendly message and ARM troubleshooting link.

Test Coverage (8 tests, 24 subtests)

| Test | Coverage |
| --- | --- |
| TestIsActiveDeploymentState | 17 subtests covering all provisioning states |
| TestWaitForActiveDeployments_NoActive | Happy path |
| TestWaitForActiveDeployments_InitialListError_NotFound | RG doesn't exist yet |
| TestWaitForActiveDeployments_InitialListError_Other | Auth/throttle errors propagate |
| TestWaitForActiveDeployments_ActiveThenClear | Polling until clear |
| TestWaitForActiveDeployments_CancelledContext | Context cancellation |
| TestWaitForActiveDeployments_PollError | Error during polling |
| TestWaitForActiveDeployments_Timeout | 30min timeout |

Related

Copilot AI review requested due to automatic review settings March 23, 2026 15:17
spboyer added the bug label Mar 23, 2026
spboyer self-assigned this Mar 23, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Adds a pre-deployment check that detects in-progress ARM deployments at the target scope and waits for them to complete, avoiding DeploymentActive failures during provisioning.

Changes:

  • Introduces waitForActiveDeployments() in the Bicep provisioning flow and polls until deployments clear or a timeout is reached.
  • Adds IsActiveDeploymentState() plus new tests to classify which provisioning states are considered “active”.
  • Extends infra.Scope with ListActiveDeployments() and adds a DeploymentActive error suggestion rule.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| cli/azd/resources/error_suggestions.yaml | Adds a user-facing suggestion for DeploymentActive ARM errors. |
| cli/azd/pkg/infra/scope.go | Extends scope interface + implements ListActiveDeployments() for RG and subscription scopes. |
| cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go | Adds wait loop before deployment submission, with polling/timeout defaults. |
| cli/azd/pkg/infra/provisioning/bicep/bicep_provider_test.go | Updates mocked scope to satisfy the new Scope interface. |
| cli/azd/pkg/infra/provisioning/bicep/active_deployment_check_test.go | Adds tests covering wait-loop behavior, errors, cancellation, and timeout. |
| cli/azd/pkg/azapi/deployments.go | Adds IsActiveDeploymentState() helper. |
| cli/azd/pkg/azapi/deployment_state_test.go | Adds unit tests for active/inactive state classification. |

spboyer and others added 6 commits March 23, 2026 15:23
Before starting a Bicep deployment, check the target scope for
in-progress ARM deployments and wait for them to complete. This avoids
the DeploymentActive error that ARM returns after ~5 minutes when a
concurrent deployment is already running on the same resource group.

Changes:
- Add IsActiveDeploymentState() helper in azapi to classify provisioning
  states as active or terminal.
- Add ListActiveDeployments() to the infra.Scope interface and both
  ResourceGroupScope / SubscriptionScope implementations.
- Add waitForActiveDeployments() in the Bicep provider, called after
  preflight validation and before deployment submission. It polls until
  active deployments clear or a 30-minute timeout is reached.
- Add a DeploymentActive error suggestion rule to error_suggestions.yaml.
- Add unit tests for state classification, polling, timeout, error
  handling, and context cancellation.

Fixes #7248

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…per, refresh timeout names

- Fix 'range 200' compile error (not valid in all Go versions)
- Make DeploymentActive YAML rule scope-agnostic
- Extract filterActiveDeployments helper to deduplicate scope logic
- Refresh deployment names from latest poll on timeout message

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@spboyer spboyer force-pushed the fix/deployment-active-conflict branch from f456f95 to 8829c13 Compare March 23, 2026 22:24
spboyer (Member, Author) commented Mar 23, 2026

Telemetry Context: DeploymentActive + Retry Behavior

This PR addresses DeploymentActive (199 failures in March, ~270/month projected). Additional context from the deep dive:

Retry behavior makes this especially valuable

Of machines that hit InvalidTemplateDeployment errors (which includes DeploymentActive in the chain):

  • 66% retry without changing anything — for DeploymentActive, this means they re-submit and hit the same active deployment again
  • Average 3.6 retries per machine before they either succeed or give up
  • The detect-and-wait pattern in this PR would break that retry loop immediately

Time savings

  • Users who hit DeploymentActive currently wait about as long as an average deployment before the failure surfaces, then retry
  • With this PR: one wait period (with progress feedback) instead of N failed attempts × deployment time each

This is a clean win: the fix is architecturally simple (poll + wait) and eliminates a category of failure that retrying unchanged does not resolve.

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Handle DeploymentActive conflict -- detect and wait for in-progress deployments

2 participants