feat: harden local-dev and Azure deploy automation scripts with prereq, role and retry safeguards #1009

Open
Rafi-Microsoft wants to merge 21 commits into dev-v4 from feature/local_automation_enhancements

@Rafi-Microsoft
Contributor

Purpose

Harden the local-dev and deploy-to-Azure automation scripts (PowerShell + Bash) with prerequisite checks, role-permission validation, retry/fallback logic, and a per-service build results summary. Also adds the supporting documentation.

Scope of this PR

  • infra/scripts/deploy_to_azure.ps1 / .sh — production deploy automation
  • infra/scripts/setup_local_dev.ps1 / .sh — one-shot local-dev environment setup
  • docs/AutomatedLocalSetup.md, docs/DeployLocalChanges.md — usage guides

Key changes

  • Pre-flight prerequisite checks for Azure CLI, Docker, Python 3.12+, Node, npm, uv, Git (script-appropriate).
  • Azure role / permission pre-checks (non-fatal warnings, group-inheritance aware):
    • Deploy scripts: Contributor (resource mgmt) + UAA / RBAC Admin (role assignment).
    • Setup scripts: UAA / RBAC Admin only.
  • Role-definition existence check for newer roles (Azure AI User, Azure AI Developer) that may not exist in older subscriptions or sovereign clouds.
  • Aggregated role-assignment failure tracking with a post-run summary; scripts exit with code 2 (vs. hard-fail 1) when role assignments fail but the rest of setup succeeded, so CI can distinguish.
  • Retry & fallback logic:
    • uv sync → retried once with --refresh on failure.
    • npm install → retried once with --legacy-peer-deps on failure.
    • ACR login exit-code checked with actionable diagnostic hints.
    • Per-service build/push resilience — one service failing no longer aborts the others; results table shown in summary.
  • Two reverts (Revert "removed mount from dockerfiles", Revert "docker image time optimization v1") — dockerfile changes deferred to a separate PR; backed up locally via git tags dockerfile-backup/*.
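
For reviewers, the "retry once with a fallback flag" behavior listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the scripts: `run_with_fallback` is a hypothetical helper, and only the fallback flags (`--refresh`, `--legacy-peer-deps`) come from the description.

```shell
#!/usr/bin/env bash
# Sketch only: run_with_fallback is a hypothetical helper, not a function from the PR.

# Run the primary command; if it fails, log the retry and run the fallback once.
# Usage: run_with_fallback "<label>" "<primary command>" "<fallback command>"
run_with_fallback() {
    local label="$1" primary="$2" fallback="$3"

    if bash -c "$primary"; then
        return 0
    fi

    echo "[WARN] ${label} failed; retrying once with: ${fallback}" >&2
    if bash -c "$fallback"; then
        return 0
    fi

    echo "[ERROR] ${label} failed after retry" >&2
    return 1
}

# Example wiring (commented out so the sketch has no side effects):
# run_with_fallback "uv sync"     "uv sync"     "uv sync --refresh"
# run_with_fallback "npm install" "npm install" "npm install --legacy-peer-deps"
```

Logging the retry before the second attempt keeps the fallback visible in the output, which matches the "What to Check" expectation that retries surface before a hard failure.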

Does this introduce a breaking change?

  • Yes
  • No

All new behavior is additive. Exit code 2 is new but only emitted when role assignments fail (previously these were silent warnings).
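
A minimal sketch of how the exit-code split could work, assuming failures are collected in an array and reported at the end. `report_failed_role_assignments` is the name used in this PR, but the body below, `FAILED_ROLE_ASSIGNMENTS`, and `record_role_failure` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only. FAILED_ROLE_ASSIGNMENTS and record_role_failure are hypothetical
# names; the body of report_failed_role_assignments is an assumed implementation.

FAILED_ROLE_ASSIGNMENTS=()

# Remember a failed assignment instead of aborting the run.
record_role_failure() {
    FAILED_ROLE_ASSIGNMENTS+=("$1")
}

# Print the summary only when something failed.
# Returns 0 when there were no failures, 2 otherwise.
report_failed_role_assignments() {
    local count=${#FAILED_ROLE_ASSIGNMENTS[@]}
    if [ "$count" -eq 0 ]; then
        return 0
    fi
    echo "Role assignments that failed (${count}):"
    local entry
    for entry in "${FAILED_ROLE_ASSIGNMENTS[@]}"; do
        echo "  - ${entry}"
    done
    return 2
}

# At the end of a script (commented out so the sketch has no side effects):
# report_failed_role_assignments; exit $?
```

With this shape a CI job can treat exit code 2 as "setup finished, permissions need attention" and exit code 1 as a hard failure.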

How to Test

git clone https://github.com/microsoft/Multi-Agent-Custom-Automation-Engine-Solution-Accelerator.git
cd Multi-Agent-Custom-Automation-Engine-Solution-Accelerator
git checkout feature/local_automation_enhancements

Deploy script (dry-run, safe — no Azure changes):

# Windows
.\infra\scripts\deploy_to_azure.ps1 -ResourceGroup <your-rg> -DryRun
# Linux / macOS / Git Bash
./infra/scripts/deploy_to_azure.sh --resource-group <your-rg> --dry-run

Setup local dev:

# Windows
.\infra\scripts\setup_local_dev.ps1 -ResourceGroup <your-rg>
# Linux / macOS / Git Bash
./infra/scripts/setup_local_dev.sh --resource-group <your-rg>

Validation performed

  • Syntax: bash -n on .sh files, PowerShell PSParser::Tokenize on .ps1 files — all pass.
  • End-to-end against a live resource group: both scripts completed with exit code 0; new step 1b / 2b: Checking Azure Roles & Permissions rendered correctly; build-results table rendered in deploy summary.
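
To repeat the Bash half of the syntax validation locally, something like the following could be used. `check_bash_syntax` is a hypothetical helper written for this note; only `bash -n` itself is taken from the list above.

```shell
#!/usr/bin/env bash
# Sketch only: check_bash_syntax is a hypothetical helper.

# Run `bash -n` (parse without executing) on every file given.
# Returns 0 when all files parse, 1 otherwise.
check_bash_syntax() {
    local file status=0
    for file in "$@"; do
        if bash -n "$file" 2>/dev/null; then
            echo "[OK]   ${file}"
        else
            echo "[FAIL] ${file}"
            status=1
        fi
    done
    return "$status"
}

# Example: check_bash_syntax infra/scripts/*.sh
```

Note that `bash -n` only catches parse errors; it does not detect missing commands or runtime failures, which is why the end-to-end run is listed separately.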

What to Check

  • Prerequisite & role-permission checks fire as Step 1b (deploy) / Step 2b (setup) and are clearly labeled.
  • Report-FailedRoleAssignments / report_failed_role_assignments only fires when at least one role assignment actually failed; exit code is 2 in that case, 0 otherwise.
  • uv sync and npm install retries surface the retry attempt in output before failing hard.
  • Deploy summary contains the new Build results table ([OK] / [FAIL] / dry-run per service).
  • No regression in existing flow on a fully-permissioned account.
  • Cosmetic-only noise (pm install, pm run build, stray pwsh path) is pre-existing PowerShell display behavior, not introduced here.
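
The per-service resilience and the Build results table could look roughly like this. It is a sketch under assumptions: `build_all_services`, `BUILD_RESULTS`, and `print_build_results` are hypothetical names, and the build command is passed in as an argument so the loop can be exercised without Docker.

```shell
#!/usr/bin/env bash
# Sketch only. build_all_services, BUILD_RESULTS and print_build_results are
# hypothetical names; the build command is injected so no Docker is needed.

BUILD_RESULTS=()   # entries look like "service:STATUS"

# Build every service; a failure is recorded and the loop continues.
# Usage: build_all_services <build-command> <service>...
# Returns non-zero when at least one service failed.
build_all_services() {
    local build_cmd="$1"; shift
    local service failures=0
    BUILD_RESULTS=()
    for service in "$@"; do
        if "$build_cmd" "$service"; then
            BUILD_RESULTS+=("${service}:OK")
        else
            BUILD_RESULTS+=("${service}:FAIL")
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Print one row per service, e.g. "  backend    [OK]".
print_build_results() {
    local entry
    echo "Build results:"
    for entry in "${BUILD_RESULTS[@]}"; do
        printf '  %-10s [%s]\n' "${entry%%:*}" "${entry##*:}"
    done
}
```

Plain indexed arrays are used here so the sketch does not depend on associative arrays.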

Other Information

  • Behavior parity maintained across PowerShell and Bash variants of each script.
  • Role-permission pre-checks use az role assignment list --include-inherited --include-groups and are intentionally non-fatal (group/conditional grants can't always be enumerated).
  • Dockerfile build-time optimizations are intentionally not included here — they will land in a separate PR for focused review.
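
A non-fatal role pre-check along these lines might look as follows. The `--include-inherited` and `--include-groups` flags are the ones quoted above; `warn_if_role_missing`, the `--query` expression, and the message text are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: warn_if_role_missing is a hypothetical helper. The az flags are
# the ones quoted in the PR description; the --query/-o usage is an assumption
# about how the output is consumed.

# Warn (never fail) when none of the given role names is visible for the assignee.
# Usage: warn_if_role_missing <assignee> <scope> <role-name>...
warn_if_role_missing() {
    local assignee="$1" scope="$2"; shift 2
    local roles role
    roles="$(az role assignment list \
        --assignee "$assignee" --scope "$scope" \
        --include-inherited --include-groups \
        --query "[].roleDefinitionName" -o tsv 2>/dev/null)" || roles=""

    for role in "$@"; do
        # -F fixed string, -x whole line, -q quiet
        if printf '%s\n' "$roles" | grep -Fxq "$role"; then
            echo "[OK] found role: ${role}"
            return 0
        fi
    done
    echo "[WARN] none of these roles were found: $*" >&2
    echo "[WARN] continuing; group or conditional grants may not be listed" >&2
    return 0
}
```

The function always returns 0, so a missing role produces a warning without stopping the run, in line with the intent that these checks stay advisory.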

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Abdul-Microsoft and others added 20 commits May 6, 2026 11:46
- setup_local_dev.sh/.ps1: Automates full local development setup
  - Prerequisite checks with detailed install guidance
  - Azure config fetch from Resource Group (container app or individual resources)
  - RBAC role assignment with pre-check (Cosmos DB, AI Foundry, Search, Storage)
  - Virtual environment setup for backend, MCP server, and frontend
  - VS Code settings and launch.json generation
  - Auto-fix for .venv lock issues (VS Code Python extension)

- deploy_to_azure.sh/.ps1: Deploys local code changes to Azure
  - Builds Docker images for selected services (backend, mcp, frontend)
  - Pushes to ACR with unique timestamp+git-sha tags
  - Updates Container Apps and App Service with new images
  - ACR discovery, creation, and AcrPull role assignment
  - Dry-run mode, build-only/deploy-only modes
  - Rollback commands printed after deployment

- .gitignore: Added local dev artifacts (.macae_*.pid, start_all_services.sh)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
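
The timestamp+git-sha tagging mentioned in the commit above could be sketched like this. The commit message states only that tags combine a timestamp and a git SHA, so `make_image_tag` and the `<timestamp>-<sha>` layout are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: make_image_tag and the "<timestamp>-<sha>" layout are assumptions;
# the PR states only that tags combine a timestamp and a git SHA.

# Usage: make_image_tag [short-sha]
make_image_tag() {
    local sha="${1:-}"
    if [ -z "$sha" ]; then
        # Fall back to "nogit" when not inside a git work tree.
        sha="$(git rev-parse --short HEAD 2>/dev/null || echo nogit)"
    fi
    # UTC timestamp keeps tags sortable regardless of the builder's time zone.
    printf '%s-%s\n' "$(date -u +%Y%m%d%H%M%S)" "$sha"
}
```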
Remove local dev setup artifacts from .gitignore
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix Write-LogWarn crash from extra -ForegroundColor parameter (deploy_to_azure.ps1)
- Fix misleading git diff warning message in both deploy scripts
- Change -r to -g short flag for --resource-group to match docs (setup_local_dev.sh)
- Gate RBAC assignment behind --assign-rbac / -AssignRbac flag (both setup scripts)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RBAC roles are essential for local dev and should always be assigned.
The script already skips roles that are already assigned, so running
unconditionally is safe and simplifies the user experience.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Moved all 4 scripts to scripts/ directory and updated all
references in docs/AutomatedLocalSetup.md and docs/DeployLocalChanges.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Scripts now compute REPO_ROOT as two levels up from their location
in infra/scripts/, so all src/ and .azure/ paths resolve correctly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
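
Resolving the repository root two levels above a script in infra/scripts/ is commonly done as below. `repo_root_for` is a hypothetical helper that takes the script path as an argument so it can be tested in isolation; the actual scripts may compute REPO_ROOT inline.

```shell
#!/usr/bin/env bash
# Sketch only: repo_root_for is a hypothetical helper. Inside a script one
# would pass "${BASH_SOURCE[0]}" as the argument.

repo_root_for() {
    local script_path="$1" script_dir
    script_dir="$(cd "$(dirname "$script_path")" && pwd -P)" || return 1
    # infra/scripts/ -> infra/ -> repository root
    (cd "${script_dir}/../.." && pwd -P)
}

# Typical use at the top of a script in infra/scripts/:
# REPO_ROOT="$(repo_root_for "${BASH_SOURCE[0]}")"
```

Deriving the root from the script's own location, rather than from the current directory, lets the script be invoked from anywhere in the checkout.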
Contributor

Copilot AI left a comment


Pull request overview

Adds cross-platform (Bash + PowerShell) automation for local development setup and for deploying local code changes to an existing Azure deployment, with prerequisite checks, Azure role/permission validation, retry/fallback behaviors, and improved run summaries. This fits into the repo’s infra/scripts/ tooling and complements the existing manual setup/deployment docs by providing repeatable, guided workflows.

Changes:

  • Add setup_local_dev scripts to generate backend .env, assign required roles, install deps (uv/pip/npm), and optionally scaffold VS Code config.
  • Add deploy_to_azure scripts to build/push images to ACR, configure registry settings, update Container Apps/App Service, and print rollback commands.
  • Add documentation pages describing usage, flags, and expected flow for both workflows.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| infra/scripts/setup_local_dev.sh | New Bash local-dev setup automation (prereqs, auth, config, RBAC, installs, VS Code scaffolding). |
| infra/scripts/setup_local_dev.ps1 | New PowerShell local-dev setup automation mirroring Bash behavior. |
| infra/scripts/deploy_to_azure.sh | New Bash deploy-to-Azure automation with retries, ACR resolution, per-service build results, rollback output. |
| infra/scripts/deploy_to_azure.ps1 | New PowerShell deploy-to-Azure automation mirroring Bash behavior. |
| docs/AutomatedLocalSetup.md | Documentation for running the local-dev setup scripts. |
| docs/DeployLocalChanges.md | Documentation for running the deploy-to-Azure scripts. |

Comment thread infra/scripts/setup_local_dev.sh Outdated
Comment thread infra/scripts/setup_local_dev.sh
Comment thread infra/scripts/setup_local_dev.sh Outdated
Comment thread infra/scripts/setup_local_dev.ps1
Comment thread infra/scripts/setup_local_dev.ps1
Comment thread infra/scripts/deploy_to_azure.sh Outdated
Comment thread infra/scripts/deploy_to_azure.ps1
Comment thread docs/DeployLocalChanges.md
Comment thread docs/AutomatedLocalSetup.md
Comment thread infra/scripts/deploy_to_azure.sh
Resolves 12 inline comments from Copilot reviewer on PR #1009:

setup_local_dev.sh:
* Add Bash 4+ requirement check at top (macOS ships 3.2 — `declare -A`
  is not available there). Fail with actionable Homebrew install hint.
* Remove silent fallback to Python <3.12 in interpreter detection; rely on
  check_prerequisites to fail loudly when 3.12+ is missing.
* Make generated .vscode/settings.json `python.defaultInterpreterPath`
  OS-aware (bin/python on Linux/macOS, Scripts/python.exe on Windows).

setup_local_dev.ps1:
* Add `#Requires -Version 7.0` plus runtime guard — `??` and
  `Out-File -Encoding utf8NoBOM` aren't supported on Windows PowerShell 5.1.
* Remember the detected Python invocation (e.g., `py -3.12`) and reuse it
  when creating the frontend venv, instead of unconditionally calling
  `python -m venv` (which fails when `python` isn't on PATH).

deploy_to_azure.sh:
* Fix az_retry comment to say "up to 4 attempts" to match the loop bound.
* Correct step header in update_azure_resources from "Step 7" to "Step 8"
  so log output matches docs/section structure.
* Add `_has_az_executable` helper using `type -P az` so the prereq check
  isn't satisfied by the `az()` wrapper function defined earlier in the
  script when the real Azure CLI isn't installed.

deploy_to_azure.ps1:
* In Configure-AcrOnResources, capture exit code of BOTH frontend
  `az webapp config` calls. Previously only the second was checked, so a
  failure to set DOCKER_REGISTRY_SERVER_URL could be silently masked by
  a successful acrUseManagedIdentityCreds update.

docs:
* DeployLocalChanges.md — add Step 1b (Azure roles & permissions check)
  to the "What It Does (in order)" list.
* AutomatedLocalSetup.md — add Step 2b (Azure roles & permissions check)
  to the "What It Does (in order)" list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
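
The Bash 4+ requirement check described in the commit above might look something like this. `require_bash_4` is a hypothetical name and the message text is illustrative; only the requirement itself and the Homebrew hint come from the commit message.

```shell
#!/usr/bin/env bash
# Sketch only: require_bash_4 is a hypothetical name, and the message text is
# illustrative rather than the script's actual output.

# Usage: require_bash_4 [major-version]   (defaults to the running shell)
require_bash_4() {
    local major="${1:-${BASH_VERSINFO[0]}}"
    if [ "$major" -lt 4 ]; then
        echo "[ERROR] Bash 4+ is required (found ${major}.x); associative arrays (declare -A) need it." >&2
        echo "        On macOS, a newer Bash can be installed with: brew install bash" >&2
        return 1
    fi
    return 0
}

# At the top of a script: require_bash_4 || exit 1
```

The optional argument exists only so the guard can be exercised without switching shells.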
@Rafi-Microsoft
Contributor Author

Addressed Copilot review feedback in commit d565c8a

Thanks @copilot for the review — all 12 inline comments have been addressed:

| # | File | Line | Issue | Resolution |
| --- | --- | --- | --- | --- |
| 1 | infra/scripts/setup_local_dev.sh | 61 | Python <3.12 fallback could let later uv sync --python 3.12 steps fail | Removed silent fallback; check_prerequisites now fails loudly when no 3.12+ interpreter is found |
| 2 | infra/scripts/setup_local_dev.sh | 561 | macOS default Bash 3.2 doesn't support declare -A | Added Bash 4+ requirement check at top with Homebrew install hint |
| 3 | infra/scripts/setup_local_dev.sh | 986 | .vscode/settings.json hard-coded Windows python.defaultInterpreterPath | Now OS-aware: bin/python on Linux/macOS, Scripts/python.exe on Windows |
| 4 | infra/scripts/setup_local_dev.ps1 | 534 | ?? operator not in Windows PowerShell 5.1 | Added #Requires -Version 7.0 and runtime guard |
| 5 | infra/scripts/setup_local_dev.ps1 | 568 | Out-File -Encoding utf8NoBOM not in PS 5.1 | Same as #4; PS 7+ now enforced |
| 6 | infra/scripts/setup_local_dev.ps1 | 86 | Prereq accepts py -3.12 but frontend venv was created with bare python | Detected invocation is now stored in $script:PythonInvocation and reused for venv creation |
| 7 | infra/scripts/deploy_to_azure.sh | 81 | az_retry comment said 3, loop did 4 | Comment corrected to "up to 4 attempts" |
| 8 | infra/scripts/deploy_to_azure.sh | 811 | update_azure_resources logged "Step 7" but section is Step 8 | Log message corrected to "Step 8" |
| 9 | infra/scripts/deploy_to_azure.ps1 | 656 | Frontend az webapp config ran two commands but checked $LASTEXITCODE only once, so a partial failure could be masked | Both exit codes captured independently and reported separately |
| 10 | docs/DeployLocalChanges.md | 35 | "What It Does (in order)" omitted Step 1b (Azure roles pre-check) | Added Step 1b entry |
| 11 | docs/AutomatedLocalSetup.md | 44 | Same omission for Step 2b | Added Step 2b entry |
| 12 | infra/scripts/deploy_to_azure.sh | 26 | command -v az would match the az() wrapper function even if the real CLI is missing | Added _has_az_executable helper using type -P az (ignores functions); used in check_prerequisites |
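
To illustrate item 12: `command -v` reports shell functions as well as executables, while `type -P` searches only PATH. A minimal sketch follows; `_has_az_executable` is the helper name from this PR, but its body here and the generalized `_has_executable` are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: _has_az_executable is named in the PR; this body and the
# generalized _has_executable helper are assumptions.

# True only when <name> resolves to a file on PATH; `type -P` skips
# shell functions, aliases and builtins.
_has_executable() {
    type -P "$1" >/dev/null 2>&1
}

_has_az_executable() {
    _has_executable az
}
```

`type -P` is a Bash builtin option, so this check relies on the script running under Bash rather than a plain POSIX sh.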

Validation: all 4 scripts pass syntax checks (bash -n for .sh, PowerShell AST parser for .ps1).
Nothing deferred — all comments were legitimate and applied as suggested.
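
For context on item 7, a bounded retry loop with "up to 4 attempts" can be written as below. This is a generic sketch, not the az_retry function from deploy_to_azure.sh: `retry_cmd`, `RETRY_MAX_ATTEMPTS`, and `RETRY_DELAY_SECONDS` are hypothetical, and the fixed delay is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: retry_cmd, RETRY_MAX_ATTEMPTS and RETRY_DELAY_SECONDS are
# hypothetical; the PR states only that az_retry makes up to 4 attempts.

RETRY_MAX_ATTEMPTS="${RETRY_MAX_ATTEMPTS:-4}"
RETRY_DELAY_SECONDS="${RETRY_DELAY_SECONDS:-2}"

# Run a command up to RETRY_MAX_ATTEMPTS times, sleeping between attempts.
# Usage: retry_cmd <command> [args...]
retry_cmd() {
    local attempt=1
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$RETRY_MAX_ATTEMPTS" ]; then
            echo "[ERROR] giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "[WARN] attempt ${attempt}/${RETRY_MAX_ATTEMPTS} failed; retrying" >&2
        sleep "$RETRY_DELAY_SECONDS"
        attempt=$((attempt + 1))
    done
}

# Example: retry_cmd az account show
```

Keeping the attempt limit in one variable that both the loop and the log message read avoids the comment-versus-loop mismatch that item 7 corrected.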


3 participants