Skip to content

Use az CLI for non-batch AKS machine creates#1204

Open
karenychen wants to merge 4 commits into
Azure:mainfrom
karenychen:feat/k8s-machine-client-az-cli
Open

Use az CLI for non-batch AKS machine creates#1204
karenychen wants to merge 4 commits into
Azure:mainfrom
karenychen:feat/k8s-machine-client-az-cli

Conversation

@karenychen
Copy link
Copy Markdown
Contributor

Summary

  • Use Azure CLI for non-batch AKS machine nodepool and machine creates.
  • Keep BatchPutMachine on the direct ARM REST path and fail fast when calculated batch size exceeds 50.
  • Upload successful_machines as a count and preserve timeout propagation through client waits.

Validation

  • python3 -m pytest modules/python/tests/test_aks_machine_client.py modules/python/tests/test_machine_crud.py modules/python/tests/test_crud_main.py -q
  • python3 -m py_compile modules/python/clients/aks_machine_client.py modules/python/tests/test_aks_machine_client.py
  • PYTHONPATH=modules/python python3 -m pylint modules/python/tests/test_aks_machine_client.py modules/python/clients/aks_machine_client.py modules/python/crud/azure/machine_crud.py modules/python/crud/main.py
  • git diff --check

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates AKSMachineClient to use the Azure CLI for non-batch Machine API operations (agentpool creation and per-machine creates), while keeping the BatchPutMachine path on direct ARM REST calls with explicit client-side limits and improved telemetry.

Changes:

  • Add a CLI execution helper and switch create_machine_agentpool + non-batch machine creates to az (--no-wait) instead of direct ARM PUTs.
  • Keep BatchPutMachine on the REST path, and fail fast when the computed per-worker batch size exceeds 50 (and when a batch chunk exceeds 50).
  • Update unit tests to validate CLI command construction, successful machine count metadata, timeout propagation, and new batch-limit edge cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
modules/python/clients/aks_machine_client.py Introduces _run_az_cli_command, routes non-batch creates through Azure CLI, enforces BatchPutMachine size limits, and records successful_machines as a count.
modules/python/tests/test_aks_machine_client.py Updates tests for CLI-based behavior, successful machine count metadata, timeout forwarding, and adds coverage for new batch-limit validation paths.
Comments suppressed due to low confidence (1)

modules/python/clients/aks_machine_client.py:233

  • create_machine_agentpool now shells out to az aks nodepool add --no-wait and then immediately calls _wait_for_agentpool_provisioning(url, timeout). _wait_for_agentpool_provisioning raises immediately on any 4xx (including 404) (see its docstring/logic), so if ARM briefly returns 404 while the nodepool resource is being created, this path will fail fast even though the create is still in progress. Consider treating 404 as retryable for some window (or adding a retry_404/allow_not_found flag for the create flow) so the polling loop can wait for the resource to appear before enforcing terminal-state logic.
                self._run_az_cli_command(
                    command,
                    timeout=min(timeout, _PER_REQUEST_TIMEOUT_CAP),
                )
                if not self._wait_for_agentpool_provisioning(url, timeout):
                    raise RuntimeError(
                        f"agentpool {agentpool_name} did not reach Succeeded within {timeout}s"
                    )

Comment on lines +84 to +90
@@ -85,34 +85,34 @@ def tearDown(self):

@mock.patch.object(AKSMachineClient, "_wait_for_agentpool_provisioning",
return_value=True)
@mock.patch.object(AKSMachineClient, "make_request")
def test_create_machine_agentpool_success(self, mock_make_request, _mock_wait):
"""PUT 200 + Succeeded poll -> returns; metadata enriched."""
mock_resp = mock.MagicMock()
mock_resp.status_code = 200
mock_make_request.return_value = mock_resp
@mock.patch.object(AKSMachineClient, "_run_az_cli_command")
def test_create_machine_agentpool_success(self, mock_run_cli, _mock_wait):
"""CLI command accepted + Succeeded poll -> returns; metadata enriched."""
@karenychen
Copy link
Copy Markdown
Contributor Author

Code review

No issues found. Checked for bugs and AGENTS.md compliance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants