Skip to content

feat: version-scoped scripts hotfix via staging#8412

Draft
Devinwong wants to merge 1 commit intomainfrom
devinwon/scripts-hotfix-version-scoping
Draft

feat: version-scoped scripts hotfix via staging#8412
Devinwong wants to merge 1 commit intomainfrom
devinwon/scripts-hotfix-version-scoping

Conversation

@Devinwong
Copy link
Copy Markdown
Collaborator

Summary

Adds version-scoped scripts hotfix support so that script hotfixes only apply to the targeted VHD version, not all nodes unconditionally.

Problem

The current scripts hotfix mechanism injects write_files entries at real destination paths (e.g. /opt/azure/containers/provision_installs.sh). ANC's applyNodeCustomData() writes them unconditionally, meaning a hotfix targeting VHD 202604.27.0 would also overwrite scripts on older VHDs where the logic may differ.

Solution

Staging approach: Instead of writing hotfix scripts directly to their real paths, they are written to a staging directory (/opt/azure/hotfix/scripts/) along with a manifest (manifest.json) that contains the target VHD version and staging→destination path mappings.

ANC then:

  1. Reads the manifest
  2. Compares targetVersion against its own VHD version (set via ldflags)
  3. Copies staged files to real paths only if the version base (YYYYMM.DD) matches
  4. Skips gracefully for non-matching versions

Changes

  • aks-node-controller/nodecustomdata.go: Added applyScriptHotfix(), isScriptHotfixTargeted(), copyHotfixFile() and manifest structs
  • aks-node-controller/app.go: Wired applyScriptHotfix() into Provision() after applyNodeCustomData()
  • hack/hotfix_generate.py: Updated inject_hotfix() to generate staging paths + manifest; added rewrite_block_to_staging(), read_target_version(), VARKEY_TO_DEST_PATH mapping
  • aks-node-controller/nodecustomdata_test.go: 7 new tests covering version match, mismatch, higher patch, no manifest, empty manifest, dev version, and isScriptHotfixTargeted table tests

Compatibility

  • No RP changes required — ANC knows its VHD version from ldflags
  • No manifest = no-op — nodes without hotfix entries are unaffected
  • Backward safe — older ANC versions ignore the staging directory
  • Forward safe — new ANC on VHDs without hotfix just logs "no manifest found"
  • Automatic cleanup — next weekly release from main has no hotfix markers

Copilot AI review requested due to automatic review settings April 27, 2026 23:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds version-scoped script hotfix support by staging hotfix files plus a manifest, and applying them only when the node’s VHD version base (YYYYMM.DD) matches the hotfix target.

Changes:

  • Generate staged hotfix write_files entries plus a manifest.json describing target VHD version + staging→destination mappings.
  • Apply staged hotfix files at provision time only when the target version matches the node’s version base.
  • Add unit tests for version match/mismatch, patch variations, missing/empty manifest, and dev version handling.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
hack/hotfix_generate.py Writes hotfix scripts to a staging directory and emits a version-scoped manifest used by ANC.
aks-node-controller/nodecustomdata.go Adds manifest parsing, version targeting logic, and staged-file copy into real destinations.
aks-node-controller/app.go Invokes script hotfix application during provisioning.
aks-node-controller/nodecustomdata_test.go Adds tests for version targeting and manifest edge cases.

Comment thread hack/hotfix_generate.py
Comment on lines +306 to +331
for line in block_lines:
stripped = line.strip()
if stripped.startswith('- path:'):
# Extract the path value (may be a Go template expression or literal)
path_value = stripped[len('- path:'):].strip()

# Find which dest path this corresponds to by matching template functions
matched_dest = None
for vk in varkeys:
if vk in VARKEY_TO_DEST_PATH:
dest = VARKEY_TO_DEST_PATH[vk]
# Check if this path line could resolve to this dest
# Template functions like {{GetCSEInstallScriptFilepath}} resolve to the dest
basename = dest.rsplit('/', 1)[-1]
staging = f"{HOTFIX_STAGING_DIR}/{basename}"
if dest not in [e[1] for e in manifest_entries]:
matched_dest = dest
manifest_entries.append((staging, dest))
break

if matched_dest:
basename = matched_dest.rsplit('/', 1)[-1]
staging_path = f"{HOTFIX_STAGING_DIR}/{basename}"
rewritten.append(f"- path: {staging_path}\n")
else:
rewritten.append(line)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The block rewrite currently doesn’t use path_value to determine which destination a given - path: line corresponds to; it effectively assigns destinations based on varkey iteration order and “first not yet used” rather than actually matching the line’s template/function/literal path. This can rewrite the wrong - path: entries (or produce incorrect staging→destination mappings) whenever a write_files block contains multiple files or when varkeys includes multiple destinations. Consider explicitly mapping each - path: line to a destination by parsing path_value (e.g., match known template function names / literals to a specific varkey/dest), and only then rewrite to a deterministic staging filename (and emit the corresponding manifest entry).

Copilot uses AI. Check for mistakes.
Comment thread hack/hotfix_generate.py
Comment on lines +300 to +329
# Collect unique destination paths from the varkeys in this block
dest_paths = set()
for vk in varkeys:
if vk in VARKEY_TO_DEST_PATH:
dest_paths.add(VARKEY_TO_DEST_PATH[vk])

for line in block_lines:
stripped = line.strip()
if stripped.startswith('- path:'):
# Extract the path value (may be a Go template expression or literal)
path_value = stripped[len('- path:'):].strip()

# Find which dest path this corresponds to by matching template functions
matched_dest = None
for vk in varkeys:
if vk in VARKEY_TO_DEST_PATH:
dest = VARKEY_TO_DEST_PATH[vk]
# Check if this path line could resolve to this dest
# Template functions like {{GetCSEInstallScriptFilepath}} resolve to the dest
basename = dest.rsplit('/', 1)[-1]
staging = f"{HOTFIX_STAGING_DIR}/{basename}"
if dest not in [e[1] for e in manifest_entries]:
matched_dest = dest
manifest_entries.append((staging, dest))
break

if matched_dest:
basename = matched_dest.rsplit('/', 1)[-1]
staging_path = f"{HOTFIX_STAGING_DIR}/{basename}"
rewritten.append(f"- path: {staging_path}\n")
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dest_paths is collected but never used, which makes the function harder to follow and suggests an incomplete implementation. Either remove this block or use dest_paths to drive deterministic rewriting/manifest entry generation (which would also help avoid the current per-line varkey scanning).

Suggested change
# Collect unique destination paths from the varkeys in this block
dest_paths = set()
for vk in varkeys:
if vk in VARKEY_TO_DEST_PATH:
dest_paths.add(VARKEY_TO_DEST_PATH[vk])
for line in block_lines:
stripped = line.strip()
if stripped.startswith('- path:'):
# Extract the path value (may be a Go template expression or literal)
path_value = stripped[len('- path:'):].strip()
# Find which dest path this corresponds to by matching template functions
matched_dest = None
for vk in varkeys:
if vk in VARKEY_TO_DEST_PATH:
dest = VARKEY_TO_DEST_PATH[vk]
# Check if this path line could resolve to this dest
# Template functions like {{GetCSEInstallScriptFilepath}} resolve to the dest
basename = dest.rsplit('/', 1)[-1]
staging = f"{HOTFIX_STAGING_DIR}/{basename}"
if dest not in [e[1] for e in manifest_entries]:
matched_dest = dest
manifest_entries.append((staging, dest))
break
if matched_dest:
basename = matched_dest.rsplit('/', 1)[-1]
staging_path = f"{HOTFIX_STAGING_DIR}/{basename}"
rewritten.append(f"- path: {staging_path}\n")
# Collect unique destination paths from the varkeys in deterministic order.
# This drives both path rewriting and manifest generation.
dest_paths = []
seen_dest_paths = set()
for vk in varkeys:
if vk in VARKEY_TO_DEST_PATH:
dest_path = VARKEY_TO_DEST_PATH[vk]
if dest_path not in seen_dest_paths:
dest_paths.append(dest_path)
seen_dest_paths.add(dest_path)
dest_path_index = 0
for line in block_lines:
stripped = line.strip()
if stripped.startswith('- path:'):
if dest_path_index < len(dest_paths):
matched_dest = dest_paths[dest_path_index]
basename = matched_dest.rsplit('/', 1)[-1]
staging_path = f"{HOTFIX_STAGING_DIR}/{basename}"
manifest_entries.append((staging_path, matched_dest))
rewritten.append(f"- path: {staging_path}\n")
dest_path_index += 1

Copilot uses AI. Check for mistakes.
Comment on lines +177 to +182
for _, f := range manifest.Files {
if err := copyHotfixFile(f.Staging, f.Destination); err != nil {
return fmt.Errorf("apply script hotfix %s → %s: %w", f.Staging, f.Destination, err)
}
slog.Info("applied script hotfix file", "staging", f.Staging, "destination", f.Destination)
}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Destination comes directly from the manifest (ultimately from cloud-init/nodecustomdata), and the current implementation will write to any path, including sensitive locations, if the manifest is malformed or compromised. Add destination validation before copying (mandatory): require absolute, filepath.Cleaned paths; reject traversal; and enforce an allowlist/prefix list of approved destinations (ideally derived from the same canonical list used to generate the manifest). This limits arbitrary file overwrite risk while preserving the hotfix mechanism.

Copilot uses AI. Check for mistakes.
Comment on lines +225 to +229
if _, err := tmp.Write(srcData); err != nil {
tmp.Close()
os.Remove(tmpPath)
return fmt.Errorf("writing temp file %s: %w", tmpPath, err)
}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(*os.File).Write can return a short write without an error; in that case, this would silently produce a truncated destination script. Ensure the full buffer is written (e.g., loop until all bytes are written, or use an API that guarantees full writes) and consider tmp.Sync() before Rename if durability across power loss is a concern during provisioning.

Suggested change
if _, err := tmp.Write(srcData); err != nil {
tmp.Close()
os.Remove(tmpPath)
return fmt.Errorf("writing temp file %s: %w", tmpPath, err)
}
written, err := io.Copy(tmp, bytes.NewReader(srcData))
if err != nil {
tmp.Close()
os.Remove(tmpPath)
return fmt.Errorf("writing temp file %s: %w", tmpPath, err)
}
if written != int64(len(srcData)) {
tmp.Close()
os.Remove(tmpPath)
return fmt.Errorf("writing temp file %s: short write (%d/%d bytes)", tmpPath, written, len(srcData))
}
if err := tmp.Sync(); err != nil {
tmp.Close()
os.Remove(tmpPath)
return fmt.Errorf("syncing temp file %s: %w", tmpPath, err)
}

Copilot uses AI. Check for mistakes.
@Devinwong Devinwong changed the title feat(anc): version-scoped scripts hotfix via staging feat: version-scoped scripts hotfix via staging Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants