feat: version-scoped scripts hotfix via staging#8412
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds version-scoped script hotfix support by staging hotfix files plus a manifest, and applying them only when the node’s VHD version base (YYYYMM.DD) matches the hotfix target.
Changes:
- Generate staged hotfix
write_filesentries plus amanifest.jsondescribing target VHD version + staging→destination mappings. - Apply staged hotfix files at provision time only when the target version matches the node’s version base.
- Add unit tests for version match/mismatch, patch variations, missing/empty manifest, and dev version handling.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| hack/hotfix_generate.py | Writes hotfix scripts to a staging directory and emits a version-scoped manifest used by ANC. |
| aks-node-controller/nodecustomdata.go | Adds manifest parsing, version targeting logic, and staged-file copy into real destinations. |
| aks-node-controller/app.go | Invokes script hotfix application during provisioning. |
| aks-node-controller/nodecustomdata_test.go | Adds tests for version targeting and manifest edge cases. |
| for line in block_lines: | ||
| stripped = line.strip() | ||
| if stripped.startswith('- path:'): | ||
| # Extract the path value (may be a Go template expression or literal) | ||
| path_value = stripped[len('- path:'):].strip() | ||
|
|
||
| # Find which dest path this corresponds to by matching template functions | ||
| matched_dest = None | ||
| for vk in varkeys: | ||
| if vk in VARKEY_TO_DEST_PATH: | ||
| dest = VARKEY_TO_DEST_PATH[vk] | ||
| # Check if this path line could resolve to this dest | ||
| # Template functions like {{GetCSEInstallScriptFilepath}} resolve to the dest | ||
| basename = dest.rsplit('/', 1)[-1] | ||
| staging = f"{HOTFIX_STAGING_DIR}/{basename}" | ||
| if dest not in [e[1] for e in manifest_entries]: | ||
| matched_dest = dest | ||
| manifest_entries.append((staging, dest)) | ||
| break | ||
|
|
||
| if matched_dest: | ||
| basename = matched_dest.rsplit('/', 1)[-1] | ||
| staging_path = f"{HOTFIX_STAGING_DIR}/{basename}" | ||
| rewritten.append(f"- path: {staging_path}\n") | ||
| else: | ||
| rewritten.append(line) |
There was a problem hiding this comment.
The block rewrite currently doesn’t use path_value to determine which destination a given - path: line corresponds to; it effectively assigns destinations based on varkey iteration order and “first not yet used” rather than actually matching the line’s template/function/literal path. This can rewrite the wrong - path: entries (or produce incorrect staging→destination mappings) whenever a write_files block contains multiple files or when varkeys includes multiple destinations. Consider explicitly mapping each - path: line to a destination by parsing path_value (e.g., match known template function names / literals to a specific varkey/dest), and only then rewrite to a deterministic staging filename (and emit the corresponding manifest entry).
| # Collect unique destination paths from the varkeys in this block | ||
| dest_paths = set() | ||
| for vk in varkeys: | ||
| if vk in VARKEY_TO_DEST_PATH: | ||
| dest_paths.add(VARKEY_TO_DEST_PATH[vk]) | ||
|
|
||
| for line in block_lines: | ||
| stripped = line.strip() | ||
| if stripped.startswith('- path:'): | ||
| # Extract the path value (may be a Go template expression or literal) | ||
| path_value = stripped[len('- path:'):].strip() | ||
|
|
||
| # Find which dest path this corresponds to by matching template functions | ||
| matched_dest = None | ||
| for vk in varkeys: | ||
| if vk in VARKEY_TO_DEST_PATH: | ||
| dest = VARKEY_TO_DEST_PATH[vk] | ||
| # Check if this path line could resolve to this dest | ||
| # Template functions like {{GetCSEInstallScriptFilepath}} resolve to the dest | ||
| basename = dest.rsplit('/', 1)[-1] | ||
| staging = f"{HOTFIX_STAGING_DIR}/{basename}" | ||
| if dest not in [e[1] for e in manifest_entries]: | ||
| matched_dest = dest | ||
| manifest_entries.append((staging, dest)) | ||
| break | ||
|
|
||
| if matched_dest: | ||
| basename = matched_dest.rsplit('/', 1)[-1] | ||
| staging_path = f"{HOTFIX_STAGING_DIR}/{basename}" | ||
| rewritten.append(f"- path: {staging_path}\n") |
There was a problem hiding this comment.
dest_paths is collected but never used, which makes the function harder to follow and suggests an incomplete implementation. Either remove this block or use dest_paths to drive deterministic rewriting/manifest entry generation (which would also help avoid the current per-line varkey scanning).
| # Collect unique destination paths from the varkeys in this block | |
| dest_paths = set() | |
| for vk in varkeys: | |
| if vk in VARKEY_TO_DEST_PATH: | |
| dest_paths.add(VARKEY_TO_DEST_PATH[vk]) | |
| for line in block_lines: | |
| stripped = line.strip() | |
| if stripped.startswith('- path:'): | |
| # Extract the path value (may be a Go template expression or literal) | |
| path_value = stripped[len('- path:'):].strip() | |
| # Find which dest path this corresponds to by matching template functions | |
| matched_dest = None | |
| for vk in varkeys: | |
| if vk in VARKEY_TO_DEST_PATH: | |
| dest = VARKEY_TO_DEST_PATH[vk] | |
| # Check if this path line could resolve to this dest | |
| # Template functions like {{GetCSEInstallScriptFilepath}} resolve to the dest | |
| basename = dest.rsplit('/', 1)[-1] | |
| staging = f"{HOTFIX_STAGING_DIR}/{basename}" | |
| if dest not in [e[1] for e in manifest_entries]: | |
| matched_dest = dest | |
| manifest_entries.append((staging, dest)) | |
| break | |
| if matched_dest: | |
| basename = matched_dest.rsplit('/', 1)[-1] | |
| staging_path = f"{HOTFIX_STAGING_DIR}/{basename}" | |
| rewritten.append(f"- path: {staging_path}\n") | |
| # Collect unique destination paths from the varkeys in deterministic order. | |
| # This drives both path rewriting and manifest generation. | |
| dest_paths = [] | |
| seen_dest_paths = set() | |
| for vk in varkeys: | |
| if vk in VARKEY_TO_DEST_PATH: | |
| dest_path = VARKEY_TO_DEST_PATH[vk] | |
| if dest_path not in seen_dest_paths: | |
| dest_paths.append(dest_path) | |
| seen_dest_paths.add(dest_path) | |
| dest_path_index = 0 | |
| for line in block_lines: | |
| stripped = line.strip() | |
| if stripped.startswith('- path:'): | |
| if dest_path_index < len(dest_paths): | |
| matched_dest = dest_paths[dest_path_index] | |
| basename = matched_dest.rsplit('/', 1)[-1] | |
| staging_path = f"{HOTFIX_STAGING_DIR}/{basename}" | |
| manifest_entries.append((staging_path, matched_dest)) | |
| rewritten.append(f"- path: {staging_path}\n") | |
| dest_path_index += 1 |
| for _, f := range manifest.Files { | ||
| if err := copyHotfixFile(f.Staging, f.Destination); err != nil { | ||
| return fmt.Errorf("apply script hotfix %s → %s: %w", f.Staging, f.Destination, err) | ||
| } | ||
| slog.Info("applied script hotfix file", "staging", f.Staging, "destination", f.Destination) | ||
| } |
There was a problem hiding this comment.
Destination comes directly from the manifest (ultimately from cloud-init/nodecustomdata), and the current implementation will write to any path, including sensitive locations, if the manifest is malformed or compromised. Add destination validation before copying (mandatory): require absolute, filepath.Cleaned paths; reject traversal; and enforce an allowlist/prefix list of approved destinations (ideally derived from the same canonical list used to generate the manifest). This limits arbitrary file overwrite risk while preserving the hotfix mechanism.
| if _, err := tmp.Write(srcData); err != nil { | ||
| tmp.Close() | ||
| os.Remove(tmpPath) | ||
| return fmt.Errorf("writing temp file %s: %w", tmpPath, err) | ||
| } |
There was a problem hiding this comment.
(*os.File).Write can return a short write without an error; in that case, this would silently produce a truncated destination script. Ensure the full buffer is written (e.g., loop until all bytes are written, or use an API that guarantees full writes) and consider tmp.Sync() before Rename if durability across power loss is a concern during provisioning.
| if _, err := tmp.Write(srcData); err != nil { | |
| tmp.Close() | |
| os.Remove(tmpPath) | |
| return fmt.Errorf("writing temp file %s: %w", tmpPath, err) | |
| } | |
| written, err := io.Copy(tmp, bytes.NewReader(srcData)) | |
| if err != nil { | |
| tmp.Close() | |
| os.Remove(tmpPath) | |
| return fmt.Errorf("writing temp file %s: %w", tmpPath, err) | |
| } | |
| if written != int64(len(srcData)) { | |
| tmp.Close() | |
| os.Remove(tmpPath) | |
| return fmt.Errorf("writing temp file %s: short write (%d/%d bytes)", tmpPath, written, len(srcData)) | |
| } | |
| if err := tmp.Sync(); err != nil { | |
| tmp.Close() | |
| os.Remove(tmpPath) | |
| return fmt.Errorf("syncing temp file %s: %w", tmpPath, err) | |
| } |
Summary
Adds version-scoped scripts hotfix support so that script hotfixes only apply to the targeted VHD version, not all nodes unconditionally.
Problem
The current scripts hotfix mechanism injects
write_filesentries at real destination paths (e.g./opt/azure/containers/provision_installs.sh). ANC'sapplyNodeCustomData()writes them unconditionally, meaning a hotfix targeting VHD202604.27.0would also overwrite scripts on older VHDs where the logic may differ.Solution
Staging approach: Instead of writing hotfix scripts directly to their real paths, they are written to a staging directory (
/opt/azure/hotfix/scripts/) along with a manifest (manifest.json) that contains the target VHD version and staging→destination path mappings.ANC then:
targetVersionagainst its own VHD version (set via ldflags)Changes
aks-node-controller/nodecustomdata.go: AddedapplyScriptHotfix(),isScriptHotfixTargeted(),copyHotfixFile()and manifest structsaks-node-controller/app.go: WiredapplyScriptHotfix()intoProvision()afterapplyNodeCustomData()hack/hotfix_generate.py: Updatedinject_hotfix()to generate staging paths + manifest; addedrewrite_block_to_staging(),read_target_version(),VARKEY_TO_DEST_PATHmappingaks-node-controller/nodecustomdata_test.go: 7 new tests covering version match, mismatch, higher patch, no manifest, empty manifest, dev version, andisScriptHotfixTargetedtable testsCompatibility