
[AI Generated] BugFix: Handle missing cache topology in lscpu output for verify_l3_cache #4458

Open
Gnandeep99 wants to merge 1 commit into microsoft:main from Gnandeep99:bugfix/l3-cache-skip-unavailable-topology_010526_141205



@Gnandeep99 (Collaborator)

Summary

On confidential VMs (e.g., Standard_DC2ads_v5), lscpu --extended=cpu,node,socket,cache prints "-" in the CACHE column instead of the usual L1d:L1i:L2:L3 cache IDs. Because the parser only recognized the L1d:L1i:L2:L3 format, verify_l3_cache failed with an assertion error. The fix adds a secondary regex that matches the "-" format and raises SkippedException when the cache topology is not exposed to the guest.
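The two-pattern parsing described above can be sketched as follows. This is a minimal illustration, not the code in lisa/tools/lscpu.py: the regexes, the `CpuEntry` dataclass, `parse_lscpu_extended`, and the `UNKNOWN_CACHE_ID` name (borrowed from the review below) are hypothetical, and the sample lines assume the usual `CPU NODE SOCKET L1d:L1i:L2:L3` column layout. Rows whose CACHE column is "-" fall through to the second pattern and get the -1 sentinel the PR uses; any other row raises.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns; the real regexes in lisa/tools/lscpu.py may differ.
# Primary format, e.g. "  0    0      0 0:0:0:0"
_WITH_CACHE = re.compile(
    r"^\s*(?P<cpu>\d+)\s+(?P<node>\d+)\s+(?P<socket>\d+)\s+"
    r"(?P<l1d>\d+):(?P<l1i>\d+):(?P<l2>\d+):(?P<l3>\d+)\s*$"
)
# Fallback format when cache topology is hidden, e.g. "  0    0      0 -"
_WITHOUT_CACHE = re.compile(
    r"^\s*(?P<cpu>\d+)\s+(?P<node>\d+)\s+(?P<socket>\d+)\s+-\s*$"
)

UNKNOWN_CACHE_ID = -1  # hypothetical name for the PR's -1 sentinel


@dataclass
class CpuEntry:
    cpu: int
    node: int
    socket: int
    l1_data_cache: int
    l1_instruction_cache: int
    l2_cache: int
    l3_cache: int


def parse_lscpu_extended(output: str) -> List[CpuEntry]:
    """Parse `lscpu --extended=cpu,node,socket,cache` output (header first)."""
    entries: List[CpuEntry] = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        match = _WITH_CACHE.match(line)
        if match:
            # Group order matches the dataclass field order.
            entries.append(CpuEntry(*(int(v) for v in match.groups())))
            continue
        match = _WITHOUT_CACHE.match(line)
        if match:
            # Cache IDs not exposed to the guest: fill in the sentinel.
            entries.append(
                CpuEntry(
                    int(match["cpu"]), int(match["node"]), int(match["socket"]),
                    UNKNOWN_CACHE_ID, UNKNOWN_CACHE_ID,
                    UNKNOWN_CACHE_ID, UNKNOWN_CACHE_ID,
                )
            )
            continue
        # Fail loudly instead of returning partial data.
        raise ValueError(f"unexpected lscpu line: {line!r}")
    return entries
```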

Validation Results

| Image | VM Size | Result |
|---|---|---|
| Canonical ubuntu-24_04-lts server 24.04.202408210 | Standard_D2ds_v5 | PASSED |
| Canonical 0001-com-ubuntu-confidential-vm-jammy 22_04-lts-cvm 22.04.202604150 | Standard_DC2ads_v5 | SKIPPED (expected) |

Copilot AI review requested due to automatic review settings on May 1, 2026 21:53

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates LISA’s CPU cache/NUMA validation to handle lscpu --extended=cpu,node,socket,cache output in which cache topology is hidden (reported as "-"), which occurs on some confidential VM sizes. It extends the lscpu parser to accept this alternate format and skips verify_l3_cache when cache topology cannot be verified.

Changes:

  • Extend Lscpu.get_cpu_info() parsing to accept a CACHE value of "-" and return sentinel values (-1) for the cache IDs.
  • Update verify_l3_cache and its helper path to raise SkippedException when cache topology isn’t exposed.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| lisa/tools/lscpu.py | Adds a secondary parse path for lscpu --extended output where cache IDs are not provided. |
| lisa/microsoft/testsuites/core/cpu.py | Skips L3 cache mapping verification when lscpu reports no cache topology. |

Comment thread lisa/tools/lscpu.py
Comment on lines +313 to +317
```python
    l1_data_cache=-1,
    l1_instruction_cache=-1,
    l2_cache=-1,
    l3_cache=-1,
)
```

Copilot AI May 1, 2026

Using literal -1 as a sentinel for unknown cache IDs makes the meaning easy to miss and spreads a magic value across the codebase (the tests also check for -1). Please define a named constant (e.g., UNKNOWN_CACHE_ID) or switch the cache fields to Optional[int] and use None for "unknown" so callers can reliably detect this state without hardcoding -1.

Copilot uses AI. Check for mistakes.
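The reviewer's Optional[int] alternative could look roughly like this. It is a sketch under assumptions: `CpuCacheInfo` and `l3_topology_hidden` are hypothetical names, and the record keeps only the fields needed here (the real Lscpu CPU info class has more). A named constant such as UNKNOWN_CACHE_ID would also work; the advantage of None is that a type checker can force callers to handle the unknown case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CpuCacheInfo:
    """Hypothetical, trimmed-down record of one lscpu row."""

    cpu: int
    # None means lscpu printed "-" and the guest cannot see the L3 cache ID.
    l3_cache: Optional[int] = None


def l3_topology_hidden(cpus: Sequence[CpuCacheInfo]) -> bool:
    """True only when CPUs were parsed and none of them reports an L3 cache ID."""
    return bool(cpus) and all(info.l3_cache is None for info in cpus)
```

With l3_cache typed as Optional[int], mypy reports an error on an expression like `info.l3_cache + 1` unless the code first rules out None, whereas a -1 sentinel passes type checking silently.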
Comment thread lisa/tools/lscpu.py Outdated
```python
            )
        )
        continue
    assert False, f"lscpu NUMA node mapping is not in expected format: {item}"
```

Copilot AI May 1, 2026

The fallback failure path uses a bare assert False to signal an unexpected lscpu format. Python strips assert statements when run with -O, which would silently skip this check and return incomplete data. Please raise a real exception (e.g., LisaException) or use assertpy (assert_that(...).described_as(...).is_true()) so the failure is always enforced.

Suggested change
```diff
-assert False, f"lscpu NUMA node mapping is not in expected format: {item}"
+raise LisaException(
+    "lscpu NUMA node mapping is not in the expected format: "
+    f"{item}. Verify the output of "
+    "'lscpu --extended=cpu,node,socket,cache' on the target node "
+    "and update the parser if the format has changed."
+)
```
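The -O behavior the reviewer describes can be checked directly. This sketch runs two tiny scripts in a fresh interpreter, once normally and once with -O. RuntimeError stands in for LisaException (which only exists inside LISA), and the snippet text and the `run_snippet` helper are illustrative, not taken from lscpu.py.

```python
import os
import subprocess
import sys

# Two one-line "parsers" that hit an unexpected format. RuntimeError is a
# stand-in for LisaException, which is not importable outside LISA.
SNIPPETS = {
    "assert": "assert False, 'unexpected lscpu format'\nprint('returned partial data')",
    "raise": "raise RuntimeError('unexpected lscpu format')\nprint('returned partial data')",
}


def run_snippet(kind: str, optimize: bool) -> subprocess.CompletedProcess:
    """Run one snippet in a fresh interpreter, optionally with -O."""
    env = dict(os.environ)
    env.pop("PYTHONOPTIMIZE", None)  # keep the non -O run truly unoptimized
    cmd = [sys.executable] + (["-O"] if optimize else []) + ["-c", SNIPPETS[kind]]
    return subprocess.run(cmd, capture_output=True, text=True, env=env)
```

Calling run_snippet("assert", True) exits with code 0 and prints "returned partial data", because -O removes the assert statement; the "raise" variant fails in both modes.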

Comment on lines +127 to +132
```python
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information (likely a confidential VM "
        "or a VM size that does not expose cache topology to the guest)."
    )
```

Copilot AI May 1, 2026

The skip condition uses any(cpu.l3_cache == -1 ...), but the comment says lscpu reports "-" for all cache values. If only a subset of CPUs has unknown cache IDs (mixed output), any() will skip and potentially hide a real parsing/host issue. Consider using all(...) for the expected "no cache info at all" case, and treat a mixed state as a test failure (or raise a more specific exception).

Suggested change
```diff
-if any(cpu.l3_cache == -1 for cpu in cpu_info):
-    raise SkippedException(
-        "Cache topology is not exposed on this VM. "
-        "lscpu reports no cache information (likely a confidential VM "
-        "or a VM size that does not expose cache topology to the guest)."
-    )
+unknown_l3_cache_count = sum(1 for cpu in cpu_info if cpu.l3_cache == -1)
+if unknown_l3_cache_count == len(cpu_info):
+    raise SkippedException(
+        "Cache topology is not exposed on this VM. "
+        "lscpu reports no cache information (likely a confidential VM "
+        "or a VM size that does not expose cache topology to the guest)."
+    )
+if unknown_l3_cache_count:
+    raise LisaException(
+        "Inconsistent L3 cache topology reported by lscpu: "
+        f"{unknown_l3_cache_count} of {len(cpu_info)} CPUs have unknown "
+        "L3 cache IDs while others have valid values. Investigate lscpu "
+        "parsing or host cache-topology exposure on this VM."
+    )
```
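The skip/fail split proposed in this suggestion can be expressed as a small pure function, which also makes it easy to unit test. This is a hypothetical helper: `L3Topology`, `classify_l3`, and the UNKNOWN_CACHE_ID name are assumptions, not code from the PR. In LISA, the HIDDEN and PARTIAL results would map to SkippedException and LisaException respectively.

```python
from enum import Enum
from typing import Sequence

UNKNOWN_CACHE_ID = -1  # hypothetical name for the PR's "lscpu printed -" sentinel


class L3Topology(Enum):
    EXPOSED = "exposed"  # every CPU has a real L3 ID: run the mapping check
    HIDDEN = "hidden"    # no CPU has one: skip the test
    PARTIAL = "partial"  # some do, some don't: fail the test


def classify_l3(l3_ids: Sequence[int]) -> L3Topology:
    """Classify parsed L3 cache IDs into exposed, hidden, or partial."""
    if not l3_ids:
        # An empty list satisfies both all(...) and count == len(...),
        # so treat it as a parse failure rather than a skip.
        raise ValueError("lscpu output produced no CPU entries")
    unknown = sum(1 for cache_id in l3_ids if cache_id == UNKNOWN_CACHE_ID)
    if unknown == 0:
        return L3Topology.EXPOSED
    if unknown == len(l3_ids):
        return L3Topology.HIDDEN
    return L3Topology.PARTIAL
```

One detail worth checking in both suggestions: all() over an empty sequence returns True, and 0 == len([]) is also True, so an empty cpu_info would be skipped rather than failed. For the mixed input [0, -1], the PR's current any(...) check would skip, while classify_l3 returns PARTIAL.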

Comment on lines +312 to +316
```python
if any(cpu.l3_cache == -1 for cpu in cpu_info):
    raise SkippedException(
        "Cache topology is not exposed on this VM. "
        "lscpu reports no cache information."
    )
```

Copilot AI May 1, 2026

Same as above: using any(cpu.l3_cache == -1 ...) will skip even if only some CPUs have unknown cache IDs. If the intent is to skip only when cache topology is entirely hidden, switch to all(...) and fail on mixed/partial cache data so real regressions aren't masked.

Suggested change
```diff
-if any(cpu.l3_cache == -1 for cpu in cpu_info):
-    raise SkippedException(
-        "Cache topology is not exposed on this VM. "
-        "lscpu reports no cache information."
-    )
+if all(cpu.l3_cache == -1 for cpu in cpu_info):
+    raise SkippedException(
+        "Cache topology is not exposed on this VM. "
+        "lscpu reports no cache information."
+    )
+if any(cpu.l3_cache == -1 for cpu in cpu_info):
+    raise LisaException(
+        "Cache topology is partially exposed on this VM: some CPUs "
+        "report unknown L3 cache IDs while others do not. Verify the "
+        "guest cache topology reporting and investigate inconsistent "
+        "lscpu output before rerunning the test."
+    )
```

Gnandeep99 force-pushed the bugfix/l3-cache-skip-unavailable-topology_010526_141205 branch from eb32522 to 9a11295 on May 1, 2026 22:00

3 participants