[AI Generated] BugFix: Handle missing cache topology in lscpu output for verify_l3_cache#4458
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates LISA’s CPU cache/NUMA validation to handle lscpu --extended=cpu,node,socket,cache outputs where cache topology is hidden (reported as -), which occurs on some confidential VM sizes. It extends the lscpu parser to accept the alternate format and skips verify_l3_cache when cache topology cannot be verified.
Changes:
- Extend
Lscpu.get_cpu_info()parsing to acceptCACHEvalues of-and return sentinel values for cache IDs. - Update
verify_l3_cacheand its helper path toSkippedExceptionwhen cache topology isn’t exposed.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| lisa/tools/lscpu.py | Adds a secondary parse path for lscpu --extended output where cache IDs are not provided. |
| lisa/microsoft/testsuites/core/cpu.py | Skips L3 cache mapping verification when lscpu reports no cache topology. |
| l1_data_cache=-1, | ||
| l1_instruction_cache=-1, | ||
| l2_cache=-1, | ||
| l3_cache=-1, | ||
| ) |
There was a problem hiding this comment.
Using literal -1 as a sentinel for unknown cache IDs makes the meaning easy to miss and spreads a magic value across the codebase (the tests also check for -1). Please define a named constant (e.g., UNKNOWN_CACHE_ID) or switch the cache fields to Optional[int] and use None for "unknown" so callers can reliably detect this state without hardcoding -1.
| ) | ||
| ) | ||
| continue | ||
| assert False, f"lscpu NUMA node mapping is not in expected format: {item}" |
There was a problem hiding this comment.
The fallback failure path uses a built-in assert False to signal an unexpected lscpu format. Python assertions can be stripped with -O, which would silently ignore this error and return incomplete data. Please raise a real exception (e.g., LisaException) or use assertpy (assert_that(...).described_as(...).is_true()) so the failure is always enforced.
| assert False, f"lscpu NUMA node mapping is not in expected format: {item}" | |
| raise LisaException( | |
| "lscpu NUMA node mapping is not in the expected format: " | |
| f"{item}. Verify the output of " | |
| "'lscpu --extended=cpu,node,socket,cache' on the target node " | |
| "and update the parser if the format has changed." | |
| ) |
| if any(cpu.l3_cache == -1 for cpu in cpu_info): | ||
| raise SkippedException( | ||
| "Cache topology is not exposed on this VM. " | ||
| "lscpu reports no cache information (likely a confidential VM " | ||
| "or a VM size that does not expose cache topology to the guest)." | ||
| ) |
There was a problem hiding this comment.
The skip condition uses any(cpu.l3_cache == -1 ...), but the comment says lscpu reports "-" for all cache values. If only a subset of CPUs has unknown cache IDs (mixed output), any() will skip and potentially hide a real parsing/host issue. Consider using all(...) for the expected "no cache info at all" case, and treat a mixed state as a test failure (or raise a more specific exception).
| if any(cpu.l3_cache == -1 for cpu in cpu_info): | |
| raise SkippedException( | |
| "Cache topology is not exposed on this VM. " | |
| "lscpu reports no cache information (likely a confidential VM " | |
| "or a VM size that does not expose cache topology to the guest)." | |
| ) | |
| unknown_l3_cache_count = sum(1 for cpu in cpu_info if cpu.l3_cache == -1) | |
| if unknown_l3_cache_count == len(cpu_info): | |
| raise SkippedException( | |
| "Cache topology is not exposed on this VM. " | |
| "lscpu reports no cache information (likely a confidential VM " | |
| "or a VM size that does not expose cache topology to the guest)." | |
| ) | |
| if unknown_l3_cache_count: | |
| raise LisaException( | |
| "Inconsistent L3 cache topology reported by lscpu: " | |
| f"{unknown_l3_cache_count} of {len(cpu_info)} CPUs have unknown " | |
| "L3 cache IDs while others have valid values. Investigate lscpu " | |
| "parsing or host cache-topology exposure on this VM." | |
| ) |
| if any(cpu.l3_cache == -1 for cpu in cpu_info): | ||
| raise SkippedException( | ||
| "Cache topology is not exposed on this VM. " | ||
| "lscpu reports no cache information." | ||
| ) |
There was a problem hiding this comment.
Same as above: using any(cpu.l3_cache == -1 ...) will skip even if only some CPUs have unknown cache IDs. If the intent is to skip only when cache topology is entirely hidden, switch to all(...) and fail on mixed/partial cache data so real regressions aren't masked.
| if any(cpu.l3_cache == -1 for cpu in cpu_info): | |
| raise SkippedException( | |
| "Cache topology is not exposed on this VM. " | |
| "lscpu reports no cache information." | |
| ) | |
| if all(cpu.l3_cache == -1 for cpu in cpu_info): | |
| raise SkippedException( | |
| "Cache topology is not exposed on this VM. " | |
| "lscpu reports no cache information." | |
| ) | |
| if any(cpu.l3_cache == -1 for cpu in cpu_info): | |
| raise LisaException( | |
| "Cache topology is partially exposed on this VM: some CPUs " | |
| "report unknown L3 cache IDs while others do not. Verify the " | |
| "guest cache topology reporting and investigate inconsistent " | |
| "lscpu output before rerunning the test." | |
| ) |
…for verify_l3_cache
eb32522 to
9a11295
Compare
Summary
On confidential VMs (e.g. Standard_DC2ads_v5),
lscpu --extended=cpu,node,socket,cacheoutputs-in the CACHE column instead ofL1d:L1i:L2:L3format. This causedverify_l3_cacheto fail with an assertion error. The fix adds a secondary regex to handle the-format and raises SkippedException when cache topology is not exposed to the guest.Validation Results