Fail smoke test on reboot errors #4472
The smoke test currently marks a reboot failure as a bad environment only when the caught exception is TcpConnectionException. Other failures from node.reboot(), such as a timeout while waiting for the boot marker to advance, fall through to PassedException and make the test pass with a warning.

That creates a false positive when the guest never completes the reboot. The daily Cloud Hypervisor v51 smoke run hit this path with 'timeout to wait reboot, the node may stuck on reboot command' but still exited successfully.

Treat any exception from the reboot step as BadEnvironmentStateException after checking the serial console for a panic, matching the existing comment that reboot connectivity failures should fail the test.
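The behavior change described above can be sketched as follows. This is a simplified stand-in, not the LISA source: the three exception classes are local placeholders named after the ones in the description, and `reboot_step_old`, `reboot_step_new`, `reboot_times_out`, and `outcome` are hypothetical helpers introduced only for illustration. The serial console panic check is omitted.

```python
class TcpConnectionException(Exception):
    """Local placeholder for the connection error named in the description."""


class BadEnvironmentStateException(Exception):
    """Local placeholder: the environment is unhealthy, the test must fail."""


class PassedException(Exception):
    """Local placeholder: the test passes, but with a warning."""


def reboot_times_out() -> None:
    # Hypothetical reboot that fails with something other than a TCP error.
    raise RuntimeError(
        "timeout to wait reboot, the node may stuck on reboot command"
    )


def reboot_step_old(reboot) -> None:
    # Assumed shape of the old handling: only TCP errors fail the test.
    try:
        reboot()
    except TcpConnectionException as e:
        raise BadEnvironmentStateException(str(e)) from e
    except Exception as e:
        # Every other reboot failure was downgraded to a pass with a warning.
        raise PassedException(str(e)) from e


def reboot_step_new(reboot) -> None:
    # Assumed shape of the new handling: any reboot failure fails the test.
    try:
        reboot()
    except Exception as e:
        raise BadEnvironmentStateException(str(e)) from e


def outcome(step, reboot) -> str:
    # Report which exception type the step raised, or "ok" if none.
    try:
        step(reboot)
    except Exception as e:
        return type(e).__name__
    return "ok"


print(outcome(reboot_step_old, reboot_times_out))
print(outcome(reboot_step_new, reboot_times_out))
```

With a reboot timeout, the old step surfaces a `PassedException` while the new step surfaces a `BadEnvironmentStateException`, which is the false positive this PR removes.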
Pull request overview
This PR fixes a false-positive behavior in the core provisioning smoke test by ensuring reboot failures don’t get downgraded into a partial pass. It aligns the runtime behavior with the intent that reboot issues indicate an unhealthy/bad environment.
Changes:
- Treat any exception thrown during the reboot step as `BadEnvironmentStateException` (instead of only `TcpConnectionException`).
- Preserve the original exception via chaining (`raise ... from e`) to keep debugging context.
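A minimal sketch of why the chaining matters, assuming a locally defined `BadEnvironmentStateException` placeholder and hypothetical helpers (`run_reboot_step`, `failing_reboot`, and the node name `node-0` are not from the PR): `raise ... from e` stores the original error on `__cause__`, so the root cause stays visible in the traceback and to any caller that inspects the wrapped exception.

```python
class BadEnvironmentStateException(Exception):
    """Local placeholder for the exception type named in the PR."""


def failing_reboot() -> None:
    # Hypothetical reboot that times out.
    raise TimeoutError("boot marker did not advance")


def run_reboot_step(node_name: str, reboot) -> None:
    try:
        reboot()
    except Exception as e:
        # Wrap the failure, keeping the original exception as __cause__.
        raise BadEnvironmentStateException(
            f"reboot failed on node '{node_name}': {e}"
        ) from e


def capture_failure(node_name: str, reboot):
    # Return the wrapped exception so its chain can be inspected.
    try:
        run_reboot_step(node_name, reboot)
    except BadEnvironmentStateException as wrapped:
        return wrapped
    return None


captured = capture_failure("node-0", failing_reboot)
print(repr(captured))
print(repr(captured.__cause__))
```

Without `from e`, Python would still record the original error on `__context__`, but the explicit form marks it as the direct cause and makes the intent clear in the traceback.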
❌ AI Test Selection: FAILED. 78 test case(s) selected. Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/microsoft/lisa/sessions/5145cf10-a674-4f81-a820-7ea560e96fa4
Co-authored-by: vyadavmsft <1424753+vyadavmsft@users.noreply.github.com>
Applied the requested review-thread updates in aa2275e: updated the inline comment to reflect that any reboot exception marks bad environment state, and made the exception message actionable with node name plus serial-console/reachability hints.
❌ AI Test Selection: FAILED. 78 test case(s) selected. Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
@LiliDeng pls check
@LiliDeng @johnsongeorge-w can you pls check.
@johnsongeorge-w @LiliDeng pls check
Related Issue
Type of Change
Checklist
Test Validation
Key Test Cases:
Smoke test
Impacted LISA Features:
Smoke test
Tested Azure Marketplace Images:
Test Results
2026-05-19 03:21:10.574[139919805806400][INFO] lisa.RootRunner ________________________________________
2026-05-19 03:21:10.574[139919805806400][INFO] lisa.RootRunner Provisioning.smoke_test: PASSED
2026-05-19 03:21:10.574[139919805806400][INFO] lisa.RootRunner CPU.verify_cpu_count: PASSED