
physicaldrivegetter: tolerate predictive failure and unknown ssacli statuses #45

Merged
ezekiel-alexrod merged 1 commit into main from improvement/predictive-failure-ssacli on May 27, 2026

@ezekiel-alexrod
Contributor

Summary

  • Map ssacli's Predictive Failure status to PDStatusUsed so a drive that is still online and serving its array no longer aborts the install.
  • Soft-fail any other unmodeled ssacli status (e.g. Rebuilding, future labels) to PDStatusUnknown instead of returning an error, and surface the raw label through PhysicalDrive.Reason.
  • Extract the status lookup into a package-level ssacliStatusMap + parseSSACLIStatus helper.
  • Add two fixture-based tests (predictive_failure_detail.txt, unknown_status_detail.txt) covering both branches.
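
For readers who have not opened the diff, the lookup described above might look roughly like the following. This is a minimal sketch and not the merged code. The names ssacliStatusMap, parseSSACLIStatus, PDStatusUsed and PDStatusUnknown come from this PR. The string-based PDStatus type, the PDStatusFailed and PDStatusOffline constants, the function signature, and the targets chosen for Failed and Offline are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// PDStatus stands in for the package's drive status type.
// Assumption: the real type may not be string-based.
type PDStatus string

const (
	PDStatusUsed    PDStatus = "Used"
	PDStatusUnknown PDStatus = "Unknown"
	PDStatusFailed  PDStatus = "Failed"  // hypothetical constant
	PDStatusOffline PDStatus = "Offline" // hypothetical constant
)

// ssacliStatusMap maps raw ssacli "Status:" labels to modeled statuses.
// A package-level map like this is what triggers gochecknoglobals.
var ssacliStatusMap = map[string]PDStatus{
	"OK":                 PDStatusUsed,
	"Predictive Failure": PDStatusUsed,    // still online and serving its array
	"Failed":             PDStatusFailed,  // assumed target
	"Offline":            PDStatusOffline, // assumed target
}

// parseSSACLIStatus never returns an error: unmodeled labels degrade to
// PDStatusUnknown, and the trimmed raw label is returned as the reason.
func parseSSACLIStatus(raw string) (PDStatus, string) {
	label := strings.TrimSpace(raw)
	if status, ok := ssacliStatusMap[label]; ok {
		return status, label
	}
	return PDStatusUnknown, label
}

func main() {
	for _, raw := range []string{"OK", "Predictive Failure", "Rebuilding"} {
		status, reason := parseSSACLIStatus(raw)
		fmt.Printf("%q -> status=%s reason=%q\n", raw, status, reason)
	}
}
```

Returning the raw label alongside the status is what lets callers still tell a Predictive Failure drive apart from a healthy one, even though both are reported as Used.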

Why

The previous parser returned invalid status: <value> for anything other than OK, Failed, or Offline. In practice, a single SMART warning (Predictive Failure) on a disk that was otherwise fully operational was enough to break the whole physical-drive inventory and block the install.

The set of statuses ssacli can emit is not fully documented and evolves with firmware/agent versions, so the parser also needs to be forward-compatible: unknown labels should degrade gracefully (visible as Unknown with the raw Reason preserved) rather than take the inventory call down.

Test plan

  • go test ./pkg/implementation/physicaldrivegetter/... passes, including the two new cases in TestSSACLIPhysicalDriveStatus.
  • Linter (golangci-lint run) stays green (note the documented //nolint:gochecknoglobals on the new lookup map).
  • Manual check on a host with a Predictive Failure drive: inventory completes and the drive shows up as Used with Reason: "Predictive Failure".
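
As an illustration of what the fixture-based cases exercise, the sketch below pulls the raw label out of a "Status:" line in ssacli-style detail text. It is written under stated assumptions: extractStatusLabel is a hypothetical helper, and the two inline fixtures are abbreviated, invented stand-ins for predictive_failure_detail.txt and unknown_status_detail.txt, whose real contents are not shown in this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Abbreviated, invented stand-ins for the real fixture files.
const predictiveFailureDetail = `
      physicaldrive 3I:3:2
         Port: 3I
         Box: 3
         Bay: 2
         Status: Predictive Failure
         Drive Type: Data Drive
`

const unknownStatusDetail = `
      physicaldrive 1I:1:1
         Port: 1I
         Box: 1
         Bay: 1
         Status: Rebuilding
         Drive Type: Data Drive
`

// extractStatusLabel (hypothetical) returns the value of the first line that
// starts with "Status:" once surrounding whitespace is trimmed.
func extractStatusLabel(detail string) (string, bool) {
	scanner := bufio.NewScanner(strings.NewReader(detail))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if strings.HasPrefix(line, "Status:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Status:")), true
		}
	}
	return "", false
}

func main() {
	// Table-driven in the same spirit as the cases in TestSSACLIPhysicalDriveStatus.
	cases := []struct {
		name, detail, wantLabel string
	}{
		{"predictive failure", predictiveFailureDetail, "Predictive Failure"},
		{"unknown status", unknownStatusDetail, "Rebuilding"},
	}
	for _, tc := range cases {
		got, ok := extractStatusLabel(tc.detail)
		fmt.Printf("%s: found=%t label=%q match=%t\n", tc.name, ok, got, got == tc.wantLabel)
	}
}
```

Matching on the trimmed line prefix keeps other fields whose names merely end in "Status:" from being mistaken for the drive status.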

End-to-end validation on real hardware

Built for linux/amd64 and run against a real HPE Smart Array controller that carries a drive in Predictive Failure. The full inventory completes (exit 0) and the affected drive is reported as Used with the raw status preserved in Reason — no error, no aborted install:

Controller slot=0 — P816i-a SR Gen10

ID      Model          Serial        Size       Status  Reason
4I:6:1  MO000800JXBEV  W2X0751Y      800.0 GiB  Used    OK
4I:6:2  MO000800JXBEV  W2X0WTKY      800.0 GiB  Used    OK
1I:1:1  MB006000JWJRP  91Q0A046FDYF  6.0 TiB    Used    OK
1I:1:2  MB006000JWJRP  91C0A01CFDYF  6.0 TiB    Used    OK
1I:1:3  MB6000JVYZD    ZADBBY88      6.0 TiB    Used    OK
1I:1:4  MB6000JVYZD    ZADBC3RL      6.0 TiB    Used    OK
2I:2:1  MB6000JVYZD    ZADBC9G0      6.0 TiB    Used    OK
2I:2:2  MB6000JVYZD    ZADBCAF0      6.0 TiB    Used    OK
2I:2:3  MB6000JVYZD    ZADB9HEN      6.0 TiB    Used    OK
2I:2:4  MB6000JVYZD    ZADBC60Z      6.0 TiB    Used    OK
3I:3:1  MB6000JVYZD    ZADBC2HK      6.0 TiB    Used    OK
3I:3:2  MB6000JVYZD    ZADBBFGH      6.0 TiB    Used    Predictive Failure
3I:3:3  MB006000JWWQN  WSE1F022      6.0 TiB    Used    OK
3I:3:4  MB6000JVYZD    ZADB9VP5      6.0 TiB    Used    OK

14 physical drives — inventory complete, exit 0.

…tatuses

ssacli reports a small open set of "Status:" values, but the parser
previously aborted the whole physical-drive inventory as soon as it saw
anything outside of OK/Failed/Offline. In particular, a drive flagged
"Predictive Failure" - which is still online and serving its array -
caused the install to fail.

- Map "Predictive Failure" to PDStatusUsed so SMART warnings no longer
  block the install while the drive is still operational.
- Soft-fail unknown statuses (e.g. "Rebuilding", future labels) to
  PDStatusUnknown instead of returning an error, and surface the raw
  ssacli label via PhysicalDrive.Reason so callers can still react.
- Extract the status lookup into parseSSACLIStatus + a package-level
  map for clarity, and cover both new paths with fixture-based tests.

Refs: ARTESCA-17608

Signed-off-by: Alex Rodriguez <131964409+ezekiel-alexrod@users.noreply.github.com>
@ezekiel-alexrod ezekiel-alexrod merged commit 3424cac into main May 27, 2026
6 checks passed
@ezekiel-alexrod ezekiel-alexrod deleted the improvement/predictive-failure-ssacli branch May 27, 2026 15:39

2 participants