physicaldrivegetter: tolerate predictive failure and unknown ssacli statuses #44

Closed
ezekiel-alexrod wants to merge 1 commit into main from improvement/ARTESCA-17608-predictive-failure

Conversation

@ezekiel-alexrod (Contributor) commented May 27, 2026

Summary

  • Map ssacli's Predictive Failure status to PDStatusUsed so a drive that is still online and serving its array no longer aborts the install.
  • Soft-fail any other unmodeled ssacli status (e.g. Rebuilding, future labels) to PDStatusUnknown instead of returning an error, and surface the raw label through PhysicalDrive.Reason.
  • Extract the status lookup into a package-level ssacliStatusMap + parseSSACLIStatus helper.
  • Add two fixture-based tests (predictive_failure_detail.txt, unknown_status_detail.txt) covering both branches.
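The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the names ssacliStatusMap, parseSSACLIStatus, PDStatusUsed and PDStatusUnknown come from the summary, while the function signature, the PDStatusOK/PDStatusFailed/PDStatusOffline constants, and the targets for OK, Failed and Offline are assumptions made only so the example compiles.

```go
package main

import (
	"fmt"
	"strings"
)

// PDStatus is a stand-in for the package's drive status type.
type PDStatus string

const (
	PDStatusUsed    PDStatus = "Used"
	PDStatusUnknown PDStatus = "Unknown"
	// The three constants below are hypothetical placeholders; the PR
	// description does not say what OK/Failed/Offline map to.
	PDStatusOK      PDStatus = "OK"
	PDStatusFailed  PDStatus = "Failed"
	PDStatusOffline PDStatus = "Offline"
)

// ssacliStatusMap maps raw ssacli "Status:" labels to modeled statuses.
// Only the "Predictive Failure" entry is taken from the description.
//
//nolint:gochecknoglobals // read-only lookup table
var ssacliStatusMap = map[string]PDStatus{
	"OK":                 PDStatusOK,
	"Failed":             PDStatusFailed,
	"Offline":            PDStatusOffline,
	"Predictive Failure": PDStatusUsed,
}

// parseSSACLIStatus never returns an error: labels missing from the map
// degrade to PDStatusUnknown. The second value is the trimmed raw label,
// which a caller could store in PhysicalDrive.Reason.
func parseSSACLIStatus(raw string) (PDStatus, string) {
	label := strings.TrimSpace(raw)
	if status, ok := ssacliStatusMap[label]; ok {
		return status, label
	}
	return PDStatusUnknown, label
}

func main() {
	for _, raw := range []string{"OK", "Predictive Failure", "Rebuilding"} {
		status, reason := parseSSACLIStatus(raw)
		fmt.Printf("%-18s -> %s (reason %q)\n", raw, status, reason)
	}
}
```

Keeping the table at package level means supporting a new label is a one-line change, and the fallback branch is the only place where unknown labels are handled.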

Why

The previous parser returned an "invalid status: <value>" error for any status other than OK, Failed, or Offline. In practice, a single SMART warning (Predictive Failure) on a disk that was otherwise fully operational was enough to break the whole physical-drive inventory and block the install.

The set of statuses ssacli can emit is not fully documented and evolves with firmware/agent versions, so the parser also needs to be forward-compatible: unknown labels should degrade gracefully (visible as Unknown with the raw Reason preserved) rather than take the inventory call down.
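To illustrate the forward-compatibility point, here is a hedged sketch of an inventory loop that completes even when one drive reports a label the parser does not model. It assumes only that each drive's detail text contains a line of the form "Status: <label>". The driveDetail type, the statusLabel, classify and inventory helpers, the string-typed Status field, and the sample text are all hypothetical and simplified for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// PhysicalDrive is a simplified stand-in; only the Reason field is named
// in the PR description.
type PhysicalDrive struct {
	ID     string
	Status string
	Reason string
}

// driveDetail pairs a drive ID with its raw detail text (hypothetical).
type driveDetail struct {
	ID   string
	Text string
}

// statusLabel returns the label after "Status:" on the first line that
// starts with that key, or "" if no such line exists.
func statusLabel(detail string) string {
	sc := bufio.NewScanner(strings.NewReader(detail))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "Status:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "Status:"))
		}
	}
	return ""
}

// classify maps a raw label to a status and a reason. The handling of
// OK, Failed and Offline is assumed; the last two branches follow the
// behavior described in the PR.
func classify(label string) (status, reason string) {
	switch label {
	case "OK":
		return "OK", ""
	case "Failed", "Offline":
		return label, label
	case "Predictive Failure":
		return "Used", label
	default:
		return "Unknown", label
	}
}

// inventory has no error path for status labels, so one odd drive cannot
// abort the listing of the others.
func inventory(details []driveDetail) []PhysicalDrive {
	drives := make([]PhysicalDrive, 0, len(details))
	for _, d := range details {
		status, reason := classify(statusLabel(d.Text))
		drives = append(drives, PhysicalDrive{ID: d.ID, Status: status, Reason: reason})
	}
	return drives
}

func main() {
	drives := inventory([]driveDetail{
		{ID: "1I:1:1", Text: "physicaldrive 1I:1:1\n   Status: OK\n"},
		{ID: "1I:1:2", Text: "physicaldrive 1I:1:2\n   Status: Predictive Failure\n"},
		{ID: "1I:1:3", Text: "physicaldrive 1I:1:3\n   Status: Rebuilding\n"},
	})
	for _, d := range drives {
		fmt.Printf("%s status=%s reason=%q\n", d.ID, d.Status, d.Reason)
	}
}
```

Because the raw label travels with the drive, a caller that cares about a specific unmodeled state can still inspect Reason and react, without the parser having to know about it.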

Test plan

  • go test ./pkg/implementation/physicaldrivegetter/... passes, including the two new cases in TestSSACLIPhysicalDriveStatus.
  • Linter (golangci-lint run) stays green (note the documented //nolint:gochecknoglobals on the new lookup map).
  • Manual check on a host with a Predictive Failure drive: inventory completes and the drive shows up as Used with Reason: "Predictive Failure".

…tatuses

ssacli reports a small open set of "Status:" values, but the parser
previously aborted the whole physical-drive inventory as soon as it saw
anything outside of OK/Failed/Offline. In particular, a drive flagged
"Predictive Failure" - which is still online and serving its array -
caused the install to fail.

- Map "Predictive Failure" to PDStatusUsed so SMART warnings no longer
  block the install while the drive is still operational.
- Soft-fail unknown statuses (e.g. "Rebuilding", future labels) to
  PDStatusUnknown instead of returning an error, and surface the raw
  ssacli label via PhysicalDrive.Reason so callers can still react.
- Extract the status lookup into parseSSACLIStatus + a package-level
  map for clarity, and cover both new paths with fixture-based tests.

Refs: ARTESCA-17608

Signed-off-by: Alex Rodriguez <131964409+ezekiel-alexrod@users.noreply.github.com>
@ezekiel-alexrod requested a review from a team as a code owner May 27, 2026 13:47
@ezekiel-alexrod deleted the improvement/ARTESCA-17608-predictive-failure branch May 27, 2026 14:36
@ezekiel-alexrod changed the title from "ARTESCA-17608 - physicaldrivegetter: tolerate predictive failure and unknown ssacli statuses" to "physicaldrivegetter: tolerate predictive failure and unknown ssacli statuses" May 27, 2026
@ezekiel-alexrod (Contributor, Author) commented

Superseded by #45 (branch renamed to drop the internal tracker reference from the name).
