HDDS-14862. Log volume failures and database errors as errors #9950
ptlrs wants to merge 1 commit into apache:master
adoroszlai left a comment
Thanks @ptlrs for working on this.
  dbLoaded.set(false);
  dbLoadFailure.set(false);
- LOG.info("SchemaV3 db is stopped at {} for volume {}", containerDBPath,
+ LOG.warn("SchemaV3 db is stopped at {} for volume {}", containerDBPath,
closeDbStore() is executed during normal shutdown, too, so this shouldn't be a warning. For the failure case, a warning is already logged by the callers of failVolume.
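A minimal sketch of the point above, not the Ozone code: a cleanup step shared by the shutdown and failure paths stays at INFO, and only the failure caller logs at a higher level. The class name, the shutdown, onVolumeCheckFailed and capture helpers, and the /data/db path are all hypothetical, and java.util.logging stands in for the project's logger so the example is self-contained.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical illustration only; names do not come from the Ozone code base.
class VolumeLogLevelSketch {
  private static final Logger LOG = Logger.getLogger("VolumeLogLevelSketch");

  // Shared cleanup step: runs on normal shutdown and after a failure,
  // so it cannot know which one happened and logs at INFO.
  static void closeDbStore(String dbPath) {
    LOG.log(Level.INFO, "db is stopped at {0}", dbPath);
  }

  // Normal shutdown path: nothing went wrong, only the INFO line is emitted.
  static void shutdown(String dbPath) {
    closeDbStore(dbPath);
  }

  // Failure path: the caller knows the cause, so it logs at the higher
  // level and then runs the same shared cleanup.
  static void onVolumeCheckFailed(String dbPath, Exception cause) {
    LOG.log(Level.WARNING, "volume check failed for " + dbPath, cause);
    closeDbStore(dbPath);
  }

  // Test helper: runs the action and returns the levels of the records it logged.
  static List<Level> capture(Runnable action) {
    List<Level> levels = new ArrayList<>();
    Handler handler = new Handler() {
      @Override public void publish(LogRecord record) { levels.add(record.getLevel()); }
      @Override public void flush() { }
      @Override public void close() { }
    };
    LOG.setUseParentHandlers(false); // keep the console quiet while capturing
    LOG.addHandler(handler);
    try {
      action.run();
    } finally {
      LOG.removeHandler(handler);
    }
    return levels;
  }

  public static void main(String[] args) {
    System.out.println(capture(() -> shutdown("/data/db")));
    System.out.println(capture(
        () -> onVolumeCheckFailed("/data/db", new IOException("disk error"))));
  }
}
```

With this split, a search for warnings or errors only matches real failures, while a clean shutdown leaves just the informational line.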
  volumeHealthMetrics.decrementHealthyVolumes();
  volumeHealthMetrics.incrementFailedVolumes();
- LOG.info("Moving Volume : {} to failed Volumes", volumeRoot);
+ LOG.error("Moving Volume : {} to failed Volumes", volumeRoot);
This is not an error. Callers of failVolume already log at a higher level.
  Time.monotonicNowNanos() - start);
  } catch (Exception e) {
- LOG.warn("compact rocksdb error in {}", dbFilePath, e);
+ LOG.error("compact rocksdb error in {}", dbFilePath, e);
This could just be a transient IO error, which happens sometimes. Since we don't act on the failure I think the original warning level makes sense.
I think anything resulting in a failed volume check should be logged at the error level. There are two such cases in HddsVolume#check that we can elevate from warn to error.
What changes were proposed in this pull request?
Volume failure is currently logged at INFO level. It should be logged at ERROR level, because a failing volume is an actual problem in the system and a search for ERRORs in the log files should flag it.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-14862
How was this patch tested?
CI: https://github.com/ptlrs/ozone/actions/runs/23277693312