Add error percentage to cpu and ram monitor#548
Conversation
|
@ct2034 |
|
@sangteak601 are you planning to work on this? |
All comments are addressed now I believe. Can you take another look? |
There was a problem hiding this comment.
Pull request overview
Adds error_percentage thresholds to cpu_monitor and ram_monitor so they can publish ERROR status, and adds a use_average switch to cpu_monitor to compare the threshold against the average CPU usage (consistent with the RAM monitor) instead of per-core usage. Documentation and the CPU system test are updated accordingly.
Changes:
CpuTaskgains requirederror_percentageanduse_averageconstructor args; status logic now selects max-core or average CPU, then evaluates OK / WARN / ERROR.RamTaskgains a requirederror_percentageconstructor arg and emits ERROR when RAM average exceeds it.- README and the CPU system test updated; new ROS parameters (
error_percentagedefault 95,use_averagedefault False) declared in both monitors'main().
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diagnostic_common_diagnostics/diagnostic_common_diagnostics/cpu_monitor.py | Adds ERROR threshold and use_average switch, removes constructor defaults, declares new parameters. |
| diagnostic_common_diagnostics/diagnostic_common_diagnostics/ram_monitor.py | Adds ERROR threshold to RAM monitoring and declares the new parameter. |
| diagnostic_common_diagnostics/test/systemtest/test_cpu_monitor.py | Updates CpuTask constructor calls to match new required args; updates warn message assertion. |
| diagnostic_common_diagnostics/README.md | Documents new use_average and error_percentage args for CPU/RAM monitors. |
| diagnostic_common_diagnostics/diagnostic_common_diagnostics/param_decl.yaml | Empty file; no content change visible. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * Name of the node is "cpu_monitor_" + hostname. | ||
| * Uses the following args: | ||
| * use_average: If true, the average CPU usage over all cores will be used to determine the status. If false, the maximum CPU usage among all cores will be used. | ||
| * warning_percentage: If the CPU usage is > warning_percentage, a WARN status will be publised. |
| class CpuTask(DiagnosticTask): | ||
|
|
||
| def __init__(self, warning_percentage=90, window=1): | ||
| def __init__(self, warning_percentage, error_percentage, window, use_average): |
| DiagnosticTask.__init__(self, 'CPU Information') | ||
|
|
||
| self._warning_percentage = int(warning_percentage) | ||
| self._error_percentage = int(error_percentage) |
| @@ -96,7 +96,7 @@ def test_updater(self): | |||
| node = Node('cpu_monitor_test') | |||
| updater = Updater(node) | |||
| updater.setHardwareID('test_id') | |||
| updater.add(CpuTask()) | |||
| updater.add(CpuTask(warning_percentage=95, error_percentage=100, window=1, use_average=False)) | |||
| * use_average: If true, the average CPU usage over all cores will be used to determine the status. If false, the maximum CPU usage among all cores will be used. | ||
| * warning_percentage: If the CPU usage is > warning_percentage, a WARN status will be publised. | ||
| * error_percentage: If the CPU usage is > error_percentage, an ERROR status will be published. |
ct2034
left a comment
There was a problem hiding this comment.
I agree with all of the copilot comments
At the moment, it is not possible to configure
cpu_monitororram_monitorto publish ERROR status. It would be good to have ERROR state. It would allow us to take actions before program exhibits unexpected behaviour.Also,
cpu_monitorwas updated to compare threshold to average cpu usage rather than individual cpu usage. This is aligned with howram_monitorworks and makes more sense, as in most cases users are more interested in average usage than individual usage.