


130s commented Sep 18, 2021

Issue aimed at

Changes

  • Add computer_hw package (pr2_computer_monitor, copied from the pr2_robot repo and renamed).
  • Add a .launch file so downstream packages can start the monitoring processes in one batch.

Review items

Test

Dev test done on an Ubuntu 16.04 host with an NVIDIA GeForce GTX 1060:
# roslaunch computer_hw monitor.launch
... logging to /root/.ros/log/1b44b418-1846-11ec-b2b0-c400ad2d8cb0/roslaunch-rabbitdeer-3380.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://rabbitdeer:38343/

SUMMARY
========

PARAMETERS
 * /rosdistro: kinetic
 * /rosversion: 1.12.13

NODES
  /
    diag_agg (diagnostic_aggregator/aggregator_node)
    libsensors_monitor (libsensors_monitor/libsensors_monitor)
    nvidia_temperature_monitor (computer_hw/nvidia_temp.py)

auto-starting new master
process[master]: started with pid [3390]
ROS_MASTER_URI=http://localhost:11311

setting /run_id to 1b44b418-1846-11ec-b2b0-c400ad2d8cb0
process[rosout-1]: started with pid [3403]
started core service [/rosout]
process[libsensors_monitor-2]: started with pid [3410]
[ INFO] [1631944994.889260052]: Got system hostname: rabbitdeer
[ INFO] [1631944994.896585316]: Found sensor coretemp-isa-0000 with features: temp1, temp2, temp3, temp4, temp5
[ INFO] [1631944994.896702034]: Found sensor acpitz-virtual-0 with features: temp1, temp2, temp3
[ INFO] [1631944994.896749535]: Found sensor pch_skylake-virtual-0 with features: temp1
process[nvidia_temperature_monitor-3]: started with pid [3421]
[INFO] [1631944995.775560]: card_out: 
==============NVSMI LOG==============

Timestamp                           : Sat Sep 18 06:03:15 2021
Driver Version                      : 440.64
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:01:00.0
    Product Name                    : GeForce GTX 1060 6GB
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-7f9b4a72-68fe-e2a9-8907-4590704d3431
    Minor Number                    : 0
    VBIOS Version                   : 86.06.45.00.60
    MultiGPU Board                  : No
    Board ID                        : 0x100
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1C0310DE
        Bus Id                      : 00000000:01:00.0
        Sub System Id               : 0x61633842
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 5 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 6077 MiB
        Used                        : 114 MiB
        Free                        : 5963 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 5 MiB
        Free                        : 251 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 2 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 51 C
        GPU Shutdown Temp           : 102 C
        GPU Slowdown Temp           : 99 C
        GPU Max Operating Temp      : N/A
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 5.91 W
        Power Limit                 : 120.00 W
        Default Power Limit         : 120.00 W
        Enforced Power Limit        : 120.00 W
        Min Power Limit             : 60.00 W
        Max Power Limit             : 140.00 W
    Clocks
        Graphics                    : 139 MHz
        SM                          : 139 MHz
        Memory                      : 405 MHz
        Video                       : 544 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 2012 MHz
        SM                          : 2012 MHz
        Memory                      : 4004 MHz
        Video                       : 1708 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes


gpu_stat: header: 
  seq: 0
  stamp: 
    secs: 0
    nsecs:         0
  frame_id: ''
product_name: "GeForce GTX 1060 6GB"
pci_device_id: ''
pci_location: ''
display: ''
driver_version: "440.64"
temperature: 51
fan_speed: 23.5619449019
gpu_usage: 0
memory_usage: 2

process[diag_agg-4]: started with pid [3435]
[ERROR] [1631944995.896812050]: No analyzers initialized in AnalyzerGroup /diag_agg/analyzers
[ERROR] [1631944995.896856468]: Analyzer group for diagnostic aggregator failed to initialize!
^C[diag_agg-4] killing on exit
[nvidia_temperature_monitor-3] killing on exit
[libsensors_monitor-2] killing on exit
[INFO] [1631944996.825916]: card_out: 
gpu_stat: header: 
  seq: 0
  stamp: 
    secs: 0
    nsecs:         0
  frame_id: ''
product_name: ''
pci_device_id: ''
pci_location: ''
display: ''
driver_version: ''
temperature: 0.0
fan_speed: 0.0
gpu_usage: 0.0
memory_usage: 0.0

:

Sample of Diagnostic GUI with GPU monitoring output.
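The log shows the raw `nvidia-smi -q` report (`card_out`) being reduced to a few `gpu_stat` fields. The sketch below illustrates one way to do that reduction, assuming only the indented `Key : Value` layout visible in the log above. The helpers `parse_nvsmi_fields`, `lookup` and `summarize` are hypothetical names for illustration; this is not the `parse_smi_output` code in `computer_hw`.

```python
import re

# A "Key<padding>: Value" line. Requiring whitespace before the colon keeps
# header lines such as "GPU 00000000:01:00.0" from being read as fields.
_FIELD_RE = re.compile(r"^\s*(?P<key>\S.*?)\s+:\s*(?P<value>.*)$")


def parse_nvsmi_fields(text):
    """Flatten `nvidia-smi -q` text into {"Section/.../Key": "value"}."""
    fields = {}
    stack = []  # (indent, header) pairs for the sections currently open
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("="):
            continue  # blank lines and the "=====NVSMI LOG=====" banner
        indent = len(raw) - len(raw.lstrip())
        # Close sections that are siblings of, or deeper than, this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        match = _FIELD_RE.match(raw)
        if match:
            path = [header for _, header in stack] + [match.group("key")]
            fields["/".join(path)] = match.group("value").strip()
        else:
            # No "Key : Value" shape, so treat the line as a section header.
            stack.append((indent, raw.strip()))
    return fields


def lookup(fields, suffix):
    """Return the first value whose path ends with `suffix`, else ""."""
    for path, value in fields.items():
        if path == suffix or path.endswith("/" + suffix):
            return value
    return ""


def summarize(fields):
    """Pick the values that the gpu_stat message in the log reports."""
    def number(suffix):
        # "51 C" -> 51.0, "2 %" -> 2.0; missing or "N/A" -> 0.0
        try:
            return float(lookup(fields, suffix).split(" ")[0])
        except ValueError:
            return 0.0

    return {
        "product_name": lookup(fields, "Product Name"),
        "driver_version": lookup(fields, "Driver Version"),
        "temperature": number("Temperature/GPU Current Temp"),
        "fan_speed_percent": number("Fan Speed"),
        "gpu_usage": number("Utilization/Gpu"),
        "memory_usage": number("Utilization/Memory"),
    }


# Excerpt of the card_out text from the log above.
SAMPLE = """==============NVSMI LOG==============
Driver Version                      : 440.64
GPU 00000000:01:00.0
    Product Name                    : GeForce GTX 1060 6GB
    Fan Speed                       : 5 %
    Utilization
        Gpu                         : 0 %
        Memory                      : 2 %
    Temperature
        GPU Current Temp            : 51 C
    Max Clocks
        Memory                      : 4004 MHz
"""

if __name__ == "__main__":
    print(summarize(parse_nvsmi_fields(SAMPLE)))
```

On the excerpt this yields product name `GeForce GTX 1060 6GB`, driver `440.64`, temperature `51.0`, fan speed `5.0` (percent) and GPU/memory utilization `0.0`/`2.0`. An empty string yields empty names and `0.0` values, much like the second `gpu_stat` dump printed after Ctrl-C. The sketch keeps the fan speed as the raw percentage; the `fan_speed` value in the log (23.56...) suggests `computer_hw` applies some conversion, which is not reproduced here.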


130s commented Sep 19, 2021

Not yet sure why one of the CI jobs failed.

https://app.travis-ci.com/github/ros-drivers/linux_peripheral_interfaces/jobs/538168773#L296

Compiling './computer_hw/src/computer_hw/nvidia_temperature_monitor.py'...

***   File "./computer_hw/src/computer_hw/nvidia_temperature_monitor.py", line 53
    except Exception, e:
                    ^
SyntaxError: invalid syntax

:

RefactoringTool: Refactored ./computer_hw/src/computer_hw/nvidia_temperature_monitor.py
--- ./computer_hw/src/computer_hw/nvidia_temperature_monitor.py	(original)
+++ ./computer_hw/src/computer_hw/nvidia_temperature_monitor.py	(refactored)
@@ -50,7 +50,7 @@
             gpu_stat = parse_smi_output(card_out)
             stat = gpu_status_to_diag(gpu_stat)
             rospy.loginfo("card_out: {}\ngpu_stat: {}\n".format(card_out, gpu_stat))
-        except Exception, e:
+        except Exception as e:
             import traceback
             rospy.logerr('Unable to process nVidia GPU data')
             rospy.logerr(traceback.format_exc())
RefactoringTool: No changes to ./laptop_battery_monitor/scripts/laptop_battery.py
RefactoringTool: Files that were modified:
RefactoringTool: ./computer_hw/executables/hd_monitor.py
RefactoringTool: ./computer_hw/executables/ntp_monitor.py
RefactoringTool: ./computer_hw/executables/wifi_monitor.py
RefactoringTool: ./computer_hw/src/computer_hw/nvidia_temperature_monitor.py
RefactoringTool: ./laptop_battery_monitor/scripts/laptop_battery.py
RefactoringTool: There was 1 error:
RefactoringTool: Can't parse ./computer_hw/executables/cpu_monitor.py: ParseError: bad input: type=22, value='=', context=('', (820, 90))
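The `Compiling '...'` and `***` lines look like the output of Python 3's `compileall`, which byte-compiles every file and fails on Python 2-only syntax such as `except Exception, e:`; the `except Exception as e:` rewrite that 2to3 proposes is accepted by Python 2.6+ and Python 3 alike. A minimal sketch of the same check using the built-in `compile()`; `py3_syntax_errors` and the two sample snippets are hypothetical. The Python 2-only sample is a `print` statement rather than the `except` comma form because Python 3.14 (PEP 758) allows unparenthesized `except A, B:`, so the comma form may compile there with a different meaning.

```python
def py3_syntax_errors(sources):
    """Compile each source with the running interpreter, roughly what the
    `python -m compileall` step in CI does, and collect syntax errors.

    `sources` maps a filename (used only in messages) to its source text.
    Returns {filename: "line N: message"} for every file that fails.
    """
    errors = {}
    for filename, text in sources.items():
        try:
            compile(text, filename, "exec")  # parse and compile, never run
        except SyntaxError as exc:
            errors[filename] = "line {}: {}".format(exc.lineno, exc.msg)
    return errors


# The form 2to3 proposes in the diff above; valid on Python 2.6+ and 3.
PORTABLE = (
    "try:\n"
    "    value = int('not a number')\n"
    "except Exception as e:\n"
    "    message = str(e)\n"
)

# A Python 2-only print statement, rejected by every Python 3 release.
PY2_ONLY = 'print "card_out: %s" % card_out\n'

if __name__ == "__main__":
    report = py3_syntax_errors({"portable.py": PORTABLE, "py2_only.py": PY2_ONLY})
    for name in sorted(report):
        print("{}: {}".format(name, report[name]))
```

Only `py2_only.py` is reported; the portable `except ... as e:` snippet compiles cleanly.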

130s added a commit to kinu-garage/linux_peripheral_interfaces that referenced this pull request Sep 19, 2021

130s commented Sep 19, 2021

Not familiar with these Py errors yet, but https://stackoverflow.com/questions/57475673 suggests the code is already Py3 and that this CI job wants to check that the code is also compatible with Py2?


130s commented Sep 19, 2021

I can reproduce the same error locally, though.
$ 2to3 cpu_monitor.py
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Can't parse cpu_monitor.py: ParseError: bad input: type=22, value='=', context=('', (820, 90))
RefactoringTool: No files need to be modified.
RefactoringTool: There was 1 error:
RefactoringTool: Can't parse cpu_monitor.py: ParseError: bad input: type=22, value='=', context=('', (820, 90))
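The 2to3 failure has a different cause: lib2to3 parses with a grammar in which `print` is a statement, so a keyword argument inside a call such as `print(msg, file=sys.stderr)` is reported as bad input at the `=`. 2to3 switches to the print-function grammar when a module contains `from __future__ import print_function` (or when run with `-p`/`--print-function`). That line 820 of cpu_monitor.py is such a call is an inference, supported by the later commit note about one `print(str, file=...)` use. A minimal illustration of the portable form; `report` is a hypothetical helper, not code from `computer_hw`.

```python
# `from __future__` imports must come before any other statement.
from __future__ import print_function

import sys


def report(message, stream=None):
    """Write one diagnostic line to `stream` (stderr when omitted).

    With the __future__ import above, this print() call means the same thing
    on Python 2.7 and Python 3, and 2to3 parses it without a ParseError.
    """
    print(message, file=stream if stream is not None else sys.stderr)


if __name__ == "__main__":
    report("cpu_monitor: sample diagnostic line", stream=sys.stdout)
```

On Python 3 the `__future__` import is a no-op, so adding it costs nothing while the package still has to pass a Py2-oriented tool like 2to3.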

130s mentioned this pull request Sep 20, 2021
130s changed the title from "WIP: Add computer_hw package (copied and improved from pr2_computer_monitor)" to "WIP: Add computer_hw package (copied, improved from pr2_computer_monitor)" Jan 21, 2022
130s referenced this pull request in PR2/pr2_robot Feb 24, 2022
fixup for 7d74c2e

there is one use of `print(str, file=...)` after the transition.

130s commented Feb 24, 2022

Closing in favor of #21

130s closed this Feb 24, 2022
130s added a commit to kinu-garage/linux_peripheral_interfaces that referenced this pull request Feb 16, 2023

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant