pmda/rds: Introduce new PMDA for RDS #2447

Open
Hannibal404 wants to merge 6 commits into performancecopilot:main from Hannibal404:rds_pmda

@Hannibal404
Contributor

This change adds a new PMDA (Performance Metrics Domain Agent) for Reliable Datagram Sockets (RDS). It exports key metrics including connection information, socket and connection statistics, and details of send, receive, and retransmit queues for performance analysis using Performance Co-Pilot (PCP).

This PMDA is intended to aid in diagnosing network-related issues on systems using RDS over InfiniBand or TCP.

Replaces #2230

@natoscott
Member

Install fails for me after building rpm packages with:

[pcpqa@fedora rds]$ sudo ./Install 
Traceback (most recent call last):
  File "/var/lib/pcp/pmdas/rds/pmdards.python", line 25, in <module>
    from modules.rds_ping import rds_ping_all_avlbl_dest
ModuleNotFoundError: No module named 'modules.rds_ping'

I expect it relates to the .python file extensions, and the more dynamic import mechanism used by pmdabcc might be more what you're after here.

Unrelated to this, the new QA test .out file contains several errors that shouldn't be there (relating to 'unknown metric name'). Install fails for me, though, so I've not been able to observe that second issue locally to advise further (it's definitely wrong, I just don't know why).

@Hannibal404
Contributor Author

Added symlinks for the module files to fix the import errors.

The QA output had 'unknown metric name' errors because it included InfiniBand-specific metrics recorded on a machine without InfiniBand. Updated the expected output.
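
One way to keep InfiniBand-only metrics out of results on hosts without InfiniBand is to check for registered devices before selecting them. The following is a minimal sketch, not the PMDA's actual logic: the function names are hypothetical, and it assumes the kernel lists InfiniBand devices under `/sys/class/infiniband`.

```python
import os

# Assumption: the Linux kernel lists registered InfiniBand devices under this
# sysfs directory. The path is a parameter so the check can be tested.
DEFAULT_IB_SYSFS = "/sys/class/infiniband"


def infiniband_devices(sysfs_dir=DEFAULT_IB_SYSFS):
    """Return the sorted names of InfiniBand devices, or [] if none exist."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        # Directory missing or unreadable: treat the host as having no IB.
        return []


def select_metrics(common, ib_only, sysfs_dir=DEFAULT_IB_SYSFS):
    """Return the metric names that make sense on this host (hypothetical helper)."""
    if infiniband_devices(sysfs_dir):
        return list(common) + list(ib_only)
    return list(common)
```

A QA test could use the same check to decide which expected output applies on the machine running it.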

@natoscott
Member

@Hannibal404 thanks for the updates, I'm still seeing issues though. The test fails because the rds Install fails in much the same way as before...

[pcpqa@fedora rds]$ sudo ./Install 
Traceback (most recent call last):
  File "/var/lib/pcp/pmdas/rds/pmdards.python", line 50, in <module>
    from modules.rds_ping import rds_ping_all_avlbl_dest
ModuleNotFoundError: No module named 'modules.rds_ping'
Arrgh! failed to create /var/lib/pcp/pmdas/rds/domain.h.python from /var/lib/pcp/pmdas/rds/pmdards.python

I think you may need something more like this code from pmdabcc:

    def init_modules(self):
        """ Initialize modules """
        self.log("Initializing modules:")

        # For packaging, allow both .python and .py suffixed files
        cwd = os.getcwd()
        pmdadir = PCP.pmGetConfig('PCP_PMDASADM_DIR') + '/' + self.read_name()
        for root, _, filenames in os.walk(pmdadir):
            os.chdir(root)
            for filename in fnmatch.filter(filenames, '*.python'):
                if filename in ('pmdabcc.python', 'domain.h.python', 'pmns.python'):
                    continue
                pyf = filename[:-4]
                if not os.path.exists(pyf):
                    os.symlink(filename, pyf)
            os.chdir(pmdadir)
        os.chdir(cwd)

        import pmdautil # pylint: disable=import-outside-toplevel
        self.proc_helper = pmdautil.ProcMon(self.log, self.err)
        for module in self.modules:
            self.log(module)
            try:
                mod = importlib.import_module('modules.%s' % self.modules[module][MODULE])
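
The symlink step in the excerpt above can be tried in isolation. This is a hedged sketch rather than pmdabcc's or the rds PMDA's code: `link_python_modules` and `import_pmda_module` are hypothetical names. It relies on the fact that Python's default import machinery only treats `.py` files as source modules, so a `modules/rds_ping.python` file is invisible to `import modules.rds_ping` until a `.py` name points at it.

```python
import importlib
import os
import sys


def link_python_modules(pmda_dir, skip=()):
    """Create NAME.py symlinks next to NAME.python files under pmda_dir."""
    created = []
    for root, _, filenames in os.walk(pmda_dir):
        for filename in filenames:
            if not filename.endswith(".python") or filename in skip:
                continue
            link = os.path.join(root, os.path.splitext(filename)[0] + ".py")
            if not os.path.lexists(link):
                # Relative target, so the link survives moving the directory.
                os.symlink(filename, link)
                created.append(link)
    return created


def import_pmda_module(pmda_dir, name):
    """Import a dotted module name with pmda_dir on the module search path."""
    if pmda_dir not in sys.path:
        sys.path.insert(0, pmda_dir)
    importlib.invalidate_caches()  # make freshly created symlinks visible
    return importlib.import_module(name)
```

Unlike the excerpt, this version never calls `os.chdir`, so it leaves the process working directory untouched.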

@Hannibal404
Contributor Author

That's strange: it was failing for me on a Fedora machine, but creating the symlinks resolved it. I'll try using importlib.

@Hannibal404
Contributor Author

Replaced the regular imports with importlib.

@natoscott
Member

Something is still wrong, this is what I see:

[pcpqa@fedora rds]$ sudo ./Install 
Traceback (most recent call last):
  File "/var/lib/pcp/pmdas/rds/pmdards.python", line 43, in <module>
    mod_rds_ping = importlib.import_module('modules.rds_ping')
  File "/usr/lib64/python3.14/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1398, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1371, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1335, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'modules.rds_ping'

I realize now there's a simpler example you can use - see the netcheck PMDA. The .py/.python aspect seems to be a red herring as it doesn't have to bother with that.

Can you do a ./Makepkgs build, install the new RPMs, and then in qa "./check -g pmda.rds" before resending - thanks!
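
A `ModuleNotFoundError` like the one in the traceback above does not say whether the file is absent or merely has a suffix the import system ignores. The sketch below is one way to tell the two apart; `explain_missing` is a hypothetical helper, not part of PCP, and the directory layout it checks is an assumption based on the paths in the traceback.

```python
import importlib
import importlib.util
import os
import sys


def explain_missing(pmda_dir, name):
    """Describe why a dotted module name can or cannot be found from pmda_dir."""
    sys.path.insert(0, pmda_dir)
    try:
        importlib.invalidate_caches()
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            spec = None  # the parent package itself could not be found
    finally:
        sys.path.remove(pmda_dir)
    if spec is not None:
        return "found: %s" % spec.origin
    path = os.path.join(pmda_dir, *name.split("."))
    if os.path.exists(path + ".python"):
        return "only %s.python exists; the import system wants a .py file" % path
    return "no source file for %s under %s" % (name, pmda_dir)
```

For example, `explain_missing('/var/lib/pcp/pmdas/rds', 'modules.rds_ping')` would report which of the three cases applies on a given install.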

Hannibal404 force-pushed the rds_pmda branch 2 times, most recently from 2a0f9be to cba6d07 on February 23, 2026 12:56
@Hannibal404
Contributor Author

Hannibal404 commented Feb 23, 2026

The installation now works for me even without the symlink creation both with and without importlib. It doesn't seem to be an issue with how modules are imported since even netcheck has similar imports:

from modules.pcpnetcheck import PCPNetcheckModuleParams, DGW, DNS, NTP

I do not see a mention of modules.pcpnetcheck anywhere in netcheck.conf either.
I also noticed that the Install file for rds did not mention the domain number, so I have updated that as well.
I ran the check script and it worked as expected.

PS: Made some updates to the test output

@natoscott
Member

@Hannibal404 looks like there's some qa/group conflicts with main branch - could you take a look & I'll take this for another spin tomorrow? Thanks!

This commit adds a new PMDA (Performance Metrics Domain Agent) for
Reliable Datagram Sockets (RDS). It exports key metrics including
connection information, socket and connection statistics, and details
of send, receive, and retransmit queues for performance analysis using
Performance Co-Pilot (PCP).

This PMDA is intended to aid in diagnosing network-related issues
on systems using RDS over Infiniband or TCP.

Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Signed-off-by: Mohith Kumar Thummaluru <mohith.k.kumar.thummaluru@oracle.com>
Add manpage for rds pmda and address some linting issues

Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>
@Hannibal404
Contributor Author

Rebased and resolved conflicts.

@natoscott
Member

@Hannibal404 running Install in /var/lib/pcp/pmdas/rds still fails ...

[pcpqa@fedora ~]$ pwd
/var/lib/pcp/testsuite
[pcpqa@fedora ~]$ cd ../pmdas/rds
[pcpqa@fedora rds]$ sudo ./Install 
Traceback (most recent call last):
  File "/var/lib/pcp/pmdas/rds/pmdards.python", line 51, in <module>
    mod_rds_ping = importlib.import_module('modules.rds_ping')
  File "/usr/lib64/python3.14/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1398, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1371, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1335, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'modules.rds_ping'
Arrgh! failed to create /var/lib/pcp/pmdas/rds/domain.h.python from /var/lib/pcp/pmdas/rds/pmdards.python
[pcpqa@fedora rds]$ cd ../netcheck/
[pcpqa@fedora netcheck]$ sudo ./Install 
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Initializing, currently in 'notready' state.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Reading configuration.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Enabled modules:
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ['ping']
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Configured hosts:
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ['DGW', 'DNS']
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Determined default gateway: ['192.168.64.1'].
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Determined nameservers: ['127.0.0.53'].
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Configured and determined hosts:
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ['192.168.64.1', '127.0.0.53']
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Configured background check: True.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Configured parallel setting: True.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Configured check interval: 60s.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Configured align interval: True.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Reading module setup configuration:
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ping
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Module setup configurations read.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Initializing modules:
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ping, cluster ID: 1804
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ping: ['192.168.64.1', '127.0.0.53']
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ping: Module parameters: command: ping, cmdargs: , count: 1, timeout: 1.0.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ping: Initialized.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Modules initialized.
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Registering metrics:
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: ping
[Tue Feb 24 18:56:47] pmdanetcheck(981066) Info: Metrics registered.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Initializing, currently in 'notready' state.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Reading configuration.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Enabled modules:
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ['ping']
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Configured hosts:
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ['DGW', 'DNS']
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Determined default gateway: ['192.168.64.1'].
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Determined nameservers: ['127.0.0.53'].
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Configured and determined hosts:
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ['192.168.64.1', '127.0.0.53']
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Configured background check: True.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Configured parallel setting: True.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Configured check interval: 60s.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Configured align interval: True.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Reading module setup configuration:
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ping
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Module setup configurations read.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Initializing modules:
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ping, cluster ID: 1804
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ping: ['192.168.64.1', '127.0.0.53']
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ping: Module parameters: command: ping, cmdargs: , count: 1, timeout: 1.0.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ping: Initialized.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Modules initialized.
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Registering metrics:
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: ping
[Tue Feb 24 18:56:47] pmdanetcheck(981071) Info: Metrics registered.
Updating the Performance Metrics Name Space (PMNS) ...
Terminate PMDA if already installed ...
Updating the PMCD control file, and notifying PMCD ...
Check netcheck metrics have appeared ... 1 metrics and 2 values
[pcpqa@fedora netcheck]$ 

@Hannibal404
Contributor Author

The only remaining difference between netcheck and rds that I can see is the pyprep file, which just creates the symlinks.

However, on my end it now works even without the symlinks.

@natoscott
Member

| however on my end [...]

How are you running this? (are you using Makepkgs and installing the packages?). What Linux distribution are you using there?

@Hannibal404
Contributor Author

Yes, I am installing after building with Makepkgs.

$ uname -a
Linux prahar-pcp-fedora 6.17.1-300.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Oct  6 15:37:21 UTC 2025 x86_64 GNU/Linux

@natoscott
Member

OK, excellent. Can you paste the output of the Install script after a fresh RPM install so we can see where it starts to differ from what I pasted above? Thanks.

@Hannibal404
Contributor Author

After a fresh installation:

[root@prahar-pcp-fedora pcp]# cd /var/lib/pcp/pmdas/rds/
[root@prahar-pcp-fedora rds]# ./Install
[Tue Feb 24 10:20:53] pmdards(1400110) Info: Note: running as user "root"
[Tue Feb 24 10:20:53] pmdards(1400110) Info: Registered all metrics!
[Tue Feb 24 10:20:53] pmdards(1400114) Info: Note: running as user "root"
[Tue Feb 24 10:20:53] pmdards(1400114) Info: Registered all metrics!
Updating the Performance Metrics Name Space (PMNS) ...
Terminate PMDA if already installed ...
Updating the PMCD control file, and notifying PMCD ...
Check rds metrics have appeared ... 123 warnings, 123 metrics and 0 values

@natoscott
Member

If I change my setup to perform the "pyprep" (same as netcheck), I get the same result as you. I guess at some point in the past you may have run that script in the "rds" directory? Either way, looks like it is needed after all - could you add it in here? Then I think we're good to go.
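
Leftover `.py` symlinks from an earlier run would explain why the same packages behave differently on two machines. A rough way to compare the two setups is to list, for each `.python` file, whether a `.py` twin exists. This is a sketch only: `py_link_status` is a hypothetical helper, and it assumes (per the thread) that the pyprep step does nothing beyond creating those symlinks.

```python
import os


def py_link_status(pmda_dir):
    """Map each .python file under pmda_dir to the state of its .py twin."""
    status = {}
    for root, _, filenames in os.walk(pmda_dir):
        for filename in sorted(filenames):
            if not filename.endswith(".python"):
                continue
            twin = os.path.join(root, os.path.splitext(filename)[0] + ".py")
            rel = os.path.relpath(os.path.join(root, filename), pmda_dir)
            if os.path.islink(twin):
                status[rel] = "symlink"
            elif os.path.exists(twin):
                status[rel] = "regular file"
            else:
                status[rel] = "missing"
    return status
```

Running it against the rds PMDA directory on both hosts would show "missing" entries on the failing one and "symlink" entries on the one where pyprep had already been run.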

@Hannibal404
Contributor Author

Sure, I will add it.

Add pyprep file to resolve issue with imports

Signed-off-by: Pradyumn Rahar <pradyumn.rahar@oracle.com>