Skip to content

stop XStatsHelper in do_scap before the interfaces are stopped.#188

Open
ShyamB97 wants to merge 11 commits into
prep-release/fddaq-v5.6.0from
sbhuller/stop_xstats
Open

stop XStatsHelper in do_scap before the interfaces are stopped.#188
ShyamB97 wants to merge 11 commits into
prep-release/fddaq-v5.6.0from
sbhuller/stop_xstats

Conversation

@ShyamB97
Copy link
Copy Markdown
Contributor

@ShyamB97 ShyamB97 commented May 12, 2026

Description

do scrap is initiated, the inferfaces are stopped correctly in the DPDKReaderModule, but not the XstatsHelper. which is used to poll metrics from the interfaces.
generate_opmon_data is called, so the XstatsHelper will try to poll metrics from an interface which has already been stopped, so this is the cause of the log messages

2026-May-11 14:41:14,090 LOG [void dunedaq::dpdklibs::IfaceXstats::poll() at /tmp/root/spack-stage/spack-stage-dpdklibs-v2.3.6-ayiqjuvypkgnlo5466hs66v3jnqgizd6/spack-src/include/dpdklibs/XstatsHelper.hpp:77] Cannot get xstat values!
Invalid port_id=0
Invalid port_id=0
Invalid port_id=0

Fix simply sets the m_allocated flag in the XStatsHelper to false within do_scrap of the DPDKReaderModule, thus preventing metrics to be polled from the interfaces while they are being stopped.

Can best tested by taking a DAQ run and then calling scrap, and observing whether the readout application crashes. Running with detector hardware is required.

Type of change

  • Documentation (non-breaking change that adds or improves the documentation)
  • New feature or enhancement (non-breaking change which adds functionality)
  • Optimization (non-breaking change that improves code/performance)
  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (whatever its nature)

Testing checklist

  • Unit tests pass (e.g. dbt-build --unittest)
  • Minimal system quicktest passes (pytest -s minimal_system_quick_test.py)
  • Full set of integration tests pass (dunedaq_integtest_bundle.sh)
  • Python tests pass if applicable (e.g. python -m pytest)
  • Pre-commit hooks run successfully if applicable (e.g. pre-commit run --all-files)

Testing requires detector to check this works.
several runs taken with BDE showed the appropriate message when scrap is initiated, indicating the XStatsHelper has m_allocated set to false. None of the logs mentioned in the description occur and no segmentation faults occured over the various runs.

Further checks

  • Code is commented where needed, particularly in hard-to-understand areas
  • Code style is correct (dbt-build --lint, and/or see https://dune-daq-sw.readthedocs.io/en/latest/packages/styleguide/)
  • If applicable, new tests have been added or an issue has been opened to tackle that in the future.
    (Indicate issue here: # (issue))

Comment thread include/dpdklibs/XstatsHelper.hpp Outdated
Copy link
Copy Markdown
Member

@roland-sipos roland-sipos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way too much mixing of internal logic with trying to satisfy a system wide FSM that our modules don't even have possibility to inspect...

We need a serious discussion first about how to implement this function and manage HW resources. Because the appfwk and the opmon api clearly does not have ability to disable monitorable objects' polling depending on system state. Maybe that's fine, but for sure we won't "patch"/"hack" it here, in dpdklibs as all user code is potentially affected.

@ShyamB97 ShyamB97 changed the base branch from develop to prep-release/fddaq-v5.6.0 May 14, 2026 16:54
@ShyamB97
Copy link
Copy Markdown
Contributor Author

I fully agree with the sentiment that this is a wider issue with managing HW resources, that there is very likely similar problems with other repos, and that there is a need for discussion with relevant experts.
Is it also fair to say the scrap procedure in dpdklibs also needs to be reviewed as well? because I feel like not having a way to reset or some cleanup of the XstatsHelper is also not ideal. In which case then I understand why this patch is not really sufficient, and a full review is needed.

@wesketchum also some feedback from you on how to proceed would be helpful for us I think. This can also be discussed at a more appropriate forum if we deem so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants