

@PrometheusPi
Member

I encountered an issue on OLCF Frontier: we ran through the memtest loop and detected a defective node/GPU, but the error file could not be written. @psychocoderHPC and I suspect that the file system was unavailable or not up to date on the defective node, which prevented any useful error log from being written.

To catch this kind of failure in the future, this PR adds a check that writes a message to stderr if the directory that should receive the error log is not available.
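The PR's actual change lives in the job-script tooling and is not reproduced here. The following is only a minimal Python sketch of the same idea, assuming a hypothetical helper `report_defective_node` and a hypothetical file name pattern `memtest_error_<hostname>.log`: check that the log directory exists and is writable before writing, and fall back to stderr otherwise so the diagnostic is never lost silently.

```python
import os
import socket
import sys


def report_defective_node(log_dir: str, message: str) -> bool:
    """Hypothetical helper: append `message` to an error file in `log_dir`.

    If the directory is missing or not writable (e.g. the file system is not
    mounted or not yet visible on this node), print the message to stderr
    instead. Returns True only if the error file was actually written.
    """
    host = socket.gethostname()
    if not os.path.isdir(log_dir) or not os.access(log_dir, os.W_OK):
        # Directory unavailable: emit the diagnostic on stderr instead.
        print(
            f"ERROR [{host}]: log directory '{log_dir}' is not available; "
            f"cannot write error file. Original message: {message}",
            file=sys.stderr,
        )
        return False
    # Hypothetical naming scheme: one error file per host.
    path = os.path.join(log_dir, f"memtest_error_{host}.log")
    try:
        with open(path, "a", encoding="utf-8") as f:
            f.write(message + "\n")
    except OSError as exc:
        # The directory check can race with the file system; still report.
        print(
            f"ERROR [{host}]: failed to write '{path}': {exc}; "
            f"original message: {message}",
            file=sys.stderr,
        )
        return False
    return True
```

For example, `report_defective_node("/nonexistent/dir", "GPU 3 failed memtest")` returns `False` and prints the message to stderr rather than failing silently. The `try`/`except` is kept in addition to the up-front check because a directory that looked writable can still become unavailable by the time the file is opened.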

@PrometheusPi PrometheusPi added this to the 0.9.0 / next stable milestone Dec 9, 2025
@PrometheusPi PrometheusPi added component: tools scripts, python libs and CMake CI:no-compile CI is skipping compile/runtime tests but runs PICMI tests labels Dec 9, 2025




3 participants