Skip to content

Conversation

@dwpaley
Copy link
Contributor

@dwpaley dwpaley commented Nov 23, 2025

Calling inspect.stack() appears harmful in a medium-sized (512 ranks) MPI job because every rank hits the filesystem to resolve module absolute paths. This was causing hangs of up to 5 minutes. Here we switch to sys._getframe to work around the problem.

dwpaley and others added 2 commits November 22, 2025 18:51
@ndevenish ndevenish requested a review from dagewa December 4, 2025 16:11
@dagewa
Copy link
Member

dagewa commented Dec 5, 2025

Thanks @dwpaley. Reading https://docs.python.org/3/library/sys.html#sys._getframe I see that sys._getframe "should be used for internal and specialized purposes only" and that it is not guaranteed to exist. Well, I guess this use does qualify as specialized, but perhaps we should catch if it doesn't exist and fall back to inspect.stack() in such cases.

I want to test this in a few production environments (dev build DIALS, release build DIALS, CCP4 DIALS, conda DIALS...) just to be sure we are getting what we expect. Might take a little while to get on to it, but it's on my list.

Copy link
Member

@dagewa dagewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Success with DIALS (cmake) bootstrap build, release build, and DIALS installed from conda. Can't easily check CCP4 DIALS yet as it's on DIALS 3.21 and doesn't have the required code for the experiment history. But I think if there's any problems there we can make a special case for that later on.

So, I'm happy, but I think a fallback to inspect.stack() is a good idea if someone installs DIALS in some weird environment where sys._getframe does not exist. Plus needs a newsfragment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants