refactor(pathfinder): replace spawned child runner with subprocess entrypoint #1777
Merged
cpcloud
merged 3 commits into
NVIDIA:main
from
cpcloud:issue-1771-refactor-spawned-process-runner
Mar 17, 2026
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 |
83 changes: 83 additions & 0 deletions
cuda_pathfinder/cuda/pathfinder/_testing/load_nvidia_dynamic_lib_subprocess.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| #!/usr/bin/env python | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| from __future__ import annotations | ||
|
|
||
| import json | ||
| import os | ||
| import sys | ||
| import traceback | ||
| from collections.abc import Sequence | ||
|
|
||
| DYNAMIC_LIB_NOT_FOUND_MARKER = "CHILD_LOAD_NVIDIA_DYNAMIC_LIB_HELPER_DYNAMIC_LIB_NOT_FOUND_ERROR:" | ||
|
|
||
|
|
||
| def _validate_abs_path(abs_path: str) -> None: | ||
| assert abs_path, f"empty path: {abs_path=!r}" | ||
| assert os.path.isabs(abs_path), f"not absolute: {abs_path=!r}" | ||
| assert os.path.isfile(abs_path), f"not a file: {abs_path=!r}" | ||
|
|
||
|
|
||
| def _load_nvidia_dynamic_lib_for_test(libname: str) -> str: | ||
| # Keep imports inside the subprocess body so startup stays focused on the | ||
| # code under test rather than the parent test module. | ||
| from cuda.pathfinder import load_nvidia_dynamic_lib | ||
| from cuda.pathfinder._dynamic_libs.load_dl_common import LoadedDL | ||
| from cuda.pathfinder._dynamic_libs.load_nvidia_dynamic_lib import _load_lib_no_cache | ||
| from cuda.pathfinder._dynamic_libs.supported_nvidia_libs import ( | ||
| SUPPORTED_LINUX_SONAMES, | ||
| SUPPORTED_WINDOWS_DLLS, | ||
| ) | ||
| from cuda.pathfinder._utils.platform_aware import IS_WINDOWS | ||
|
|
||
| def require_abs_path(loaded_dl: LoadedDL) -> str: | ||
| abs_path = loaded_dl.abs_path | ||
| if not isinstance(abs_path, str): | ||
| raise RuntimeError(f"loaded dynamic library is missing abs_path: {loaded_dl!r}") | ||
| _validate_abs_path(abs_path) | ||
| return abs_path | ||
|
|
||
| loaded_dl_fresh: LoadedDL = load_nvidia_dynamic_lib(libname) | ||
| if loaded_dl_fresh.was_already_loaded_from_elsewhere: | ||
| raise RuntimeError("loaded_dl_fresh.was_already_loaded_from_elsewhere") | ||
|
|
||
| fresh_abs_path = require_abs_path(loaded_dl_fresh) | ||
| assert loaded_dl_fresh.found_via is not None | ||
|
|
||
| loaded_dl_from_cache: LoadedDL = load_nvidia_dynamic_lib(libname) | ||
| if loaded_dl_from_cache is not loaded_dl_fresh: | ||
| raise RuntimeError("loaded_dl_from_cache is not loaded_dl_fresh") | ||
|
|
||
| loaded_dl_no_cache = _load_lib_no_cache(libname) | ||
| no_cache_abs_path = require_abs_path(loaded_dl_no_cache) | ||
| supported_libs = SUPPORTED_WINDOWS_DLLS if IS_WINDOWS else SUPPORTED_LINUX_SONAMES | ||
| if not loaded_dl_no_cache.was_already_loaded_from_elsewhere and libname in supported_libs: | ||
| raise RuntimeError("not loaded_dl_no_cache.was_already_loaded_from_elsewhere") | ||
| if not os.path.samefile(no_cache_abs_path, fresh_abs_path): | ||
| raise RuntimeError(f"not os.path.samefile({no_cache_abs_path=!r}, {fresh_abs_path=!r})") | ||
| return fresh_abs_path | ||
|
|
||
|
|
||
| def probe_load_nvidia_dynamic_lib_and_print_json(libname: str) -> None: | ||
| from cuda.pathfinder import DynamicLibNotFoundError | ||
|
|
||
| try: | ||
| abs_path = _load_nvidia_dynamic_lib_for_test(libname) | ||
| except DynamicLibNotFoundError: | ||
| sys.stdout.write(f"{DYNAMIC_LIB_NOT_FOUND_MARKER}\n") | ||
| traceback.print_exc(file=sys.stdout) | ||
| return | ||
| sys.stdout.write(f"{json.dumps(abs_path)}\n") | ||
|
|
||
|
|
||
| def main(argv: Sequence[str] | None = None) -> int: | ||
| args = list(sys.argv[1:] if argv is None else argv) | ||
| if len(args) != 1: | ||
| raise SystemExit("Usage: python -m cuda.pathfinder._testing.load_nvidia_dynamic_lib_subprocess <libname>") | ||
| probe_load_nvidia_dynamic_lib_and_print_json(args[0]) | ||
| return 0 | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| raise SystemExit(main()) | ||
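
The entrypoint above reports its outcome on stdout in one of two shapes: the `DYNAMIC_LIB_NOT_FOUND_MARKER` line followed by a traceback, or a single JSON-encoded absolute path. Below is a minimal sketch of how a caller might decode that output. `parse_child_stdout` is a hypothetical helper that is not part of this PR, and the marker is copied inline only to keep the snippet self-contained.

```python
from __future__ import annotations

import json

# Copied from the diff above so the sketch runs on its own; real code would
# import it from cuda.pathfinder._testing.load_nvidia_dynamic_lib_subprocess.
DYNAMIC_LIB_NOT_FOUND_MARKER = "CHILD_LOAD_NVIDIA_DYNAMIC_LIB_HELPER_DYNAMIC_LIB_NOT_FOUND_ERROR:"


def parse_child_stdout(stdout: str) -> str | None:
    """Hypothetical helper: return the reported path, or None if the library was not found."""
    if stdout.startswith(DYNAMIC_LIB_NOT_FOUND_MARKER):
        # The child printed the marker (plus a traceback) instead of a path.
        return None
    # On success the child writes exactly one JSON-encoded string.
    abs_path = json.loads(stdout.rstrip())
    if not isinstance(abs_path, str):
        raise ValueError(f"expected a JSON string, got {abs_path!r}")
    return abs_path
```

Encoding the path as JSON rather than printing it raw keeps backslashes and non-ASCII characters in Windows paths unambiguous, and the marker check mirrors `child_process_reported_dynamic_lib_not_found` from the test helper in this PR.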
131 changes: 0 additions & 131 deletions
cuda_pathfinder/cuda/pathfinder/_utils/spawned_process_runner.py
This file was deleted.
83 changes: 38 additions & 45 deletions
cuda_pathfinder/tests/child_load_nvidia_dynamic_lib_helper.py
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,61 +1,54 @@ | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| # This helper is factored out so spawned child processes only import this | ||
| # lightweight module. That avoids re-importing the test module (and | ||
| # repeating its potentially expensive setup) in every child process. | ||
| from __future__ import annotations | ||
|
|
||
| import json | ||
| import os | ||
| import subprocess | ||
| import sys | ||
| import traceback | ||
| import tempfile | ||
| from pathlib import Path | ||
|
|
||
| from cuda.pathfinder._testing.load_nvidia_dynamic_lib_subprocess import DYNAMIC_LIB_NOT_FOUND_MARKER | ||
|
|
||
| def build_child_process_failed_for_libname_message(libname, result): | ||
| LOAD_NVIDIA_DYNAMIC_LIB_SUBPROCESS_MODULE = "cuda.pathfinder._testing.load_nvidia_dynamic_lib_subprocess" | ||
| # Launch the child from a neutral directory so `python -m cuda.pathfinder...` | ||
| # resolves the installed package instead of the source checkout. In CI the | ||
| # checkout does not contain the generated `_version.py` file. | ||
| LOAD_NVIDIA_DYNAMIC_LIB_SUBPROCESS_CWD = Path(tempfile.gettempdir()) | ||
| PROCESS_TIMED_OUT = -9 | ||
|
|
||
|
|
||
| def build_child_process_failed_for_libname_message(libname: str, result: subprocess.CompletedProcess[str]) -> str: | ||
| return ( | ||
| f"Child process failed for {libname=!r} with exit code {result.returncode}\n" | ||
| f"--- stdout-from-child-process ---\n{result.stdout}<end-of-stdout-from-child-process>\n" | ||
| f"--- stderr-from-child-process ---\n{result.stderr}<end-of-stderr-from-child-process>\n" | ||
| ) | ||
|
|
||
|
|
||
| def validate_abs_path(abs_path): | ||
| assert abs_path, f"empty path: {abs_path=!r}" | ||
| assert os.path.isabs(abs_path), f"not absolute: {abs_path=!r}" | ||
| assert os.path.isfile(abs_path), f"not a file: {abs_path=!r}" | ||
| def child_process_reported_dynamic_lib_not_found(result: subprocess.CompletedProcess[str]) -> bool: | ||
| return result.stdout.startswith(DYNAMIC_LIB_NOT_FOUND_MARKER) | ||
|
|
||
|
|
||
| def child_process_func(libname): | ||
| from cuda.pathfinder import DynamicLibNotFoundError, load_nvidia_dynamic_lib | ||
| from cuda.pathfinder._dynamic_libs.load_nvidia_dynamic_lib import _load_lib_no_cache | ||
| from cuda.pathfinder._dynamic_libs.supported_nvidia_libs import ( | ||
| SUPPORTED_LINUX_SONAMES, | ||
| SUPPORTED_WINDOWS_DLLS, | ||
| ) | ||
| from cuda.pathfinder._utils.platform_aware import IS_WINDOWS | ||
|
|
||
| def run_load_nvidia_dynamic_lib_in_subprocess( | ||
| libname: str, | ||
| *, | ||
| timeout: float, | ||
| ) -> subprocess.CompletedProcess[str]: | ||
| command = [sys.executable, "-m", LOAD_NVIDIA_DYNAMIC_LIB_SUBPROCESS_MODULE, libname] | ||
| try: | ||
| loaded_dl_fresh = load_nvidia_dynamic_lib(libname) | ||
| except DynamicLibNotFoundError: | ||
| print("CHILD_LOAD_NVIDIA_DYNAMIC_LIB_HELPER_DYNAMIC_LIB_NOT_FOUND_ERROR:") | ||
| traceback.print_exc(file=sys.stdout) | ||
| return | ||
| if loaded_dl_fresh.was_already_loaded_from_elsewhere: | ||
| raise RuntimeError("loaded_dl_fresh.was_already_loaded_from_elsewhere") | ||
| validate_abs_path(loaded_dl_fresh.abs_path) | ||
| assert loaded_dl_fresh.found_via is not None | ||
|
|
||
| loaded_dl_from_cache = load_nvidia_dynamic_lib(libname) | ||
| if loaded_dl_from_cache is not loaded_dl_fresh: | ||
| raise RuntimeError("loaded_dl_from_cache is not loaded_dl_fresh") | ||
|
|
||
| loaded_dl_no_cache = _load_lib_no_cache(libname) | ||
| # check_if_already_loaded_from_elsewhere relies on these: | ||
| supported_libs = SUPPORTED_WINDOWS_DLLS if IS_WINDOWS else SUPPORTED_LINUX_SONAMES | ||
| if not loaded_dl_no_cache.was_already_loaded_from_elsewhere and libname in supported_libs: | ||
| raise RuntimeError("not loaded_dl_no_cache.was_already_loaded_from_elsewhere") | ||
| if not os.path.samefile(loaded_dl_no_cache.abs_path, loaded_dl_fresh.abs_path): | ||
| raise RuntimeError(f"not os.path.samefile({loaded_dl_no_cache.abs_path=!r}, {loaded_dl_fresh.abs_path=!r})") | ||
| validate_abs_path(loaded_dl_no_cache.abs_path) | ||
|
|
||
| print(json.dumps(loaded_dl_fresh.abs_path)) | ||
| return subprocess.run( # noqa: S603 - trusted argv: current interpreter + internal test helper module | ||
| command, | ||
| capture_output=True, | ||
| text=True, | ||
| timeout=timeout, | ||
| check=False, | ||
| cwd=LOAD_NVIDIA_DYNAMIC_LIB_SUBPROCESS_CWD, | ||
| ) | ||
| except subprocess.TimeoutExpired: | ||
| return subprocess.CompletedProcess( | ||
| args=command, | ||
| returncode=PROCESS_TIMED_OUT, | ||
| stdout="", | ||
| stderr=f"Process timed out after {timeout} seconds and was terminated.", | ||
| ) |
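
The helper above combines three choices: it launches the current interpreter, it runs the child from a neutral working directory (because `python -m` puts the current directory at the front of `sys.path`, a source checkout would otherwise shadow the installed package), and it turns `subprocess.TimeoutExpired` into an ordinary `CompletedProcess` with a sentinel return code. The sketch below reproduces that pattern with a trivial `python -c` child so it can be run without `cuda.pathfinder` installed. `run_python_snippet` is a hypothetical stand-in, not a function from this PR.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile

PROCESS_TIMED_OUT = -9  # same sentinel value as in the diff above


def run_python_snippet(code: str, *, timeout: float) -> subprocess.CompletedProcess[str]:
    """Hypothetical stand-in for run_load_nvidia_dynamic_lib_in_subprocess."""
    command = [sys.executable, "-c", code]
    try:
        return subprocess.run(
            command,
            capture_output=True,  # collect stdout and stderr for later inspection
            text=True,  # decode output to str instead of bytes
            timeout=timeout,
            check=False,  # the caller inspects returncode itself
            cwd=tempfile.gettempdir(),  # neutral directory, away from any checkout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point; report the
        # timeout through the same result type the caller handles anyway.
        return subprocess.CompletedProcess(
            args=command,
            returncode=PROCESS_TIMED_OUT,
            stdout="",
            stderr=f"Process timed out after {timeout} seconds and was terminated.",
        )


ok = run_python_snippet("print('hello')", timeout=30.0)
slow = run_python_snippet("import time; time.sleep(30)", timeout=0.5)
print(ok.returncode, slow.returncode == PROCESS_TIMED_OUT)
```

Returning a `CompletedProcess` on timeout lets a test format every failure with one message builder, such as `build_child_process_failed_for_libname_message`, instead of handling a separate exception path.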
Currently wondering: do we need this new subdirectory?
I'll play with this for a few minutes, hoping that we don't have to move the test-only code here.
Gist of my exploration with Cursor:
We'll merge this PR now and plan a follow-on PR to unify the production and testing subprocess code paths.
Why unify later
load subprocess entrypoint.
Here is the result of continuing the exploration with Cursor: #1779