CI deps group #1634
Address the failures observed after enabling the full extras surface in CI:

* Drop sparse_dot_mkl entirely. bsms.py now uses scipy CSR `@` for the adjacency-squaring step, which is equivalent to the previous dot_product_mkl call without the libmkl_rt runtime dependency. Removed from the CI install step, Dockerfile, install-hint registry, the four bsms test decorators, and the aero_graph_net README.
* Bump container /dev/shm to 2 GiB. DALI's multiprocess worker pool exhausts the Docker default (64 MiB) via SemLock allocations, which surfaces as `OSError: No space left on device` in datapipes tests. Applied to all 5 container blocks across github-pr.yml and github-nightly-uv.yml.
* Force-reinstall the PyG-ecosystem wheels (torch_scatter, torch_sparse, torch_cluster, pyg_lib) from the PyG wheel index matching the locked torch version. Without a CUDA toolchain visible at uv-sync time, uv lands the CPU-only source build, which makes FigConvNet (and any model calling segment_csr or similar on CUDA tensors) fail with "Not compiled with CUDA support". The URL is held in PYG_WHL_INDEX so the torch-version coupling lives in one spot.
* Narrow the natten test_backward skip. The previous device=="cpu" early-skip was too broad; now we wrap the forward call and only skip on the specific NotImplementedError raised by FlexAttention's CPU-backward guard. If natten picks a different backend (or FlexAttention ever supports CPU backward), the test will run.
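The adjacency-squaring swap described above can be sketched with plain scipy (the matrix below is an illustrative 4-node path graph, not the mesh adjacency bsms.py actually builds):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical adjacency matrix of a 4-node path graph 0-1-2-3
# (illustrative only; bsms.py derives its adjacency from mesh connectivity).
A = sp.csr_matrix(np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=np.float64))

# Squaring the adjacency with scipy's CSR `@` gives 2-hop connectivity,
# matching what dot_product_mkl(A, A) produced, without needing libmkl_rt.
A2 = A @ A

# (A @ A)[i, j] counts walks of length 2 from node i to node j.
print(A2.toarray())
```

CSR `@` CSR returns a sparse result, so memory behavior stays comparable to the MKL path for sparse graphs.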
Greptile Summary

This PR aligns GitHub CI with the internal Blossom CI environment by expanding installed extras, adding CI-only test dependency installation (PyG wheels, moto, scikit-image, etc.), increasing …
```yaml
      # wheels matching the locked torch version.
      - name: Install CI-only test dependencies
        shell: bash
        env:
          UV_LINK_MODE: copy
          PYG_WHL_INDEX: "https://data.pyg.org/whl/torch-2.11.0+cu128.html"
          # cuml-cu12 -> cudf -> numba caps numpy at <=2.2; uv.lock pins
          # numpy 2.2.6 under the cu12 extra precisely for this reason.
          # Without an explicit constraint here, uv pip install resolves
          # transitive deps (e.g. tensorstore, pyarrow) freely and bumps
          # numpy to 2.4.x, which crashes every test that touches cuml.
          # Re-applying the cap on each layered install keeps numba happy.
          NUMPY_PIN: "numpy<2.3"
        run: |
          set -euo pipefail
```
Hardcoded torch wheel index URL must track the torch pin
PYG_WHL_INDEX encodes torch-2.11.0+cu128 literally. If the torch pin in uv.lock is ever bumped, this URL silently serves incompatible wheels (or 404s for the new version) until it is also updated. The inline comment already warns about this, but there is no automated guard (e.g., a CI step that extracts the locked torch version and verifies the URL matches) to catch drift. Consider whether a brief assertion or a shared variable derivation from the lockfile would be feasible to make the coupling more explicit.
pzharrington left a comment:
Natten tweaks look fine to me
This PR brings our GitHub CI in line with Blossom CI.
I had to tweak a few things to get tests to pass, so a few folks are getting pulled in as code owners. Please review the changes.
@pzharrington I'm tagging you specifically because I had to tweak the natten tests for FlexAttention backward on CPU. I don't know why that passes on Blossom and not GitHub...
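The narrowed natten skip can be illustrated with a minimal stand-in (hedged: `flex_attention_forward` and its error message are illustrative placeholders, not natten's actual API; the real test wraps the model's forward call and matches FlexAttention's specific guard):

```python
import pytest


# Hypothetical stand-in for a forward pass that hits FlexAttention's
# CPU-backward guard; the real error comes from inside the natten model.
def flex_attention_forward(device: str):
    if device == "cpu":
        raise NotImplementedError("FlexAttention backward is not supported on CPU")
    return "output"


@pytest.mark.parametrize("device", ["cpu", "cuda"])
def test_backward(device):
    try:
        out = flex_attention_forward(device)
    except NotImplementedError as e:
        # Skip only on this specific guard, so the test still runs if natten
        # picks another backend or FlexAttention gains CPU-backward support.
        pytest.skip(f"FlexAttention CPU backward unavailable: {e}")
    assert out == "output"
```

Compared to an unconditional `device == "cpu"` early-skip, this pattern lets the test self-heal the moment the underlying limitation goes away.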
PhysicsNeMo Pull Request
Description
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness and is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.