Update CUDA/ROCm setup tests#1899
Update CUDA/ROCm setup tests#1899matthewdouglas merged 5 commits intobitsandbytes-foundation:mainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Review: Update CUDA/ROCm setup tests
(Note: this review was created with the assistance of an LLM.)
Thanks for the PR! The fail-fast behavior for wrong override variables and the HIP_ENVIRONMENT → BNB_BACKEND migration are good improvements. A couple of issues to address before merging:
1. Missing path assertion in test_get_rocm_bnb_library_path_override
The diff removes the functional assertion that validates the override actually produces the correct library path:
assert get_cuda_bnb_library_path(rocm70_spec).stem == "libbitsandbytes_rocm72"The test now only checks assert "BNB_ROCM_VERSION" in caplog.text, which verifies the warning was emitted but not that the correct library was selected. This looks like an accidental omission — please restore it.
2. test_get_cuda_bnb_library_path_override is fragile
This test sets BNB_CUDA_VERSION but doesn't clear BNB_ROCM_VERSION. With the new logic, rocm_override_value is checked first — so if BNB_ROCM_VERSION happens to be set in the environment, the test will raise RuntimeError instead of exercising the CUDA override path. Please add:
monkeypatch.delenv("BNB_ROCM_VERSION", raising=False)(The base test test_get_cuda_bnb_library_path already does this correctly.)
Minor notes
- The dead
re.sub(r"rocm\d+", ...)call in the CUDA+BNB_ROCM_VERSIONerror path is harmless (regex doesn't matchlibbitsandbytes_cuda*) but could be avoided by moving the substitution after thetorch.version.cudacheck. - Consider keeping an adapted version of
test_get_rocm_bnb_library_path_rocm_override_takes_priority— the "both overrides set" edge case is still reachable and worth covering, especially now that the branching logic has changed. Not blocking though.
- Clear stray env vars in override tests to prevent false failures - Branch on active backend first to handle both-overrides-set case - Warn (instead of silently ignoring) the wrong override when both are set - Reject overrides on unsupported backends (e.g. XPU) - Add symmetric both-overrides-set tests for CUDA and ROCm
|
@matthewdouglas everything should be good now, here is an overview of the changes: Changes addressing review feedback on
|
| Scenario | Behavior |
|---|---|
| Only wrong variable set | RuntimeError (unchanged) |
| Both set | Applies the correct one, warns about the ignored wrong one (new) |
| Only correct variable set | Applies override with existing warning (unchanged) |
| Neither set | Uses detected version (unchanged) |
| Other backends (e.g. XPU) | Rejects both overrides with RuntimeError (new) |
tests/test_cuda_setup_evaluator.py
- Fixed
test_get_cuda_bnb_library_path_override— addedmonkeypatch.delenv("BNB_ROCM_VERSION", raising=False)so a stray env var doesn't cause the test to hit the wrong code path - Mirrored in
test_get_rocm_bnb_library_path_override— addedmonkeypatch.delenv("BNB_CUDA_VERSION", raising=False)for symmetry - Added
test_get_cuda_bnb_library_path_cuda_override_takes_priority— both overrides set on CUDA: CUDA wins, ROCm override warned about - Added
test_get_rocm_bnb_library_path_rocm_override_takes_priority— both overrides set on ROCm: ROCm wins, CUDA override warned about - Updated error match strings — reflect new messages (
not a ROCm/CUDA buildinstead ofdetected for CUDA/ROCm)
Tested
ROCm tests verified passing on torch 2.9.1+rocm7.2.0 (4 passed, 4 CUDA-only skipped).
6a18715
into
bitsandbytes-foundation:main
Updated backend-specific setup handling and tests for CUDA/ROCm library selection. The change makes
get_cuda_bnb_library_path()fail fast when the wrong override variable is used (BNB_ROCM_VERSIONon CUDA orBNB_CUDA_VERSIONon ROCm), improves the warning/error guidance, and refreshes the evaluator tests to useBNB_BACKENDwith clearer CUDA/ROCm coverage.Removed the
test_get_rocm_bnb_library_path_rocm_override_takes_priorityunit-test as it doesn't provide much utility.This PR is a subset of the changes originally proposed in #1888