Skip to content

Fix bug 5811173: update cufile tests and configuration#1821

Open
rsarpangalav wants to merge 4 commits intoNVIDIA:mainfrom
rsarpangalav:bug/5811173-fix
Open

Fix bug 5811173: update cufile tests and configuration#1821
rsarpangalav wants to merge 4 commits intoNVIDIA:mainfrom
rsarpangalav:bug/5811173-fix

Conversation

@rsarpangalav
Copy link
Copy Markdown

test_cufile.py: skip compat bool params in set_get_parameter_bool

Avoid setting allow_compat_mode/force_compat_mode before driver_open; pending
values can be applied on first open and interact badly with cufile.json when
nvidia-fs is not loaded (DRIVER_NOT_INITIALIZED). Compat behavior remains
covered elsewhere.

cufile.json: Set allow_compat_mode to true

Description

closes

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rsarpangalav rsarpangalav marked this pull request as draft March 25, 2026 17:47
@rsarpangalav
Copy link
Copy Markdown
Author

/ok to test

@rwgk
Copy link
Copy Markdown
Contributor

rwgk commented Mar 25, 2026

/ok to test 1b672c9

@github-actions
Copy link
Copy Markdown

@leofang leofang added this to the cuda.bindings next milestone Apr 8, 2026
@leofang leofang added bug Something isn't working test Improvements or additions to tests cuda.bindings Everything related to the cuda.bindings module labels Apr 8, 2026
@leofang
Copy link
Copy Markdown
Member

leofang commented Apr 21, 2026

@rsarpangalav let us know if this is ready, thanks!

    test_cufile.py: skip compat bool params in set_get_parameter_bool

    Avoid setting allow_compat_mode/force_compat_mode before driver_open; pending
    values can be applied on first open and interact badly with cufile.json when
    nvidia-fs is not loaded (DRIVER_NOT_INITIALIZED). Compat behavior remains
    covered elsewhere.

    cufile.json: Set allow_compat_mode to true
Open driver once to read size_t/bool/string originals, then close before
set/get/restore round-trips so pending does not restore invalid pre-open
values (e.g. per-buffer cache 0). Aligns with review feedback.
@github-actions github-actions Bot added the Needs-Restricted-Paths-Review PR touches cuda_bindings or cuda_python; only NVIDIA employees may modify these paths; see LICENSEs label Apr 27, 2026
@rsarpangalav rsarpangalav marked this pull request as ready for review May 1, 2026 21:34
@rwgk rwgk self-requested a review May 1, 2026 22:17
Comment thread cuda_bindings/tests/test_cufile.py
@rwgk rwgk removed the Needs-Restricted-Paths-Review PR touches cuda_bindings or cuda_python; only NVIDIA employees may modify these paths; see LICENSEs label May 4, 2026
@rwgk rwgk requested a review from sourabgupta3 May 5, 2026 22:41
@leofang
Copy link
Copy Markdown
Member

leofang commented May 7, 2026

@rsarpangalav plz resolve conflicts, thx

Copy link
Copy Markdown
Contributor

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor GPT-5.4 Extra High Fast

As generated:

Findings

  1. High: the new post-driver_open() baseline snapshots in the parameter API tests do not load the compat-enabled test config.

    test_set_get_parameter_size_t(), test_set_get_parameter_bool(), and test_set_get_parameter_string() now call _cufile_driver_session() before collecting their original values. However, those tests still only opt into ctx, not cufile_env_json, so their first driver_open() does not consume cuda_bindings/tests/cufile.json.

    That means the new allow_compat_mode: true setting only affects the async tests, while the parameter tests can still hit the same DRIVER_NOT_INITIALIZED path on systems without nvidia-fs. In other words, the PR adds the config needed to avoid the failure, but does not wire that config into the newly introduced driver_open() calls that need it most.

  2. Medium: the bool round-trip test drops direct coverage for the compat parameters.

    test_set_get_parameter_bool() removes PROPERTIES_ALLOW_COMPAT_MODE and FORCE_COMPAT_MODE from param_val_pairs. I could not find another test in the repo that round-trips those two bool config APIs, so the current PR avoids the failure by skipping the affected parameters instead of exercising them in the intended compat-enabled environment.

Suggested Fix

The minimal local fix is:

  • add cufile_env_json to the three parameter snapshot tests so their first driver_open() actually sees the compat-enabled test JSON
  • restore the compat bool parameters to test_set_get_parameter_bool() now that the compat-enabled config is guaranteed to be active
diff --git a/cuda_bindings/tests/test_cufile.py b/cuda_bindings/tests/test_cufile.py
index 70a7bc80c8..e1da6a3349 100644
--- a/cuda_bindings/tests/test_cufile.py
+++ b/cuda_bindings/tests/test_cufile.py
@@ -1387,7 +1387,7 @@ def test_batch_io_large_operations():
 @pytest.mark.skipif(
     cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 1.14.0 or later"
 )
-@pytest.mark.usefixtures("ctx")
+@pytest.mark.usefixtures("ctx", "cufile_env_json")
 def test_set_get_parameter_size_t():
     """Test setting and getting size_t parameters with cuFile validation."""
     param_val_pairs = (
@@ -1424,17 +1424,11 @@ def test_set_get_parameter_size_t():
 @pytest.mark.skipif(
     cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 1.14.0 or later"
 )
-@pytest.mark.usefixtures("ctx")
+@pytest.mark.usefixtures("ctx", "cufile_env_json")
 def test_set_get_parameter_bool():
     """Test setting and getting boolean parameters with cuFile validation."""
-    # Do not exercise allow/force compat via set_parameter_bool before any driver_open:
-    # pending API values are applied after JSON load on first open and can overwrite
-    # cufile.json (e.g. allow_compat_mode: true), causing DRIVER_NOT_INITIALIZED when
-    # nvidia-fs is not loaded. Other tests cover compat behavior where appropriate.
-    _COMPAT_PARAMS = (
-        cufile.BoolConfigParameter.PROPERTIES_ALLOW_COMPAT_MODE,
-        cufile.BoolConfigParameter.FORCE_COMPAT_MODE,
-    )
+    # Load the compat-enabled test config before the first driver_open so the compat
+    # bool params can still be round-tripped on systems without nvidia-fs.
     param_val_pairs = (
         (cufile.BoolConfigParameter.PROPERTIES_USE_POLL_MODE, True),
         (cufile.BoolConfigParameter.PROPERTIES_ALLOW_COMPAT_MODE, False),
@@ -1449,12 +1443,9 @@ def test_set_get_parameter_bool():
         (cufile.BoolConfigParameter.SKIP_TOPOLOGY_DETECTION, False),
         (cufile.BoolConfigParameter.STREAM_MEMOPS_BYPASS, True),
     )
-    param_val_pairs = tuple((p, v) for p, v in param_val_pairs if p not in _COMPAT_PARAMS)
     # PROFILE_NVTX is deprecated (CTK 13.1.0+); cuFile >= 1.16 rejects bool getters for it.
     if cufile.get_version() >= 1160:
-        param_val_pairs = tuple(
-            (p, v) for p, v in param_val_pairs if p is not cufile.BoolConfigParameter.PROFILE_NVTX
-        )
+        param_val_pairs = tuple((p, v) for p, v in param_val_pairs if p is not cufile.BoolConfigParameter.PROFILE_NVTX)
 
     with _cufile_driver_session():
         originals = {param: cufile.get_parameter_bool(param) for param, _ in param_val_pairs}
@@ -1474,7 +1465,7 @@ def test_set_get_parameter_bool():
 @pytest.mark.skipif(
     cufileVersionLessThan(1140), reason="cuFile parameter APIs require cuFile library version 1.14.0 or later"
 )
-@pytest.mark.usefixtures("ctx")
+@pytest.mark.usefixtures("ctx", "cufile_env_json")
 def test_set_get_parameter_string(tmp_path):
     """Test setting and getting string parameters with cuFile validation."""
     temp_dir = tempfile.gettempdir()

Local Validation

I applied the suggested patch locally and ran the focused cuFile cases in TestVenv:

TestVenv/bin/python -m pytest -q -ra \
  cuda_bindings/tests/test_cufile.py::test_set_get_parameter_size_t \
  cuda_bindings/tests/test_cufile.py::test_set_get_parameter_bool \
  cuda_bindings/tests/test_cufile.py::test_set_get_parameter_string \
  cuda_bindings/tests/test_cufile.py::test_cufile_write_async \
  cuda_bindings/tests/test_cufile.py::test_cufile_read_async \
  cuda_bindings/tests/test_cufile.py::test_cufile_async_read_write

Result:

......                                                                   [100%]
6 passed in 7.02s
warn: error opening log file: No such file or directory, logging will be disabled

I also ran pre-commit run --all-files locally after the patch, and it passed.

@rwgk
Copy link
Copy Markdown
Contributor

rwgk commented May 7, 2026

@rsarpangalav @sourabgupta3 It'd be very helpful if this is merged by tomorrow afternoon, so I can integrate it for the next cycle of QA testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cuda.bindings Everything related to the cuda.bindings module test Improvements or additions to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants