Fix #363: pass results_dir to collect_cluster_info #366

Open
idevasena wants to merge 1 commit into mlcommons:main from idevasena:fix/issue-363-collect-cluster-info-results-dir

@idevasena
Contributor

Benchmark._collect_cluster_information called collect_cluster_info() without the required results_dir argument, causing a TypeError that was swallowed by the broad exception handler. The resulting None ClusterInformation cascaded into a downstream 'NoneType has no attribute total_memory_bytes' failure during report generation.

  • Pass results_dir=self.run_result_output at the call site
  • Thread ssh_username and shared_staging_dir through from self.args for parity with the SSH collection path
  • Update test_calls_collect_cluster_info_with_correct_params to match the corrected signature (the old assertion ratified the bug)
  • Add TestCollectClusterInfoSignatureBinding regression tests that bind kwargs against the real collect_cluster_info signature and assert that the issue #363 warning (collect_cluster_info() missing 1 required positional argument: 'results_dir') is no longer emitted

Fixes #363
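
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: the `collect_cluster_info` stand-in, its optional parameters, the `warnings_seen` list, and the helper names `collect_before_fix` and `collect_after_fix` are hypothetical. Only the required `results_dir` parameter and the swallowed TypeError come from the description.

```python
warnings_seen = []  # hypothetical stand-in for logger.warning calls


def collect_cluster_info(hosts, results_dir, ssh_username=None, shared_staging_dir=None):
    """Hypothetical stand-in; the real function lives in mlpstorage_py."""
    return {"hosts": hosts, "results_dir": results_dir}


def collect_before_fix(hosts):
    try:
        # results_dir is omitted, so Python raises TypeError before the body runs.
        return collect_cluster_info(hosts=hosts)
    except Exception as exc:  # broad handler downgrades the bug to a warning
        warnings_seen.append(str(exc))
        return None


def collect_after_fix(hosts, results_dir):
    try:
        # The fix: pass the required argument at the call site.
        return collect_cluster_info(hosts=hosts, results_dir=results_dir)
    except Exception as exc:
        warnings_seen.append(str(exc))
        return None


before = collect_before_fix(["localhost"])
after = collect_after_fix(["localhost"], "/tmp/mlps-results")
```

In this sketch `before` is `None` and the only trace of the bug is one entry in `warnings_seen`, which is how a missing argument can stay hidden until later code dereferences the `None` result.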

Changes

  • mlpstorage_py/benchmarks/base.py: pass results_dir=self.run_result_output. Also thread ssh_username and shared_staging_dir through from self.args for parity with the SSH path.
  • mlpstorage_py/tests/test_benchmarks.py: update test_calls_collect_cluster_info_with_correct_params to match the corrected signature, and add TestCollectClusterInfoSignatureBinding with two regression tests:
    • test_call_binds_to_real_collect_cluster_info_signature — binds the kwargs against inspect.signature(collect_cluster_info) so future signature drift fails at unit-test time.
    • test_warning_message_from_issue_363_is_not_emitted — asserts the verbatim warning string from the issue is no longer produced.
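
The signature-binding idea can be sketched as follows. This is an illustration under assumptions, not the test added in this PR: the `collect_cluster_info` stand-in, its optional parameters, and the `binds` helper are hypothetical. `inspect.signature(...).bind(...)` and `MagicMock.call_args.kwargs` are standard-library APIs.

```python
import inspect
from unittest import mock


def collect_cluster_info(hosts, results_dir, ssh_username=None, shared_staging_dir=None):
    """Hypothetical stand-in for the real function's signature."""


def binds(func, kwargs):
    """Return True if kwargs would be accepted by func, without calling it."""
    try:
        inspect.signature(func).bind(**kwargs)
        return True
    except TypeError:
        return False


# A mock records whatever kwargs the call site passes, as a patched
# collect_cluster_info would inside a unit test.
recorder = mock.MagicMock()
recorder(hosts=["localhost"], results_dir="/tmp/mlps-results")
captured = dict(recorder.call_args.kwargs)

fixed_call_binds = binds(collect_cluster_info, captured)
buggy_call_binds = binds(collect_cluster_info, {"hosts": ["localhost"]})
```

Binding against the real signature catches what a plain mock cannot: a `MagicMock` accepts any arguments, so an assertion on the mock alone can pass while the real call would raise.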

Testing

smrc@dskbd029:~/Storage_Repo_Tests/storage_May8$ uv run pytest mlpstorage_py/tests/test_benchmarks.py -v
================================================================== test session starts ===================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /home/smrc/Storage_Repo_Tests/storage_May8/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/smrc/Storage_Repo_Tests/storage_May8
configfile: pyproject.toml
plugins: hydra-core-1.3.2, mock-3.15.1, cov-7.1.0
collected 17 items                                                                                                                                       

mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_true_with_hosts_and_run_command 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_true_with_hosts_and_run_command (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_without_hosts 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_without_hosts (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_for_datagen_command 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_for_datagen_command (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_for_configview_command 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_for_configview_command (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_when_skip_cluster_collection_set 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestShouldCollectClusterInfo::test_returns_false_when_skip_cluster_collection_set (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_returns_none_when_should_not_collect 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_returns_none_when_should_not_collect (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_returns_none_when_not_mpi_exec_type 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_returns_none_when_not_mpi_exec_type (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_calls_collect_cluster_info_with_correct_params 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_calls_collect_cluster_info_with_correct_params (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_returns_none_on_exception 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInformation::test_returns_none_on_exception (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_call_binds_to_real_collect_cluster_info_signature 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_call_binds_to_real_collect_cluster_info_signature (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_warning_message_from_issue_363_is_not_emitted 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_warning_message_from_issue_363_is_not_emitted (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestDLIOBenchmarkAccumulateHostInfo::test_uses_mpi_collection_when_available 
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestDLIOBenchmarkAccumulateHostInfo::test_uses_mpi_collection_when_available (fixtures used: mock_logger)PASSED
        TEARDOWN F mock_logger
mlpstorage_py/tests/test_benchmarks.py::TestDLIOBenchmarkAccumulateHostInfo::test_falls_back_to_args_when_mpi_fails 
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestDLIOBenchmarkAccumulateHostInfo::test_falls_back_to_args_when_mpi_fails (fixtures used: mock_logger)PASSED
        TEARDOWN F mock_logger
mlpstorage_py/tests/test_benchmarks.py::TestWriteClusterInfo::test_writes_cluster_info_file 
SETUP    S tmp_path_factory
        SETUP    F base_args
        SETUP    F mock_logger
        SETUP    F tmp_path (fixtures used: tmp_path_factory)
        mlpstorage_py/tests/test_benchmarks.py::TestWriteClusterInfo::test_writes_cluster_info_file (fixtures used: base_args, mock_logger, request, tmp_path, tmp_path_factory)PASSED
        TEARDOWN F tmp_path
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestWriteClusterInfo::test_does_nothing_without_cluster_info 
        SETUP    F base_args
        SETUP    F mock_logger
        SETUP    F tmp_path (fixtures used: tmp_path_factory)
        mlpstorage_py/tests/test_benchmarks.py::TestWriteClusterInfo::test_does_nothing_without_cluster_info (fixtures used: base_args, mock_logger, request, tmp_path, tmp_path_factory)PASSED
        TEARDOWN F tmp_path
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestVectorDBBenchmark::test_constructor_accepts_kwargs 
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestVectorDBBenchmark::test_constructor_accepts_kwargs (fixtures used: mock_logger)PASSED
        TEARDOWN F mock_logger
mlpstorage_py/tests/test_benchmarks.py::TestVectorDBBenchmark::test_datasize_does_not_require_pymilvus 
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestVectorDBBenchmark::test_datasize_does_not_require_pymilvus (fixtures used: mock_logger)PASSED
        TEARDOWN F mock_logger
TEARDOWN S tmp_path_factory

=================================================================== 17 passed in 0.76s ===================================================================
smrc@dskbd029:~/Storage_Repo_Tests/storage_May8$ uv run pytest mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding -v
================================================================== test session starts ===================================================================
platform linux -- Python 3.12.3, pytest-9.0.2, pluggy-1.6.0 -- /home/smrc/Storage_Repo_Tests/storage_May8/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/smrc/Storage_Repo_Tests/storage_May8
configfile: pyproject.toml
plugins: hydra-core-1.3.2, mock-3.15.1, cov-7.1.0
collected 2 items                                                                                                                                        

mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_call_binds_to_real_collect_cluster_info_signature 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_call_binds_to_real_collect_cluster_info_signature (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args
mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_warning_message_from_issue_363_is_not_emitted 
        SETUP    F base_args
        SETUP    F mock_logger
        mlpstorage_py/tests/test_benchmarks.py::TestCollectClusterInfoSignatureBinding::test_warning_message_from_issue_363_is_not_emitted (fixtures used: base_args, mock_logger)PASSED
        TEARDOWN F mock_logger
        TEARDOWN F base_args

=================================================================== 2 passed in 0.19s ====================================================================

smrc@dskbd029:~/Storage_Repo_Tests/storage_May8$ uv run mlpstorage training datagen     --hosts localhost,10.10.40.211     --exec-type mpi     --results-dir /tmp/mlps-results -m unet3d -np 1
2026-05-09 04:35:26|INFO: Environment validation passed
2026-05-09 04:35:26|STATUS: Benchmark results directory: /tmp/mlps-results/training/unet3d/datagen/20260509_043524
2026-05-09 04:35:26|STATUS: Running benchmark command:: mpirun -n 1 -host localhost:1,10.10.40.211:0 --bind-to none --map-by node /home/smrc/Storage_Repo_Tests/storage_May8/.venv/bin/dlio_benchmark workload=unet3d_datagen ++hydra.run.dir=/tmp/mlps-results/training/unet3d/datagen/20260509_043524 ++hydra.output_subdir=dlio_config --config-dir=/home/smrc/Storage_Repo_Tests/storage_May8/configs/dlio
Authorization required, but no authorization protocol specified

Authorization required, but no authorization protocol specified

[OUTPUT] [DEBUG DLIOBenchmark.__init__] After LoadConfig:
[OUTPUT]   storage_type   = <StorageType.LOCAL_FS: 'local_fs'>
[OUTPUT]   storage_root   = './'
[OUTPUT]   storage_options= None
[OUTPUT]   data_folder    = 'data/unet3d'
[OUTPUT]   framework      = <FrameworkType.PYTORCH: 'pytorch'>
[OUTPUT]   num_files_train= 168
[OUTPUT]   record_length  = 146600628
[OUTPUT]   generate_data  = True
[OUTPUT]   do_train       = False
[OUTPUT]   do_checkpoint  = False
[OUTPUT]   epochs         = 1
[OUTPUT]   batch_size     = 1
[OUTPUT] 2026-05-09T04:35:35.556995 Running DLIO [Generating data] with 1 process(es)
[OUTPUT] ================================================================================

⠙ Generating NPZ Data ━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32/168 0:00:00
⠹ Generating NPZ Data ━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 32/168 0:00:00
⠸ Generating NPZ Data ━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34/168 0:00:00
⠴ Generating NPZ Data ━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40/168 0:00:00
⠧ Generating NPZ Data ━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━ 57/168 0:00:00
⠏ Generating NPZ Data ━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━━━━ 71/168 0:00:00
⠙ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━━ 82/168 0:00:00
⠸ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━━━━━ 87/168 0:00:01
⠼ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━━━ 103/168 0:00:01
⠦ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━━━━━━━━━━━━ 112/168 0:00:01
⠋ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━ 140/168 0:00:01
⠙ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━ 152/168 0:00:01
⠸ Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 159/168 0:00:01
[OUTPUT] Data Generation Method: DGEN (default)
[OUTPUT]   dgen-py zero-copy BytesView — 155x faster than NumPy, 0 MiB overhead
[OUTPUT] ================================================================================
[OUTPUT] 2026-05-09T04:35:35.630212 Starting data generation
[OUTPUT] 2026-05-09T04:35:37.971790 Generation done
  Generating NPZ Data ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168/168 0:00:01

[OUTPUT] ================================================================================
[OUTPUT] Data Generation Method: DGEN (default)
[OUTPUT]   dgen-py zero-copy BytesView — 155x faster than NumPy, 0 MiB overhead
[OUTPUT] ================================================================================
2026-05-09 04:35:40|STATUS: Writing metadata for benchmark to: /tmp/mlps-results/training/unet3d/datagen/20260509_043524/training_20260509_043524_metadata.json

Signed-off-by: Devasena Inupakutika <devasena.i@samsung.com>
@idevasena idevasena requested a review from a team May 9, 2026 04:45
@github-actions

github-actions Bot commented May 9, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Development

Successfully merging this pull request may close these issues.

collect_cluster_info() missing 1 required positional argument: 'results_dir'
