Improve autotune batch size and CPU count scanning #179
…s at 8
- cpu_count() now parses /proc/cpuinfo for unique (physical_id, core_id) pairs so SMT siblings don't double-count, falling back to os.cpu_count().
- Default FF / embed / substruct search spaces use a categorical list of multiples of 64 for batchSize (kernels are tile-tuned for those sizes).
- batchesPerGpu / workerThreads are now capped at min(8, cpus // num_gpus); 8 is the empirical point of diminishing returns, and the floor division prevents CPU oversubscription across GPUs.
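
The cpu_count() change in the first bullet can be sketched as follows. This is a minimal, hypothetical illustration, not the code in _ff_common.py: count_physical_cores and physical_cpu_count are illustrative names, and the sketch assumes the x86-style /proc/cpuinfo layout where each logical CPU is a blank-line-separated record carrying "physical id" and "core id" fields. SMT siblings share a (physical id, core id) pair, so collecting the pairs in a set counts each physical core once.

```python
import os
from itertools import chain
from typing import Iterable, Optional


def count_physical_cores(lines: Iterable[str]) -> Optional[int]:
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style lines.

    Returns None when no record carries both fields.
    """
    cores = set()
    physical_id = core_id = None
    # The trailing "" flushes the last record, which may lack a blank line.
    for line in chain(lines, [""]):
        if not line.strip():
            # A blank line ends one logical-CPU record.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))  # SMT siblings collapse here
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores) or None


def physical_cpu_count(cpuinfo_path: str = "/proc/cpuinfo") -> int:
    """Physical core count, falling back to os.cpu_count() (logical CPUs)."""
    try:
        with open(cpuinfo_path) as f:
            count = count_physical_cores(f)
    except OSError:
        count = None  # file missing or unreadable
    # Assumption: default to 1 if os.cpu_count() cannot tell either.
    return count if count is not None else (os.cpu_count() or 1)
```

On systems whose /proc/cpuinfo omits these fields, the sketch falls back to os.cpu_count(), which counts logical CPUs rather than physical cores.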
| Filename | Overview |
|---|---|
| nvmolkit/autotune/_core.py | Adds support for stepped integer ranges (low, high, step) to suggest_from_space (with a step <= 0 guard) and to collect_int_from_space (snapping values to the nearest multiple of step counted from low). |
| nvmolkit/autotune/_ff_common.py | Introduces _physical_cpu_count_from_proc to read distinct (physical id, core id) pairs from /proc/cpuinfo, falling back to os.cpu_count() if the file is missing or fields are absent; default_ff_search_space switches batchSize to stepped multiples of 64 and caps batchesPerGpu at 8. |
| nvmolkit/autotune/tune_embed_molecules.py | Mirrors the FF changes: batchSize switched to (64, 1024, 64) stepped range and batchesPerGpu capped at min(8, cpus // num_gpus). |
| nvmolkit/autotune/tune_substructure.py | Same pattern: batchSize switched to (128, 1024, 128) and workerThreads per-GPU cap now also bounded at 8. |
| nvmolkit/tests/test_autotune.py | Updates existing tests for the new per-GPU cap of 8 and adds new tests for stepped batchSize, the batchesPerGpu cap of 8, and _physical_cpu_count_from_proc SMT deduplication. |
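
The stepped (low, high, step) ranges described for _core.py reduce to simple integer arithmetic: reject step <= 0, enumerate low, low + step, ... up to high, and snap an arbitrary value to the nearest grid point counted from low. The pure-Python sketch below illustrates that arithmetic only; stepped_values and snap_to_step are hypothetical names rather than the nvmolkit API, and rounding ties upward is an assumption.

```python
from typing import List


def _check_step(step: int) -> None:
    # Guard analogous to the step <= 0 check described for suggest_from_space.
    if step <= 0:
        raise ValueError(f"step must be positive, got {step}")


def stepped_values(low: int, high: int, step: int) -> List[int]:
    """All grid points low, low + step, ... that do not exceed high."""
    _check_step(step)
    return list(range(low, high + 1, step))


def snap_to_step(value: int, low: int, high: int, step: int) -> int:
    """Snap value to the nearest low + k * step inside [low, high]."""
    _check_step(step)
    top = low + ((high - low) // step) * step  # largest grid point <= high
    # Integer round-half-up to the nearest grid point (ties go up: an assumption).
    snapped = low + ((value - low + step // 2) // step) * step
    return max(low, min(snapped, top))
```

With the FF / embed default of (64, 1024, 64), the grid has 16 candidate batch sizes; the substructure range (128, 1024, 128) has 8.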
Reviews (3): Last reviewed commit: "formatting"
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
evasnow1992 left a comment:
LGTM. Only one minor comment. Thanks!
Autotune now steps batch sizes in 64-element increments by default (128 for substructure search), cutting down the search space. The CPU search space is now bounded by the physical core count by default.
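
The per-GPU cap from the commit message, min(8, cpus // num_gpus), together with the 64-element batch-size grid, can be illustrated with a small sketch. Everything here is hypothetical: per_gpu_cap, sketch_search_space, the dictionary layout, and the lower bound of 1 are illustrative choices, and the real search spaces in _ff_common.py, tune_embed_molecules.py, and tune_substructure.py may be shaped differently.

```python
from typing import Dict, List, Tuple, Union

MAX_PER_GPU = 8  # empirical point of diminishing returns, per the PR description


def per_gpu_cap(cpus: int, num_gpus: int, max_per_gpu: int = MAX_PER_GPU) -> int:
    """Upper bound for batchesPerGpu / workerThreads: min(8, cpus // num_gpus)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    # Floor division splits cores evenly, so num_gpus * (cpus // num_gpus) <= cpus.
    # Assumption: keep at least 1 when there are fewer CPUs than GPUs, which
    # is the only case where the total can exceed cpus.
    return max(1, min(max_per_gpu, cpus // num_gpus))


def sketch_search_space(cpus: int, num_gpus: int) -> Dict[str, Union[List[int], Tuple[int, int]]]:
    """Hypothetical search-space layout; the actual nvmolkit structure may differ."""
    return {
        "batchSize": list(range(64, 1024 + 1, 64)),  # multiples of 64
        "batchesPerGpu": (1, per_gpu_cap(cpus, num_gpus)),  # (low, high)
    }
```

For example, 32 physical cores on 2 GPUs gives min(8, 16) = 8 per GPU, while 12 cores on 4 GPUs gives 3, so the 4 GPUs together never claim more than the 12 available cores.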