Merged
Commits
72 commits
0252788
Initial pod communication support (#235)
gilbertlee-amd Feb 20, 2026
9edaae8
Adjust min HIP version in Makefile for pod support
gilbertlee-amd Feb 20, 2026
2b17e62
Adding TB_DUMP_CFG_FILE and fixing a deallocation bug
gilbertlee-amd Feb 21, 2026
2b707b7
Adding gfx1250 to CMakeFiles
gilbertlee-amd Feb 24, 2026
2bb8302
Adjusting how HIP headers are included
gilbertlee-amd Feb 25, 2026
060abc2
Updating a2asweep and scaling presets
gilbertlee-amd Mar 9, 2026
6c2ecf7
Fixing logging to prevent recursive error
gilbertlee-amd Mar 9, 2026
794bcf7
Fixing fabric handle bug
gilbertlee-amd Mar 10, 2026
8de0154
Changing table formatting to make it easier to paste
gilbertlee-amd Mar 10, 2026
4a0f390
Showing num iterations when running in timed mode
gilbertlee-amd Mar 13, 2026
bf49ba4
cuda + MNNVL update & pod presets (#241)
AtlantaPepsi Mar 16, 2026
5e61666
Changing NIC_FILTER to TB_NIC_FILTER
gilbertlee-amd Mar 17, 2026
bec2c5e
prefixing remaining env vars with TB_, fixing potential filesystem ch…
gilbertlee-amd Mar 17, 2026
561e2f7
Fixing TB_PAUSE issue
gilbertlee-amd Mar 18, 2026
94cf3c9
Merge pull request #245 from ROCm/develop
AtlantaPepsi Mar 19, 2026
eb92015
Increase CQ size for high qps (#244)
pierreantoineH Mar 19, 2026
275998b
fix hang when NVML is present but fabricmanager isnt (#246)
AtlantaPepsi Mar 23, 2026
168cdc1
Adding HBM read bandwidth preset (#250)
gilbertlee-amd Mar 28, 2026
fdec7d5
Adding TB_WALLCLOCK_RATE in case wallclock rate is reported as 0
gilbertlee-amd Mar 30, 2026
bae804c
Fixing numeric limits from min to lowest for doubles
gilbertlee-amd Apr 5, 2026
a03b06e
Fixing CMakeLists missing rename of ENABLE_DMA_BUF
gilbertlee-amd Apr 5, 2026
1ef9c51
Adding XCC detection for GFX12, increasing max GFX unroll to 16
gilbertlee-amd Apr 8, 2026
2900b4e
gfxsweep preset (#254)
AtlantaPepsi Apr 10, 2026
2dba07f
Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset…
gilbertlee-amd Apr 14, 2026
2aa036c
Modifying the gfxsweep preset (#256)
gilbertlee-amd Apr 17, 2026
b57f2e2
Adding a wallclock consistency detection preset (#258)
gilbertlee-amd Apr 19, 2026
a4fc836
Adding smoketest preset for simple correctness tests (#266)
gilbertlee-amd Apr 26, 2026
3744d24
Help / envvars / presets presets (#267)
gilbertlee-amd Apr 26, 2026
59fbe56
Modernize CMake build (#268)
nileshnegi Apr 27, 2026
a46553c
Replace version-based pod/amd-smi detection with compile-time API pro…
nileshnegi Apr 27, 2026
4adc3a9
Fix collective mismatch hangs in multi-rank error paths (#270)
nileshnegi Apr 27, 2026
9c6c0e1
Fix SHOW_ITERATIONS table truncation with multiple transfers per exec…
nileshnegi Apr 27, 2026
1281d0c
Reformat a2asweep output to match gfxsweep style (#272)
nileshnegi Apr 27, 2026
5c86630
Gfx sweep update (#274)
gilbertlee-amd Apr 27, 2026
87cb8ee
Increasing flush frequency in smoketest (#275)
gilbertlee-amd Apr 28, 2026
0621e90
Pod Ring preset (#251)
AtlantaPepsi Apr 28, 2026
3733ea4
Hotfixes v1 for v1.67.0 release (#276)
nileshnegi Apr 29, 2026
c6c3636
Adding nica2a preset (#248)
pierreantoineH Apr 29, 2026
350e4e5
Fixes for cuMem compilation and invalid device ordinal (#278)
AtlantaPepsi Apr 29, 2026
005d26c
Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
gilbertlee-amd Apr 30, 2026
15b7605
Simplifying socket connect, allow for using host address (#279)
gilbertlee-amd May 1, 2026
d36cc23
Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)
gilbertlee-amd May 2, 2026
fd7257c
Updating podring to run on single node without need to force single p…
gilbertlee-amd May 2, 2026
90ae370
Fixing missing std::max, updating client description to include POD s…
gilbertlee-amd May 3, 2026
5a7bec6
Parallel rings update (#283)
gilbertlee-amd May 3, 2026
8713532
Adding RUN_PARALLEL to smoke test preset (#289)
gilbertlee-amd May 5, 2026
5ae6024
Adding missing flush to gfxsweep preset (#290)
gilbertlee-amd May 5, 2026
f903e03
Improve verbose/debug instrumentation (#288)
nileshnegi May 6, 2026
6737fb2
Tighten preset validation and fix correctness bugs (#286)
nileshnegi May 6, 2026
60c5faf
Modification to A2A presets (#259)
AtlantaPepsi May 7, 2026
9d29984
revert clock check (#291)
AtlantaPepsi May 8, 2026
aeb3647
Adding TB_WALLCLOCK_RATE into hbmtest (#293)
gilbertlee-amd May 10, 2026
9896ef9
Adding LaunchTransferBench helper script (#294)
gilbertlee-amd May 10, 2026
fa10775
Adding 'empty' kernel launch preset (#297)
gilbertlee-amd May 12, 2026
297e00c
Adding ability to remove barrier, mask off XCCs (#298)
gilbertlee-amd May 14, 2026
e8edacf
improve limit reached message (#302)
AtlantaPepsi May 15, 2026
24cfdc7
Fixing NUMA checks (set_mempolicy) (#303)
gilbertlee-amd May 15, 2026
fcb2a3c
Fixing nearest GPU numa detection (#305)
gilbertlee-amd May 19, 2026
c1c3561
[smoketest] Adding BDMA, A2A-remoteread, dma,gfx,fast testlists (#306)
gilbertlee-amd May 20, 2026
48d4fb5
[empty] Adding ability to switch between hipExtLaunch and default mod…
gilbertlee-amd May 20, 2026
6f5ea52
[wallclock] Adding average usec cost for timestamp collection on GPU …
gilbertlee-amd May 20, 2026
d777426
[empty] Adding SHOW_PERCENTILES support (#310)
gilbertlee-amd May 23, 2026
0c9b70f
Merge branch 'develop' into candidate
nileshnegi May 23, 2026
e21806e
[ci] Run build pipeline on candidate without publishing artifacts/pac…
nileshnegi May 26, 2026
e923e24
removing secondary reordering (#314)
AtlantaPepsi May 26, 2026
7e8b1bb
add NIC_TRAFFIC_CLASS and NIC_SERVICE_LEVEL env vars for DSCP marking…
paklui May 26, 2026
479408d
Minor change to output format (#317)
gilbertlee-amd Jun 1, 2026
9b22b00
Embed git branch and commit hash in version string (#312)
nileshnegi Jun 1, 2026
b729d1b
Fix memory allocation bug to use the correct hipDevice (#308)
nileshnegi Jun 1, 2026
26c9cf8
disable pinned host memory for pod (#318)
AtlantaPepsi Jun 1, 2026
6f73586
Potential fix for pull request finding
gilbertlee-amd Jun 1, 2026
6943b68
Potential fix for pull request finding
gilbertlee-amd Jun 1, 2026
21 changes: 12 additions & 9 deletions .github/workflows/build-relocatable-packages.yml
@@ -2,9 +2,9 @@ name: Build Relocatable Packages

on:
push:
- branches: [develop, mainline, 'release/**', candidate]
+ branches: [candidate, develop, mainline, 'release/**']
pull_request:
- branches: [develop, mainline]
+ branches: [candidate, develop, mainline]
schedule:
# Daily at 13:00 UTC (5:00 AM PST)
- cron: '0 13 * * *'
@@ -74,7 +74,8 @@ jobs:
dpkg-deb -c "${deb}" | head -50
done

- - name: Upload artifacts (always, for inspection)
+ - name: Upload artifacts
+ if: github.ref_name != 'candidate' && github.base_ref != 'candidate'
uses: actions/upload-artifact@v4
with:
name: ubuntu-22.04-packages
@@ -84,14 +85,14 @@ jobs:
if-no-files-found: error

- name: Configure AWS credentials (OIDC)
- if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != ''
+ if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != '' && github.ref_name != 'candidate' && github.base_ref != 'candidate'
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: us-east-1

- name: Upload to S3
- if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != ''
+ if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != '' && github.ref_name != 'candidate' && github.base_ref != 'candidate'
env:
AWS_S3_BUCKET: ${{ vars.AWS_S3_BUCKET }}
run: |
@@ -183,7 +184,8 @@ jobs:
rpm -qlp "${rpm}" | head -50
done

- - name: Upload artifacts (always, for inspection)
+ - name: Upload artifacts
+ if: github.ref_name != 'candidate' && github.base_ref != 'candidate'
uses: actions/upload-artifact@v4
with:
name: manylinux_2_28-packages
@@ -193,20 +195,20 @@ jobs:
if-no-files-found: error

- name: Install AWS CLI
- if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != ''
+ if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != '' && github.ref_name != 'candidate' && github.base_ref != 'candidate'
run: |
curl -fsSL "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o /tmp/awscli.zip
(cd /tmp && unzip -q awscli.zip && ./aws/install)

- name: Configure AWS credentials (OIDC)
- if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != ''
+ if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != '' && github.ref_name != 'candidate' && github.base_ref != 'candidate'
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: us-east-1

- name: Upload to S3
- if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != ''
+ if: github.repository == 'ROCm/TransferBench' && vars.AWS_S3_BUCKET != '' && github.ref_name != 'candidate' && github.base_ref != 'candidate'
env:
AWS_S3_BUCKET: ${{ vars.AWS_S3_BUCKET }}
run: |
@@ -295,6 +297,7 @@ jobs:
cat report/build-report.md >> "$GITHUB_STEP_SUMMARY"

- name: Upload report
+ if: github.ref_name != 'candidate' && github.base_ref != 'candidate'
uses: actions/upload-artifact@v4
with:
name: build-report
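
The `if:` conditions added in this workflow skip artifact uploads and S3 publishing when a run is either a push to `candidate` or a pull request targeting `candidate`. The following is a minimal Python sketch of that predicate, for illustration only. The function names are hypothetical; the arguments stand in for the `github.ref_name`, `github.base_ref`, `github.repository`, and `vars.AWS_S3_BUCKET` values used in the expressions above.

```python
def touches_candidate(ref_name: str, base_ref: str = "") -> bool:
    """True for a push to `candidate` or a pull request targeting it.

    On push events `base_ref` is empty and `ref_name` is the branch name.
    On pull_request events `base_ref` is the target branch and `ref_name`
    looks like "123/merge".
    """
    return ref_name == "candidate" or base_ref == "candidate"


def should_upload_artifacts(ref_name: str, base_ref: str = "") -> bool:
    # Mirrors: github.ref_name != 'candidate' && github.base_ref != 'candidate'
    return not touches_candidate(ref_name, base_ref)


def should_publish_to_s3(repository: str, s3_bucket: str,
                         ref_name: str, base_ref: str = "") -> bool:
    # Mirrors the combined condition on the AWS credential and S3 upload steps.
    return (
        repository == "ROCm/TransferBench"
        and s3_bucket != ""
        and not touches_candidate(ref_name, base_ref)
    )
```

Both halves of the check are needed: a push to `candidate` is caught by `ref_name`, while a pull request into `candidate` is caught only by `base_ref`.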
3 changes: 2 additions & 1 deletion .github/workflows/codeql.yml
@@ -3,11 +3,12 @@ name: "CodeQL Security Scanning"
on:
push:
branches:
+ - candidate
- develop
- mainline
- - candidate
pull_request:
branches:
+ - candidate
- develop
- mainline
schedule:
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,48 @@
Documentation for TransferBench is available at
[https://rocm.docs.amd.com/projects/TransferBench](https://rocm.docs.amd.com/projects/TransferBench).

## v1.67.00
### Added
- Added NIC_TRAFFIC_CLASS to set the DSCP/traffic class byte in the RoCE GRH for QPs (RoCE only)
- Added NIC_SERVICE_LEVEL to set the IB service level (sl) for QPs (IB and RoCE)
- Initial support for pod communication. Requires compatible hardware / ROCm version and subject to further testing
- This potentially enables GFX/DMA executors to access SRC/DST memory locations on GPUs within the same pod
- Pod membership requires amd-smi, but this requirement can be skipped by setting TB_FORCE_SINGLE_POD=1
- Support for dumping executed Transfers to a config file specified by TB_DUMP_CFG_FILE
- This will write Transfers that are executed (for example via a preset) to a config file that can then be executed
- Reporting number of iterations run when running in timed mode (NUM_ITERATIONS < 0)
- Adding NIC_CQ_POLL_BATCH to control CQ poll batch size for NIC transfers
- New "hbm" preset which sweeps and tests local HBM read performance
- Added new TB_WALLCLOCK_RATE to override the GPU GFX wallclock rate when it is reported as 0 (debug)
- Adding new batched-DMA executor "B", which utilizes the hipMemcpyBatchAsync API introduced in HIP 7.1 / CUDA 12.8
- Added new "bmasweep" preset that compares DMA to batched DMA execution for parallel transfers to other GPUs
- Added new "wallclock" preset that compares wallclock counters across XCCs within a GPU
- Added new "smoketest" preset that runs a variety of DMA/GFX tests for simple correctness tests
- Added new "help" preset to show config file examples
- Added new "presets" preset to show available presets and their descriptions
- Added new "rings" preset that runs parallel rings of transfers (pod-capable)
- Added new "envvars" preset to show environment variables that can change TransferBench behavior
- Adding information on how to run multi-rank with TransferBench, when run with no args
- Added new "nica2a" preset (NIC all-to-all over GPUs via NIC executors, multi-node)
- Added new GFX_KERNEL to allow experimenting with copy-only GFX kernel. Currently this is opt-in only
- Added `SHOW_PERCENTILES` (e.g. `50,75,90,95,99`) to show empirical percentiles of per-iteration duration
- Adding new LaunchTransferBench.sh script to simplify launching TransferBench across multiple nodes (via socket)
- New `empty` preset (EmptyKernel) to measure empty-kernel launch latency with BATCHSIZES/GRIDSIZES/BLOCKSIZES sweeps
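
To illustrate what `SHOW_PERCENTILES` reports, here is a minimal Python sketch that computes empirical percentiles of per-iteration durations from a comma-separated list such as `50,75,90,95,99`. This is not TransferBench's implementation: the nearest-rank method, the function names, and the sample durations are all assumptions made for illustration.

```python
import math


def parse_percentiles(spec: str) -> list[float]:
    """Parse a comma-separated list such as "50,75,90,95,99"."""
    values = [float(item) for item in spec.split(",") if item.strip()]
    for p in values:
        if not 0 < p <= 100:
            raise ValueError(f"percentile out of range: {p}")
    return values


def empirical_percentiles(durations: list[float], spec: str) -> dict[float, float]:
    """Nearest-rank percentiles: the smallest sample with at least p% of
    the samples at or below it (assumed method, not confirmed by the source)."""
    if not durations:
        raise ValueError("need at least one iteration duration")
    ordered = sorted(durations)
    n = len(ordered)
    result = {}
    for p in parse_percentiles(spec):
        rank = math.ceil(p * n / 100)  # 1-based rank into the sorted samples
        result[p] = ordered[rank - 1]
    return result


# Hypothetical per-iteration durations in microseconds, with one outlier.
sample_usec = [10.0, 12.0, 11.0, 50.0, 13.0, 12.5, 11.5, 10.5, 14.0, 12.2]
print(empirical_percentiles(sample_usec, "50,75,90,95,99"))
```

With a single slow iteration in the sample, the median stays near the typical duration while the upper percentiles pick up the outlier, which is the kind of spread an average alone hides.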

### Modified
- DMA-BUF support enablement in CMake changed to ENABLE_DMA_BUF to be more similar to other compile-time options
- Adding extra information to CMake and make build methods to indicate enabled / disabled features
- a2asweep preset changes from USE_FINE_GRAIN to MEM_TYPE to reflect various memory types
- a2asweep preset changes from NUM_CUS to NUM_SUB_EXECS to match with a2a preset naming convention
- scaling preset changes from using USE_FINE_GRAIN to CPU_MEM_TYPE and GPU_MEM_TYPE
- NIC_FILTER renamed to TB_NIC_FILTER for consistency
- DUMP_LINES renamed to TB_DUMP_LINES for consistency
- Dynamically size CQs for NIC transfers in high QPs case
- Switch to using hipMemcpyDeviceToDeviceNoCU instead of hipMemcpyDefault for DMA Executor if available (requires HIP >= 6.0)
- Allow for multiple destination memory locations for DMA/Batched-DMA Transfers
- Removed env vars printing and preset print when running TransferBench with no args
- Modification to simplify socket comm usage - first rank only needs to set TB_NUM_RANKS=X to see connection info
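
Scripts that still set the old variable names need updating after the renames listed above. The following is a minimal Python sketch of a migration helper, assuming only the two renames this changelog names (`NIC_FILTER` and `DUMP_LINES`); the helper name `migrate_env` is hypothetical and not part of TransferBench.

```python
# Renames taken from the changelog entries above; no other names are assumed.
RENAMED_ENV_VARS = {
    "NIC_FILTER": "TB_NIC_FILTER",
    "DUMP_LINES": "TB_DUMP_LINES",
}


def migrate_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of `env` with old variable names moved to the new ones.

    If both the old and the new name are set, the new name's value wins
    and the old name is dropped.
    """
    migrated = dict(env)
    for old_name, new_name in RENAMED_ENV_VARS.items():
        if old_name in migrated:
            value = migrated.pop(old_name)
            migrated.setdefault(new_name, value)
    return migrated
```

A launcher script could pass `migrate_env(dict(os.environ))` as the `env` argument of `subprocess.run` so that older job configurations keep working.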

## v1.66.02
### Added
- Adding DMA-BUF support