Closed
90 commits
e2f2a81
fix gemm timing logic (#92)
rahul-anand Jan 29, 2026
20286e5
Add `gcs-bucket-csv-dir` to support GCS upload
linamy85 Jan 27, 2026
8dedb4c
Add automation script and an HBM yaml example.
hylin2002 Jan 27, 2026
2bf6a94
Add aggregator yaml file.
hylin2002 Jan 27, 2026
00fa7b2
[Automation] Add readme and node-pools topology check
linamy85 Jan 27, 2026
99fa6b1
Update automation script and yaml files for different topologies.
hylin2002 Jan 27, 2026
91d4638
[Automation] Error catch and failure retry
linamy85 Jan 27, 2026
62a0461
[Automation] Add missing topology tracking in check_node_pool_setup.sh
hylin2002 Jan 28, 2026
991cb2f
[Automation] Add topology-aware node pool validation.
hylin2002 Jan 28, 2026
6b5c66b
[Automation] Update configurations for GEMM, H2D and Collectives
hylin2002 Jan 29, 2026
926a7c0
[Automation] Update `automation_launch.sh`
hylin2002 Jan 29, 2026
a86ab39
[Automation] Enable kueue to prevent deadlock from race condition
hylin2002 Jan 29, 2026
f53bf5e
[Automation] Update aggregator
hylin2002 Jan 29, 2026
a06944d
[Automation] Update aggregator and rename host to device yaml files
hylin2002 Jan 29, 2026
ddd3473
[Automation] Delete unused yaml file and update aggregator file
hylin2002 Jan 29, 2026
9ea7110
[Automation] Update aggregator
hylin2002 Jan 29, 2026
bb5fc2f
Add dtype to H2D/D2H
linamy85 Jan 30, 2026
5315132
[Automation] Automatically delete aggregator after completion
hylin2002 Jan 30, 2026
4bdf77f
Update README with kueue and reformat
linamy85 Jan 30, 2026
2de55f4
Add dtype to aggregator H2D method
linamy85 Jan 30, 2026
ed9f6ef
Remove unnecessary columns when aggregating and fix a typo of per_device
hylin2002 Jan 30, 2026
aad5d9d
Create config folder and modify kubenetes yaml for gemm test
hylin2002 Jan 30, 2026
cb79abb
Update aggregator for gemm test
hylin2002 Jan 30, 2026
b10e6bb
Add dtype string in aggregated TSV file
hylin2002 Jan 30, 2026
6f525bf
Add multiple precisions for HBM test
hylin2002 Jan 30, 2026
69661f9
Print pending process status every minute
linamy85 Jan 30, 2026
3e4b59a
Revert the changes that were made for an urgent demo (#90)
chishuen Jan 27, 2026
0b24e56
[Ironwood] Add pipelined H2D mode to H2D benchmark
leonardchan Jan 30, 2026
a70b701
add extra datatypes in configs (#94)
junjieqian Feb 1, 2026
94ddada
add GCS service account name to job yamls (#95)
junjieqian Feb 2, 2026
e7e10f9
[Automation] GCS Permission check and fix
linamy85 Feb 2, 2026
30db8d0
Inject service account spec to Aggregator
linamy85 Feb 2, 2026
5dd6f85
Add bmm microbenchmark. (#97)
hylin2002 Feb 4, 2026
ef5ad1c
Add --numactl_bind flag to H2D benchmark script
leonardchan Feb 4, 2026
be12de0
Add --numactl_bind flag to H2D benchmark script
leonardchan Feb 4, 2026
21c6940
Add --numactl_bind flag to H2D benchmark script
leonardchan Feb 4, 2026
ded42a6
Merge pull request #98 from leonardchan/tpu7x-auto
leonardchan Feb 4, 2026
7b090f1
[Automation] Add BMM into automation script
hylin2002 Feb 4, 2026
a86475d
Add baseline pipelined flow to H2D benchmark
leonardchan Feb 5, 2026
0fe95cc
Correct fp4 tensor size calculation (#99)
linamy85 Feb 5, 2026
52ff955
Merge branch 'h2dd2h' into tpu7x-auto
leonardchan Feb 5, 2026
9a7c4af
Add baseline pipelined flow to H2D benchmark
leonardchan Feb 5, 2026
90eb07a
Add --numactl_binding flag to host_device YAMLs
leonardchan Feb 5, 2026
040002f
Add h2d_type column to H2D/D2H output
leonardchan Feb 5, 2026
38e8038
Revert "Add baseline pipelined flow to H2D benchmark"
leonardchan Feb 5, 2026
09b3331
Revert "Add --numactl_binding flag to host_device YAMLs"
leonardchan Feb 5, 2026
462d771
Revert "Add h2d_type column to H2D/D2H output"
leonardchan Feb 5, 2026
75c47aa
Add upload log for aggregated results
hylin2002 Feb 5, 2026
6eb8ac6
Update num_runs for collectives and matmul configuration
hylin2002 Feb 5, 2026
b17d35b
Set batch size in bmm configuration to be 8
hylin2002 Feb 5, 2026
de75930
Implement gemm_all_reduce benchmark (Single Chip)
linamy85 Feb 6, 2026
9a0b8ae
Add multi-host BMM into automation (#105)
hylin2002 Feb 6, 2026
2d45945
Update pipelined flow with optimized approach
leonardchan Feb 6, 2026
3d82d5b
Add missing h2d_type to H2D metrics
leonardchan Feb 6, 2026
adc084d
Revert unintended commit
linamy85 Feb 6, 2026
16c6147
Remove 32768 data_size_mib from H2D YAML
leonardchan Feb 6, 2026
8c3be37
Fix inadvertent removal of target_devices (#108)
leonardchan Feb 9, 2026
b6bd6ae
Add log to show best hyperparameters after tuning
hylin2002 Feb 10, 2026
5d73eae
Add attention into aggregator
hylin2002 Feb 10, 2026
d9d2246
Gemm+All Reduce for 4x4x4 and fix minor bugs
linamy85 Feb 6, 2026
88229ec
Add 4x4 gemm_all_reduce.yaml to automation launch script
linamy85 Feb 10, 2026
a585ecb
Add step time to matmul series
linamy85 Feb 11, 2026
adb47d1
update benchmark_attention not sweep at the runtime (#111)
yuyanpeng-google Feb 12, 2026
9bf9e38
Add attention into automation
hylin2002 Feb 12, 2026
1d36fa8
Update attention aggregate logic
hylin2002 Feb 12, 2026
5d958cc
Set automation timeout to 2 hours
hylin2002 Feb 12, 2026
1ab8008
Set attention num_runs to 20
hylin2002 Feb 12, 2026
9b4e8de
Try pinned memory
linamy85 Feb 11, 2026
e0a9abc
fix numeric error cause by padding and improve default block size (#112)
yuyanpeng-google Feb 13, 2026
e7c1649
Fix retry command
linamy85 Feb 13, 2026
c2bec50
Remove BMM multi-host runs from the 2x2x1 yaml file to avoid confusion.
chishuen Feb 13, 2026
f4f89ee
Adding CCC based autoscaler files (#109)
pulasthi Feb 19, 2026
1629d32
adding all benchmarks to automation script (#114)
pulasthi Feb 20, 2026
5885a28
Add missing 8192 gemm
linamy85 Feb 25, 2026
4a28403
Remove peak flops for fp32, which is unspecified in spec (#117)
linamy85 Mar 3, 2026
a495fd6
Increase sweeping range for all reduce
linamy85 Mar 6, 2026
c378bdb
Extend configs for gemm and collectives
linamy85 Mar 6, 2026
dc795d9
Extend configs for gemm and collectives
linamy85 Mar 6, 2026
55fa0ea
Fix collectives aggregator for multi dtypes
linamy85 Mar 6, 2026
cb56a43
Address too much event issue
linamy85 Mar 6, 2026
dd61804
Use larger transfering size
linamy85 Mar 9, 2026
f924a7e
Optimize H2D/D2H transfer pipelines and add comprehensive benchmark c…
linamy85 Mar 16, 2026
7439b2a
Add benchmark guide and run script
linamy85 Mar 16, 2026
38ec530
Allow sweeping dtype in host_device benchmarks
linamy85 May 13, 2026
aa1e67c
Added sample variance as a metric for h2dd2h and increased the num_ru…
May 20, 2026
ac83fee
Triggering CLA recheck
May 20, 2026
f302e98
Triggering CLA recheck 2
May 20, 2026
9e88b7b
shorten sample_variance as variance
May 21, 2026
c8eca6f
check if the variance is nan and set the value to zero
May 22, 2026
07fc9b3
Updated comprehensive_8dev_experiments.yaml from 20 to 100 num runs
May 25, 2026
7 changes: 7 additions & 0 deletions Ironwood/configs/attention/attention.yaml
@@ -0,0 +1,7 @@
benchmarks:
- benchmark_name: "tokamax_splash_attention"
  benchmark_sweep_params:
  - {batch_size: 1, q_seq_len: 4096, kv_seq_len: 4096, q_heads: 128, kv_heads: 128, qk_head_dim: 256, v_head_dim: 256, mode: ["fwd", "bwd"], causal: [true, false], num_runs: 20}
  trace_dir: "../microbenchmarks/attention"
  csv_path: "../microbenchmarks/attention"
  xlml_metrics_dir: "../microbenchmarks/attention"
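The sweep entry in attention.yaml gives `mode` and `causal` as lists. A minimal Python sketch of how such an entry could expand into concrete runs, assuming list-valued fields are swept as a Cartesian product while scalar fields stay fixed. `expand_sweep` is a hypothetical helper for illustration, not a function from this repository, and the real benchmark runner may expand entries differently.

```python
from itertools import product


def expand_sweep(entry):
    """Expand list-valued fields of one sweep entry into concrete runs.

    Assumption: every field whose value is a list is a sweep axis, and the
    axes are combined as a Cartesian product. Scalar fields are copied as-is.
    """
    axes = {k: v for k, v in entry.items() if isinstance(v, list)}
    fixed = {k: v for k, v in entry.items() if not isinstance(v, list)}
    names = list(axes)
    runs = []
    for combo in product(*(axes[name] for name in names)):
        runs.append({**fixed, **dict(zip(names, combo))})
    return runs


# The single entry from attention.yaml, written as a Python dict.
entry = {
    "batch_size": 1, "q_seq_len": 4096, "kv_seq_len": 4096,
    "q_heads": 128, "kv_heads": 128, "qk_head_dim": 256, "v_head_dim": 256,
    "mode": ["fwd", "bwd"], "causal": [True, False], "num_runs": 20,
}

runs = expand_sweep(entry)
for run in runs:
    print(run["mode"], run["causal"], run["num_runs"])
```

Under that assumption the one-line entry describes four configurations (two modes times two causal settings), each timed for `num_runs` iterations.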
75 changes: 75 additions & 0 deletions Ironwood/configs/bmm/multi_host_bmm.yaml
@@ -0,0 +1,75 @@
benchmarks:
- benchmark_name: "multi_host_bmm"
  trace_dir: "../microbenchmarks/multi_host_bmm_bf16"
  csv_path: "../microbenchmarks/multi_host_bmm_bf16"
  xlml_metrics_dir: "../microbenchmarks/multi_host_bmm_bf16"
  xla_dump_dir: "../microbenchmarks/multi_host_bmm_bf16/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'bfloat16'}

- benchmark_name: "multi_host_bmm"
  trace_dir: "../microbenchmarks/multi_host_bmm_f32"
  csv_path: "../microbenchmarks/multi_host_bmm_f32"
  xlml_metrics_dir: "../microbenchmarks/multi_host_bmm_f32"
  xla_dump_dir: "../microbenchmarks/multi_host_bmm_f32/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float32'}

- benchmark_name: "multi_host_bmm"
  trace_dir: "../microbenchmarks/multi_host_bmm_fp16"
  csv_path: "../microbenchmarks/multi_host_bmm_fp16"
  xlml_metrics_dir: "../microbenchmarks/multi_host_bmm_fp16"
  xla_dump_dir: "../microbenchmarks/multi_host_bmm_fp16/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float16'}

- benchmark_name: "multi_host_bmm"
  trace_dir: "../microbenchmarks/multi_host_bmm_fp8"
  csv_path: "../microbenchmarks/multi_host_bmm_fp8"
  xlml_metrics_dir: "../microbenchmarks/multi_host_bmm_fp8"
  xla_dump_dir: "../microbenchmarks/multi_host_bmm_fp8/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float8'}

- benchmark_name: "multi_host_bmm"
  trace_dir: "../microbenchmarks/multi_host_bmm_fp4"
  csv_path: "../microbenchmarks/multi_host_bmm_fp4"
  xlml_metrics_dir: "../microbenchmarks/multi_host_bmm_fp4"
  xla_dump_dir: "../microbenchmarks/multi_host_bmm_fp4/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float4'}
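To get a feel for how much work and memory each `{b, m, k, n, dtype}` entry in multi_host_bmm.yaml implies, here is a rough Python sketch. It assumes the standard batched-matmul count of 2·b·m·k·n floating-point operations, assumes the output is stored in the same dtype as the inputs, and ignores any sharding across hosts. The byte widths and both helper names are illustrative assumptions, not taken from the benchmark code.

```python
# Assumed storage width per element, in bytes (float4 packs two values per byte).
BYTES_PER_ELEMENT = {
    "bfloat16": 2.0,
    "float32": 4.0,
    "float16": 2.0,
    "float8": 1.0,
    "float4": 0.5,
}


def bmm_flops(b, m, k, n):
    """FLOPs for a batched [b, m, k] x [b, k, n] matmul (one multiply + one add per term)."""
    return 2 * b * m * k * n


def bmm_operand_gib(b, m, k, n, dtype):
    """Total size in GiB of both inputs plus the output, unsharded, all in `dtype`."""
    elements = b * (m * k + k * n + m * n)
    return elements * BYTES_PER_ELEMENT[dtype] / 2**30


for size in (128, 4096, 32768):
    for dtype in ("float32", "bfloat16", "float4"):
        print(
            f"b=8 m=k=n={size:<6} {dtype:<9}"
            f" flops={bmm_flops(8, size, size, size):.3e}"
            f" operands={bmm_operand_gib(8, size, size, size, dtype):.4f} GiB"
        )
```

The estimate is only a sizing aid: whether the largest shapes fit on a device depends on how the benchmark shards the operands, which this diff does not show.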
75 changes: 75 additions & 0 deletions Ironwood/configs/bmm/single_device_bmm.yaml
@@ -0,0 +1,75 @@
benchmarks:
- benchmark_name: "single_device_bmm"
  trace_dir: "../microbenchmarks/single_device_bmm_bf16"
  csv_path: "../microbenchmarks/single_device_bmm_bf16"
  xlml_metrics_dir: "../microbenchmarks/single_device_bmm_bf16"
  xla_dump_dir: "../microbenchmarks/single_device_bmm_bf16/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'bfloat16'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'bfloat16'}

- benchmark_name: "single_device_bmm"
  trace_dir: "../microbenchmarks/single_device_bmm_f32"
  csv_path: "../microbenchmarks/single_device_bmm_f32"
  xlml_metrics_dir: "../microbenchmarks/single_device_bmm_f32"
  xla_dump_dir: "../microbenchmarks/single_device_bmm_f32/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float32'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float32'}

- benchmark_name: "single_device_bmm"
  trace_dir: "../microbenchmarks/single_device_bmm_fp16"
  csv_path: "../microbenchmarks/single_device_bmm_fp16"
  xlml_metrics_dir: "../microbenchmarks/single_device_bmm_fp16"
  xla_dump_dir: "../microbenchmarks/single_device_bmm_fp16/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float16'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float16'}

- benchmark_name: "single_device_bmm"
  trace_dir: "../microbenchmarks/single_device_bmm_fp8"
  csv_path: "../microbenchmarks/single_device_bmm_fp8"
  xlml_metrics_dir: "../microbenchmarks/single_device_bmm_fp8"
  xla_dump_dir: "../microbenchmarks/single_device_bmm_fp8/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float8'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float8'}

- benchmark_name: "single_device_bmm"
  trace_dir: "../microbenchmarks/single_device_bmm_fp4"
  csv_path: "../microbenchmarks/single_device_bmm_fp4"
  xlml_metrics_dir: "../microbenchmarks/single_device_bmm_fp4"
  xla_dump_dir: "../microbenchmarks/single_device_bmm_fp4/hlo_graphs"
  benchmark_sweep_params:
  - {b: 8, m: 128, k: 128, n: 128, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 256, k: 256, n: 256, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 512, k: 512, n: 512, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 1024, k: 1024, n: 1024, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 2048, k: 2048, n: 2048, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 4096, k: 4096, n: 4096, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 16384, k: 16384, n: 16384, num_runs: 100, dtype: 'float4'}
  - {b: 8, m: 32768, k: 32768, n: 32768, num_runs: 100, dtype: 'float4'}
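Both BMM files repeat the same eight shapes for five dtypes, differing only in the benchmark name and the output-directory suffix. One possible way to keep such files in sync is to render them from a small script. The sketch below is an illustration only: `render_bmm_config`, the `SIZES` and `DTYPES` tables, and the two-space indentation are assumptions, and nothing in this PR indicates the files were generated this way.

```python
# Square sizes used for m, k and n in the BMM configs (8192 is not in the list).
SIZES = [128, 256, 512, 1024, 2048, 4096, 16384, 32768]

# Output-directory suffix -> dtype string, as they appear in the configs.
DTYPES = {
    "bf16": "bfloat16",
    "f32": "float32",
    "fp16": "float16",
    "fp8": "float8",
    "fp4": "float4",
}


def render_bmm_config(benchmark_name, batch=8, num_runs=100):
    """Render a BMM sweep config as YAML text, one block per dtype."""
    lines = ["benchmarks:"]
    for suffix, dtype in DTYPES.items():
        out_dir = f"../microbenchmarks/{benchmark_name}_{suffix}"
        lines += [
            f'- benchmark_name: "{benchmark_name}"',
            f'  trace_dir: "{out_dir}"',
            f'  csv_path: "{out_dir}"',
            f'  xlml_metrics_dir: "{out_dir}"',
            f'  xla_dump_dir: "{out_dir}/hlo_graphs"',
            "  benchmark_sweep_params:",
        ]
        for s in SIZES:
            lines.append(
                f"  - {{b: {batch}, m: {s}, k: {s}, n: {s}, "
                f"num_runs: {num_runs}, dtype: '{dtype}'}}"
            )
        lines.append("")  # blank line between dtype blocks
    return "\n".join(lines).rstrip() + "\n"


text = render_bmm_config("single_device_bmm")
print(text)
```

Calling `render_bmm_config("multi_host_bmm")` would produce the multi-host variant with the same shapes.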
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_gather_tpu7x_2x2x1.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: all_gather
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 8, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/all_gather_tpu7x_2x2x1"
   csv_path: "../microbenchmarks/all_gather_tpu7x_2x2x1"
   xlml_metrics_dir: "../microbenchmarks/all_gather_tpu7x_2x2x1"
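The collectives hunks widen `matrix_dim_range` from `{start: 8, end: 16384}` to `{start: 64, end: 2097152}` and raise `num_runs` from 5 to 20. A small Python sketch of what that does to the sweep size, assuming the range is geometric (each value is the previous one times `multiplier`) and that `end` is inclusive. `expand_range` is a hypothetical helper, and the benchmark's own range handling may differ.

```python
def expand_range(start, end, multiplier):
    """Return start, start*multiplier, ... up to and including end (assumed inclusive)."""
    if start <= 0 or multiplier <= 1:
        raise ValueError("start must be positive and multiplier greater than 1")
    values = []
    value = start
    while value <= end:
        values.append(value)
        value *= multiplier
    return values


# Old entry: one dtype, 5 runs per size.
old_dims = expand_range(8, 16384, 2)
old_timed_runs = len(old_dims) * 5

# New entries: three dtypes, 20 runs per size.
new_dims = expand_range(64, 2097152, 2)
new_timed_runs = len(new_dims) * 20 * 3

print(len(old_dims), old_timed_runs)
print(len(new_dims), new_timed_runs)
```

Under these assumptions the sweep grows from 12 sizes to 16 sizes per dtype, and the total number of timed iterations per file grows by a factor of 16.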
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_gather_tpu7x_2x2x2.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: all_gather
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 8, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/all_gather_tpu7x_2x2x2"
   csv_path: "../microbenchmarks/all_gather_tpu7x_2x2x2"
   xlml_metrics_dir: "../microbenchmarks/all_gather_tpu7x_2x2x2"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_gather_tpu7x_2x2x4.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: all_gather
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 16, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/all_gather_tpu7x_2x2x4"
   csv_path: "../microbenchmarks/all_gather_tpu7x_2x2x4"
   xlml_metrics_dir: "../microbenchmarks/all_gather_tpu7x_2x2x4"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_gather_tpu7x_2x4x4.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: all_gather
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 32, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/all_gather_tpu7x_2x4x4"
   csv_path: "../microbenchmarks/all_gather_tpu7x_2x4x4"
   xlml_metrics_dir: "../microbenchmarks/all_gather_tpu7x_2x4x4"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_gather_tpu7x_4x4x4.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: all_gather
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 64, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 10}
+  - {matrix_dim_range: {start: 64, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 16384, multiplier: 2}, dtype: "bfloat16", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 16384, multiplier: 2}, dtype: "float8", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/all_gather_tpu7x_4x4x4"
   csv_path: "../microbenchmarks/all_gather_tpu7x_4x4x4"
   xlml_metrics_dir: "../microbenchmarks/all_gather_tpu7x_4x4x4"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_gather_tpu7x_4x4x8.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: all_gather
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 128, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "32x4x2", ici_size_range: 256, sharding_strategy: "32x4x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "32x4x2", ici_size_range: 256, sharding_strategy: "32x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "32x4x2", ici_size_range: 256, sharding_strategy: "32x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "32x4x2", ici_size_range: 256, sharding_strategy: "32x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/all_gather_tpu7x_4x4x8"
   csv_path: "../microbenchmarks/all_gather_tpu7x_4x4x8"
   xlml_metrics_dir: "../microbenchmarks/all_gather_tpu7x_4x4x8"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_reduce_tpu7x_2x2x1.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: psum
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 8, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "2x2x2", ici_size_range: 8, sharding_strategy: "2x2x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/psum_tpu7x_2x2x1"
   csv_path: "../microbenchmarks/psum_tpu7x_2x2x1"
   xlml_metrics_dir: "../microbenchmarks/psum_tpu7x_2x2x1"
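In every collectives entry shown in this diff, `ici_size_range` equals the product of the `mesh_shape` dimensions, and `sharding_strategy` has the same rank as `mesh_shape`. The Python sketch below checks those two observations as a cheap sanity test when adding a new topology. Treating them as invariants is an assumption drawn only from the values in this PR, and `check_entry` is a hypothetical helper, not part of the repository.

```python
from math import prod


def parse_shape(text):
    """Parse a shape string such as '4x2x2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("x"))


def check_entry(entry):
    """Return a list of human-readable problems found in one sweep entry."""
    mesh = parse_shape(entry["mesh_shape"])
    shard = parse_shape(entry["sharding_strategy"])
    problems = []
    # Observed pattern: ici_size_range is the total number of mesh positions.
    if prod(mesh) != entry["ici_size_range"]:
        problems.append(
            f"ici_size_range {entry['ici_size_range']} != product of mesh {mesh}"
        )
    # Observed pattern: the sharding strategy has one factor per mesh axis,
    # and each factor divides the corresponding mesh dimension.
    if len(shard) != len(mesh) or any(m % s for m, s in zip(mesh, shard)):
        problems.append(f"sharding {shard} does not fit mesh {mesh}")
    return problems


# (mesh_shape, ici_size_range, sharding_strategy) as they appear in the configs.
ENTRIES = [
    {"mesh_shape": "2x2x2", "ici_size_range": 8, "sharding_strategy": "2x2x1"},
    {"mesh_shape": "4x2x2", "ici_size_range": 16, "sharding_strategy": "4x2x1"},
    {"mesh_shape": "4x4x2", "ici_size_range": 32, "sharding_strategy": "4x4x1"},
    {"mesh_shape": "8x4x2", "ici_size_range": 64, "sharding_strategy": "8x4x1"},
    {"mesh_shape": "16x4x2", "ici_size_range": 128, "sharding_strategy": "16x4x1"},
    {"mesh_shape": "32x4x2", "ici_size_range": 256, "sharding_strategy": "32x4x1"},
]

for entry in ENTRIES:
    print(entry["mesh_shape"], check_entry(entry) or "ok")
```

A mismatched entry, for example a `mesh_shape` of "4x2x2" paired with an `ici_size_range` of 8, would be reported rather than silently benchmarked.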
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_reduce_tpu7x_2x2x2.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: psum
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 8, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "4x2x2", ici_size_range: 16, sharding_strategy: "4x2x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/psum_tpu7x_2x2x2"
   csv_path: "../microbenchmarks/psum_tpu7x_2x2x2"
   xlml_metrics_dir: "../microbenchmarks/psum_tpu7x_2x2x2"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_reduce_tpu7x_2x2x4.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: psum
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 16, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "4x4x2", ici_size_range: 32, sharding_strategy: "4x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/psum_tpu7x_2x2x4"
   csv_path: "../microbenchmarks/psum_tpu7x_2x2x4"
   xlml_metrics_dir: "../microbenchmarks/psum_tpu7x_2x2x4"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_reduce_tpu7x_2x4x4.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: psum
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 32, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "8x4x2", ici_size_range: 64, sharding_strategy: "8x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/psum_tpu7x_2x4x4"
   csv_path: "../microbenchmarks/psum_tpu7x_2x4x4"
   xlml_metrics_dir: "../microbenchmarks/psum_tpu7x_2x4x4"
4 changes: 3 additions & 1 deletion Ironwood/configs/collectives/all_reduce_tpu7x_4x4x4.yaml
@@ -1,7 +1,9 @@
 benchmarks:
 - benchmark_name: psum
   benchmark_sweep_params:
-  - {matrix_dim_range: {start: 64, end: 16384, multiplier: 2}, dtype: "float32", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 5}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float32", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "bfloat16", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 20}
+  - {matrix_dim_range: {start: 64, end: 2097152, multiplier: 2}, dtype: "float8", mesh_shape: "16x4x2", ici_size_range: 128, sharding_strategy: "16x4x1", op_dimension: 1, num_runs: 20}
   trace_dir: "../microbenchmarks/psum_tpu7x_4x4x4"
   csv_path: "../microbenchmarks/psum_tpu7x_4x4x4"
   xlml_metrics_dir: "../microbenchmarks/psum_tpu7x_4x4x4"