Describe your performance question
As shown in the figure, when running ./transfer_engine_bench between two H20 servers, we measured a throughput of 186.55 GB/s, whereas your reported result is about 65 GB/s.
The experimental details are as follows:
export MC_NUM_QP_PER_EP=1
./transfer_engine_bench --metadata_server=etcd://29.209.106.219:2379 --mode=initiator --segment_id=abcd:12345 --operation=read --device_name=mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,mlx5_bond_5,mlx5_bond_6,mlx5_bond_7,mlx5_bond_8 --threads=12 --batch_size=128 --block_size=65536
./transfer_engine_bench --mode=target --metadata_server=etcd://29.209.106.219:2379 --local_server_name=abcd:12345 --operation=read --device_name=mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,mlx5_bond_5,mlx5_bond_6,mlx5_bond_7,mlx5_bond_8 --threads=12 --block_size=65536
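To help compare the two figures, here is a small sketch that breaks each aggregate throughput down per device. It assumes that 1 GB means 10^9 bytes and that traffic is spread evenly across the eight devices passed via --device_name; neither assumption is confirmed by the benchmark output, and the helper name per_device_gbit_s is only illustrative.

```python
# Sketch: convert an aggregate throughput figure into per-device bandwidth.
# Assumptions (not taken from the benchmark itself): 1 GB = 1e9 bytes, and
# load is balanced evenly across every device listed in --device_name.

DEVICES = (
    "mlx5_bond_1,mlx5_bond_2,mlx5_bond_3,mlx5_bond_4,"
    "mlx5_bond_5,mlx5_bond_6,mlx5_bond_7,mlx5_bond_8"
).split(",")


def per_device_gbit_s(total_gb_s: float, num_devices: int) -> float:
    """Return Gbit/s per device for an aggregate throughput given in GB/s."""
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    return total_gb_s * 8 / num_devices


if __name__ == "__main__":
    # 186.55 GB/s is our measurement; 65.0 stands in for "about 65 GB/s".
    for label, total_gb_s in (("measured", 186.55), ("reference", 65.0)):
        rate = per_device_gbit_s(total_gb_s, len(DEVICES))
        print(f"{label}: {total_gb_s} GB/s total, {rate:.2f} Gbit/s per device")
```

Comparing the per-device value against the nominal link speed of each bonded interface should show whether a given figure is physically plausible for this hardware.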