@zzhfz zzhfz commented Jul 24, 2025

Description

Adds support for memory bandwidth testing on the Hygon platform using the HYQual tool.

Test evidence

[root@localhost sunjinge]# ./run_all_tests.sh
[1/4] Running CPU-to-AI-chip communication bandwidth test...
spawn /home/sunjinge/hyqual_v3.0.3/run
Please select one of the following options:

        1. Thermal Core Qualification Test
        2. Thermal Mem Qualification Test
        3. Thermal Core and Mem Qualification Test
        4. PCIe Bandwidth Test
        5. xHCL Bandwidth Test
        6. Mem  Bandwidth Test
        7. Peak Performance Test

        h. HYQual Version
        t. Generate System Topology
        q. Quit HYQual
Key-in selection followed by <enter>:4
----------------------------------------------
Theoretical raw bandwidth = 32   GB/s
Min average raw bandwidth = 22.4 GB/s
HCU0 PCIe Device to Host:  28.212 GB/s PASS
HCU0 PCIe Host to Device:  29.030 GB/s PASS
HCU1 PCIe Device to Host:  28.214 GB/s PASS
HCU1 PCIe Host to Device:  29.030 GB/s PASS
HCU2 PCIe Device to Host:  28.214 GB/s PASS
HCU2 PCIe Host to Device:  29.030 GB/s PASS
HCU3 PCIe Device to Host:  28.213 GB/s PASS
HCU3 PCIe Host to Device:  29.031 GB/s PASS
HCU4 PCIe Device to Host:  28.214 GB/s PASS
HCU4 PCIe Host to Device:  29.030 GB/s PASS
HCU5 PCIe Device to Host:  28.213 GB/s PASS
HCU5 PCIe Host to Device:  29.030 GB/s PASS
HCU6 PCIe Device to Host:  28.214 GB/s PASS
HCU6 PCIe Host to Device:  29.030 GB/s PASS
HCU7 PCIe Device to Host:  28.211 GB/s PASS
HCU7 PCIe Host to Device:  29.030 GB/s PASS
[2/4] Running AI-chip-to-AI-chip communication bandwidth test...
..............................................................
          Bidirectional copy peak bandwidth GB/s

          D/D       0           1           2           3           4           5           6           7           8           9           10          11          12          13          14          15

          0         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         37.712      37.691      37.685      37.691      37.675      37.702      37.695      37.707

          1         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         37.661      37.619      37.732      37.668      37.673      37.736      37.737      37.703

          2         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         37.629      37.617      37.626      37.627      37.627      37.624      37.617      37.617

          3         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         37.159      37.161      37.164      37.142      37.164      37.175      37.160      37.159

          4         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         30.056      30.057      30.065      30.078      30.625      30.623      30.627      30.580

          5         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         31.196      31.229      31.222      31.203      31.611      31.629      31.618      31.616

          6         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         30.771      30.739      30.749      30.747      31.069      31.076      31.079      31.076

          7         N/A         N/A         N/A         N/A         N/A         N/A         N/A         N/A         30.059      30.085      30.058      30.060      30.193      30.209      30.223      30.200

          8         37.712      37.661      37.629      37.159      30.056      31.196      30.771      30.059      N/A         55.534      55.546      55.395      55.522      55.583      55.403      55.433

          9         37.691      37.619      37.617      37.161      30.057      31.229      30.739      30.085      55.534      N/A         55.552      55.399      55.540      55.594      55.426      55.490

          10        37.685      37.732      37.626      37.164      30.065      31.222      30.749      30.058      55.546      55.552      N/A         55.398      55.528      55.580      55.457      55.497

          11        37.691      37.668      37.627      37.142      30.078      31.203      30.747      30.060      55.395      55.399      55.398      N/A         55.418      55.414      55.385      55.436

          12        37.675      37.673      37.627      37.164      30.625      31.611      31.069      30.193      55.522      55.540      55.528      55.418      N/A         55.636      55.469      55.516

          13        37.702      37.736      37.624      37.175      30.623      31.629      31.076      30.209      55.583      55.594      55.580      55.414      55.636      N/A         55.471      55.535

          14        37.695      37.737      37.617      37.160      30.627      31.618      31.079      30.223      55.403      55.426      55.457      55.385      55.469      55.471      N/A         55.471

          15        37.707      37.703      37.617      37.159      30.580      31.616      31.076      30.200      55.433      55.490      55.497      55.436      55.516      55.535      55.471      N/A

[3/4] Running CUDA code compatibility test...
Launch params (32, 32, 1) are larger than launch bounds (256) for kernel _Z15matrixMulKernelPfPKfS1_iiii please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
Running matrix multiplication (M=1024, N=1024, K=1024)...
MatrixMul time: 1.789 ms, Throughput: 1200.09 GFLOPS
Verifying matrix multiplication...
PASS: 0 errors detected

Running matrix transpose...

Running matrix reduction sum...
Matrix sum: 524285.3125
Launch params (32, 32, 1) are larger than launch bounds (256) for kernel _Z15matrixMulKernelPfPKfS1_iiii please add __launch_bounds__ to kernel define or use --gpu-max-threads-per-block recompile program !
Running matrix multiplication (M=1024, N=1024, K=1024)...
Time: 1.694 ms, Throughput: 1267.63 GFLOPS
Verifying result...
PASS: 0 errors detected
[4/4] Running memory bandwidth test...
spawn bash /home/hyqual_v2.2.7/run
Please select one of the following options:

        1. Thermal Core Qualification Test
        2. Thermal Mem Qualification Test
        3. Thermal Core and Mem Qualification Test
        4. PCIe Bandwidth Test
        5. xHMI Bandwidth Test
        6. Graphic Mem Bandwidth Test
        7. Peak Performance Test

        h. HYQual Version
        t. Generate System Topology
        q. Quit HYQual
Key-in selection followed by <enter>:6
... ... ... .... 
Function    MBytes/sec  Min (sec)   Max         Average     Rate(0.0)
Copy        728755.715  0.00687     0.00963     0.00691
Mul         728592.095  0.00687     0.00976     0.00691
Add         661181.369  0.01136     0.01425     0.01142
Triad       663834.435  0.01133     0.01359     0.01137
Dot         189182.679  0.02605     0.02955     0.02660
Write       719861.736  0.01032     0.01422     0.01049
Read        835334.538  0.00599     0.00899     0.00603
Theoretical raw bandwidth = 896  GB/s
Min average raw bandwidth = 627.2 GB/s
DCU0 graphic mem bandwidth      728.615GB/s     PASS
DCU1 graphic mem bandwidth      727.600GB/s     PASS
DCU2 graphic mem bandwidth      730.172GB/s     PASS
DCU3 graphic mem bandwidth      729.602GB/s     PASS
DCU4 graphic mem bandwidth      729.478GB/s     PASS
DCU5 graphic mem bandwidth      728.998GB/s     PASS
DCU6 graphic mem bandwidth      728.756GB/s     PASS
DCU7 graphic mem bandwidth      728.578GB/s     PASS
graphic mem bandwidth test end
All tests complete. Logs saved to: /home/sunjinge/logs
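
For anyone reviewing the numbers in the log, a minimal Python sketch for sanity-checking them follows. It is not part of this PR or of HYQual. It rests on two assumptions, both flagged in the code: that the "Min average raw bandwidth" line is 70% of the theoretical figure (inferred from 22.4 / 32 and 627.2 / 896 in the log), and that the matrix-multiplication throughput counts 2*M*N*K floating-point operations. The names `check_section`, `matmul_gflops`, and `ASSUMED_PASS_RATIO` are hypothetical helpers introduced only for this illustration.

```python
import re

# ASSUMPTION: the pass threshold is 70% of the theoretical bandwidth. This is
# inferred from the log (22.4 / 32 and 627.2 / 896), not from HYQual docs.
ASSUMED_PASS_RATIO = 0.7

THEORETICAL_RE = re.compile(r"Theoretical raw bandwidth\s*=\s*([\d.]+)\s*GB/s")
MINIMUM_RE = re.compile(r"Min average raw bandwidth\s*=\s*([\d.]+)\s*GB/s")
# Matches lines such as "HCU0 PCIe Device to Host:  28.212 GB/s PASS"
# and "DCU0 graphic mem bandwidth      728.615GB/s     PASS".
RESULT_RE = re.compile(
    r"^((?:HCU|DCU)\d+.*?):?\s+([\d.]+)\s*GB/s\s+(PASS|FAIL)\s*$"
)

# Lines copied from the PCIe section of the log in this PR.
SAMPLE = """\
Theoretical raw bandwidth = 32   GB/s
Min average raw bandwidth = 22.4 GB/s
HCU0 PCIe Device to Host:  28.212 GB/s PASS
HCU0 PCIe Host to Device:  29.030 GB/s PASS
"""


def check_section(log_text):
    """Return (theoretical, minimum, rows) for one bandwidth section.

    Each row is (label, measured_gbps, verdict, meets_minimum).
    """
    theo = THEORETICAL_RE.search(log_text)
    mini = MINIMUM_RE.search(log_text)
    if theo is None or mini is None:
        raise ValueError("bandwidth header lines not found")
    theoretical = float(theo.group(1))
    minimum = float(mini.group(1))

    rows = []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match:
            measured = float(match.group(2))
            rows.append(
                (match.group(1), measured, match.group(3), measured >= minimum)
            )
    return theoretical, minimum, rows


def matmul_gflops(m, n, k, elapsed_ms):
    """Throughput in GFLOPS, ASSUMING 2*M*N*K operations per multiplication."""
    return 2.0 * m * n * k / (elapsed_ms * 1e-3) / 1e9


if __name__ == "__main__":
    theoretical, minimum, rows = check_section(SAMPLE)
    print("threshold ratio:", minimum / theoretical)
    for label, measured, verdict, meets_minimum in rows:
        print(label, measured, verdict, meets_minimum)
    print("matmul GFLOPS from rounded time:", matmul_gflops(1024, 1024, 1024, 1.789))
```

Under those assumptions, the throughput recomputed from the rounded times (1.789 ms and 1.694 ms) agrees with the logged 1200.09 and 1267.63 GFLOPS to within 0.1%; the small gap is consistent with the log rounding the elapsed time to three decimal places.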
