uadk supports heterogeneous computing #658
Liulongfang wants to merge 201 commits into Linaro:develop from
uadk: add some bugfix
Add the algorithm hmac(sm3)-cbc(sm4) to the nosva scene. The following fields of the session setup need to be set: calg (WCRYPTO_CIPHER_SM4), cmode (WCRYPTO_CIPHER_CBC), dalg (WCRYPTO_SM3) and dmode (WCRYPTO_DIGEST_HMAC). Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
Currently, the algorithm name of the aead cbc mode is designed only for sha256, which no longer fits once other algorithms such as hmac(sm3)-cbc(aes) are added. A common name, authenc(generic,cbc(aes)), is now used; the actual algorithm and mode are still specified by dalg and dmode in the session setup. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
In stream processing encryption mode, a long file is encrypted block by block. Each time the accelerator is invoked, the encryption result of that block is appended to the output, and the assembled result is the same as the result of encrypting the entire file at once. For hisi_sec, the AAD is carried in the first message, and the plaintext is processed in the middle and end messages. In an encrypted stream, the first and end messages are unique and must be delivered to hardware. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
For the gcm stream mode, assoc bytes must not be 0; check this to avoid a hardware error. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
The hardware only uses the block mode, so set the aead message state to the block mode first. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
The hardware supports only 16-byte-aligned lengths for the aead middle messages, so a check for invalid lengths is added. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
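The length rule for middle messages can be sketched as follows. This is a minimal Python illustration, not the driver's C code: the function name `check_stream_lengths` and the constant `AEAD_MIDDLE_ALIGN` are hypothetical, and only the 16-byte alignment of middle messages is taken from the commit message.

```python
AEAD_MIDDLE_ALIGN = 16  # bytes; alignment stated in the commit message


def check_stream_lengths(lengths):
    """Return True if every middle message length is a multiple of 16.

    `lengths` lists the message sizes of one stream in order: the first
    message, zero or more middle messages, then the end message. Only
    the middle messages are checked here.
    """
    middle = lengths[1:-1]
    return all(n % AEAD_MIDDLE_ALIGN == 0 for n in middle)
```

For example, `check_stream_lengths([20, 32, 64, 7])` accepts the stream, while a middle message of 30 bytes would be rejected.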
Is there data for CE alone?
The CE-only data is as follows: SM3 1024B CE Performance (MB/s)
The hardware performance is on the low side. Have you tested with --thread 8 --ctxnum 8?
In common digest stream mode, io_bytes and iv_bytes need to be set to 0 when the final bd is calculated. In the appending tag scenario, the values of io_bytes and iv_bytes therefore need to be restored to what they were before being set to 0. The hardware can then compute the overall hash value of the appended packet and the previously calculated packet, which avoids repeated calculation. Signed-off-by: Qi Tao <taoqi10@huawei.com>
uadk: support appending tag for digest stream model
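The idea of appending data after a final digest has been produced can be illustrated in software. The sketch below uses Python's `hashlib` with SHA-256 as a stand-in (SM3 is not guaranteed to be available there); it only mirrors the behaviour described above and says nothing about how the hardware BD fields are handled. The function name `digest_then_append` is hypothetical.

```python
import hashlib


def digest_then_append(first, extra):
    """Hash `first`, read a final digest, then append `extra`.

    Returns (digest_of_first, digest_of_first_plus_extra). The running
    state is kept after the first digest is read, so the earlier data
    is not hashed a second time.
    """
    h = hashlib.sha256()
    h.update(first)
    tag_first = h.hexdigest()      # final value for the data so far
    h.update(extra)                # continue from the saved state
    tag_all = h.hexdigest()
    return tag_first, tag_all
```

The second value equals the one-shot digest of `first + extra`, which is the property the appending tag scenario relies on.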
6bb4cdb to d78e979
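The 1+1 > 2 case cannot be fully reached; the best achievable is 1+1 ≈ 2. That is, without increasing CPU utilization, mixing software and hardware computation strengthens service performance so that the combined performance uses as much of the compute capacity of all devices as possible: For the SM4 algorithm with an 8 KB packet length, hardware, software and hybrid computing were tested separately, together with the achievement ratio (hybrid throughput / (hardware throughput + software throughput)) async mode: For the SM3 algorithm with an 8 KB packet length, hardware, software and hybrid computing were tested separately, together with the achievement ratio (hybrid throughput / (hardware throughput + software throughput)) async mode: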
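The achievement ratio named in this comment is a simple quotient. A minimal sketch, with a hypothetical function name and placeholder numbers that are not measurements from this PR:

```python
def achievement_ratio(hybrid, hardware, software):
    """Hybrid throughput divided by the sum of hardware-only and
    software-only throughput (all in the same unit)."""
    total = hardware + software
    if total <= 0:
        raise ValueError("hardware + software throughput must be positive")
    return hybrid / total


# Placeholder values for illustration only: a ratio close to 1.0
# corresponds to the "1+1 ≈ 2" case.
ratio = achievement_ratio(95.0, 60.0, 40.0)
```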
uadk: support aead stream mode and sm4-sm3 alg
When a combined algorithm is used, the authsize must not be 0, so add a check for it. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
According to the HMAC rfc, the auth key could be 0 bytes, so remove the wrong judgment. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
According to the HMAC rfc, the auth key could be 0 bytes, so remove the wrong judgment. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
The auth key could be 0 bytes, remove the wrong judgment. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
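That a zero-length HMAC key is valid can be checked in software. The sketch below uses Python's standard `hmac` module with SHA-256 as a stand-in for the digest; the helper name `hmac_sha256` is hypothetical. Per the HMAC construction, a key shorter than the block size is padded with zeros, so an empty key behaves like a block of 64 zero bytes.

```python
import hashlib
import hmac


def hmac_sha256(key, msg):
    """Return the hex HMAC-SHA256 tag of `msg` under `key`."""
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


# An empty key is accepted and yields a normal 32-byte tag.
tag_empty_key = hmac_sha256(b"", b"message")

# Zero padding makes the empty key equivalent to 64 zero bytes
# (64 is the SHA-256 block size).
tag_zero_key = hmac_sha256(b"\x00" * 64, b"message")
```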
The ctx key may be null if the user uses the normal mode, so an error should be returned before copying data to the key. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
First, move the algorithm check to the right level. Then change the alignment from 16 bytes to 4 bytes according to the hardware specification. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
The alignment of authsize should be 4 bytes, not 16 bytes, according to the hardware specification. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
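Taken together with the earlier non-zero check, the authsize rules for a combined algorithm can be sketched like this. The function name `authsize_is_valid` is hypothetical, and the sketch covers only the two conditions stated in these commit messages (non-zero, 4-byte aligned); the real driver may apply further limits.

```python
AUTHSIZE_ALIGN = 4  # bytes; alignment stated in the commit message


def authsize_is_valid(authsize):
    """Return True if authsize is non-zero and a multiple of 4 bytes."""
    return authsize > 0 and authsize % AUTHSIZE_ALIGN == 0
```

With the previous 16-byte rule a 20-byte authsize would have been rejected; under the 4-byte rule it passes.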
Print the help text when dfx/benchmark/test is invoked with empty parameters. Signed-off-by: Junchong Pan <panjunchong@h-partners.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
When soft computing is required, an invalid BD is used to keep the sending and receiving process intact, which is more efficient. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
uadk: Fix static analysis warning
The original timer in uadk_tool requires re-triggering upon expiration, leading to nested timing and potential inaccuracy. This update improves the timer mechanism. Additionally, the random number generator used a fixed seed during initialization, resulting in insufficient randomness; this has been updated. Furthermore, for non-aligned random length values, the generated result could be empty—this issue has also been fixed. Signed-off-by: Longfang Liu <liulongfang@huawei.com>
uadk_tool: update component functionality in uadk_tool
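The effect of a fixed seed, and of lengths that are not a multiple of the generator's word size, can be shown with a short sketch. It uses Python's `random` and `os.urandom` purely for illustration; the function names are hypothetical and this is not the uadk_tool implementation.

```python
import os
import random


def fill_fixed_seed(length, seed=0):
    """Fill a buffer from a generator with a fixed seed.

    Every call with the same seed returns the same bytes, which is the
    weakness described in the commit message.
    """
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def fill_fresh_seed(length):
    """Fill a buffer from a generator seeded with OS entropy.

    Bytes are produced one at a time, so a length that is not a
    multiple of 4 or 8 still yields a full, non-empty buffer.
    """
    rng = random.Random(int.from_bytes(os.urandom(16), "big"))
    return bytes(rng.getrandbits(8) for _ in range(length))
```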
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
uadk: Add CI configuration
When CI fails, it is very difficult to reproduce. Add set -x to the build script to make it easier to find which cmd fails. Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
sanity_test: add set -x to print cmd
Ignore the zip test, since the zip tool is not built with OpenSSL 3.0. Refer to uadk_tool/Makefile.am:
if HAVE_CRYPTO
uadk_tool_SOURCES+=test/comp_main.c
endif
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Weili Qian <qianweili@huawei.com>
uadk: sanity_test ignore zip if OpenSSL 3.0
Release 2.10 in 2025.12 Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Longfang Liu <liulongfang@huawei.com>
uadk: release 2.10
When looking up the corresponding driver by algorithm type, since the driver does not save the algorithm type, it cannot be directly obtained. Therefore, the algorithm type should be saved during algorithm registration. Signed-off-by: Weili Qian <qianweili@huawei.com>
uadk adds an API for obtaining the current bandwidth utilization of a device. When the device driver creates the "dev_usage" file, users can obtain the current bandwidth utilization of the specified algorithm on the device by passing in the device and the algorithm name to be queried. Signed-off-by: Weili Qian <qianweili@huawei.com>
uadk supports obtaining the bandwidth utilization of specified devices and algorithms through user-space drivers. After hardware resources are initialized, the bandwidth utilization can be directly obtained through the hardware mmio space, replacing the method of reading sysfs files and reducing system calls. Signed-off-by: Weili Qian <qianweili@huawei.com>
Supports obtaining the device's bandwidth utilization. For usage details, refer to "uadk_tool dfx --help". Signed-off-by: Weili Qian <qianweili@huawei.com>
uadk: support querying device bandwidth utilization
Due to changes in chip specifications, the hash agg 8B and 16B operations have been reduced from 9 columns to 8 columns, so the driver must be adapted accordingly. Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
uadk: hash agg 8B type of the adaptation chip supports 8 columns
Adjusted the rehash descriptors counta_vld, agg_col_bit_map, Agg_Oid, Agg_Out_Type, Col_Data_Type, and Col_Data_Info. These descriptors are consistent with those generated by the hash aggregation task. In addition, an extra 4 bytes are added when calculating the row size to ensure that each hash table contains 4 bytes of empty information. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Add an error warning when CRC errors occur. When the same ctx is reused, the context data of the previous service flow that has ended needs to be cleared. An error message is added to report related information to the zip module. The minimum output length of the lz77_zstd_price algorithm should be 4096+16+800+insize. Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>
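The minimum output length quoted above is plain arithmetic on the input size. A minimal sketch with a hypothetical helper name; the three constants are taken verbatim from the commit message, and their individual meaning is not stated there:

```python
def lz77_zstd_price_min_out_len(insize):
    """Minimum output buffer length for a given input size, in bytes,
    following the formula 4096 + 16 + 800 + insize."""
    return 4096 + 16 + 800 + insize
```

For a 1024-byte input this gives 5936 bytes.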
When the sgl pool is busy, the hisi_qm_get_hw_sgl function returns an error, causing the operation to fail. Now, this function returns the code -WD_EBUSY to inform the user to wait until the sgl pool is available again. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
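From the caller's side, a busy return code is typically handled by waiting and retrying. The sketch below shows that pattern in Python with a hypothetical callable standing in for the request path; `WD_EBUSY` is assumed here to match the system `EBUSY` value, and the retry count and delay are arbitrary illustration values.

```python
import errno
import time

WD_EBUSY = errno.EBUSY  # assumption: stand-in for the uadk error code


def call_with_busy_retry(operation, retries=5, delay=0.001):
    """Call `operation` until it stops returning -WD_EBUSY.

    `operation` is a hypothetical callable returning 0 on success or a
    negative error code. Any result other than -WD_EBUSY is returned
    immediately; -WD_EBUSY is returned if all retries are used up.
    """
    for _ in range(retries):
        ret = operation()
        if ret != -WD_EBUSY:
            return ret
        time.sleep(delay)  # give the pool time to free entries
    return -WD_EBUSY
```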
1. In the original approach, calling sched_getcpu() followed by numa_node_of_cpu() requires two system calls, which is inefficient. With the new getcpu() method only one system call is needed, and in some cases the information can even be obtained directly from process data without any system call. 2. Use getcpu() to obtain the node id directly, instead of first obtaining the cpu id and then the node id, to reduce the number of system calls. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com>
Set the fd for soft ctx to avoid requesting reserved memory. Signed-off-by: Weili Qian <qianweili@huawei.com>
uadk: add the empty size for the hash table row size
Fix the compilation failure of wd_alg.h. The error log looks like: wd_alg.h:121:9: error: unknown type name ‘__u8’. Also improve code portability by including linux/types.h instead of asm/types.h. Signed-off-by: Weili Qian <qianweili@huawei.com>
uadk: fix the compilation failure of wd_alg.h
Update the README document of the UADK project to make it more concise and understandable.
To ensure the clarity and completeness of the README document, it is reformatted as markdown and its content is refined. Signed-off-by: Liulongfang <liulongfang@huawei.com>
UADK heterogeneous computing uses both hardware acceleration and instruction acceleration. Instructions are used to keep improving and accelerating business performance after the hardware is fully loaded, and the scheme can automatically adapt to a variety of acceleration devices.
The current patchset was developed for this purpose, and it has been fully adapted to all algorithm types of uadk.
With hybrid acceleration, performance is significantly higher and the acceleration effect is significantly improved.
sm3 test cmd:
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
| tds (threads) | init1 (HW) | init2 (HW + CE) | increase |
|---|---|---|---|
| 1 | 393.3 | 437.1 | 11.14% |
| 2 | 762.1 | 823.4 | 8.04% |
| 4 | 1508.4 | 1564.1 | 3.69% |
| 8 | 3007.4 | 3074.9 | 2.24% |
| 16 | 4851.8 | 5429.2 | 11.90% |
| 32 | 4854.1 | 8698.8 | 79.21% |
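The increase column is the relative gain of init2 over init1. A small sketch of that calculation, with a hypothetical function name, checked against the first and last SM3 rows:

```python
def increase_percent(init1, init2):
    """Relative gain of init2 over init1, in percent, to two decimals."""
    return round((init2 - init1) / init1 * 100, 2)


# SM3, 1 thread:   393.3 -> 437.1
sm3_1_thread = increase_percent(393.3, 437.1)
# SM3, 32 threads: 4854.1 -> 8698.8
sm3_32_threads = increase_percent(4854.1, 8698.8)
```

These evaluate to 11.14 and 79.21, matching the reported figures.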
sm4 test cmd:
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --async --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
| tds (threads) | init1 (HW) | init2 (HW + CE) | increase |
|---|---|---|---|
| 1 | 461 | 1482.5 | 221.58% |
| 2 | 914 | 2575.4 | 181.77% |
| 4 | 1699.9 | 4737.6 | 178.70% |
| 8 | 3301.5 | 7327.8 | 121.95% |
| 16 | 5837.5 | 9737.4 | 66.81% |
| 32 | 8897.7 | 10432.4 | 17.25% |
| tds (threads) | init1 (HW) | init2 (HW + CE) | increase |
|---|---|---|---|
| 1 | 1368.3 | 1683.9 | 23.07% |
| 2 | 2652 | 3235.5 | 22.00% |
| 4 | 3979.5 | 5094.5 | 28.02% |
| 8 | 6667.7 | 8587 | 28.79% |
| 16 | 8900.9 | 11067.8 | 24.34% |
| 32 | 8905.9 | 10209.1 | 14.63% |