uadk supports heterogeneous computing by Liulongfang · Pull Request #658 · Linaro/uadk

Liulongfang · 2025-01-03T06:33:04Z

Now that uadk supports both hardware acceleration and instruction acceleration (CE in the tables below), users expect to be able to use the two together: once the hardware accelerator is fully loaded, CPU instructions keep accelerating the workload and further improve business performance. The framework can also adapt automatically to a variety of acceleration devices.
This patchset was developed for that purpose and has been adapted to all algorithm types in uadk.

With the updated framework, hybrid acceleration outperforms hardware acceleration alone in every configuration measured below, with gains ranging from about 2% to over 200% depending on the algorithm, mode, and tds value.

sm3 test cmd:
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm3 --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2

SM3 1024B Performance(MB/s)

tds------init1(HW)-----init2(HW + CE)----increase
1-----------393.3--------437.1-------------11.14%
2----------762.1---------823.4------------8.04%
4----------1508.4-------1564.1------------3.69%
8----------3007.4------3074.9-----------2.24%
16---------4851.8-------5429.2-----------11.90%
32--------4854.1-------8698.8------------79.21%

sm4 test cmd:
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --sync --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2
numactl --cpunodebind=0 --membind=0 uadk_tool benchmark --alg sm4-128-ecb --mode sva --opt 0 --async --pktlen 1024 --seconds 10 --thread 1 --multi 1 --ctxnum 1 --prefetch --init2

SM4 1024B Performance(MB/s)

tds-------init1(HW)----init2(HW + CE)---------increase
1-------------461----------1482.5---------------221.58%
2------------914----------2575.4---------------181.77%
4-----------1699.9--------4737.6---------------178.70%
8-----------3301.5--------7327.8---------------121.95%
16----------5837.5--------9737.4---------------66.81%
32----------8897.7-------10432.4--------------17.25%

SM4 1024B async Performance(MB/s)

tds-------init1(HW)----init2(HW + CE)---------increase
1-----------1368.3--------1683.9---------------23.07%
2------------2652---------3235.5---------------22.00%
4-----------3979.5--------5094.5---------------28.02%
8-----------6667.7---------8587----------------28.79%
16----------8900.9-------11067.8---------------24.34%
32----------8905.9-------10209.1--------------14.63%
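
The "increase" column in the tables above appears to be the relative gain of the init2 throughput over the init1 throughput. A minimal sketch of that arithmetic, assuming this interpretation (the function name `gain_percent` is illustrative and not part of uadk):

```c
/*
 * Relative throughput gain of the hybrid run (init2) over the
 * hardware-only run (init1), in percent. Illustrative helper only.
 */
double gain_percent(double init1_mbps, double init2_mbps)
{
	return (init2_mbps - init1_mbps) / init1_mbps * 100.0;
}
```

For example, `gain_percent(393.3, 437.1)` evaluates to roughly 11.14, which matches the first SM3 row.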

Add the hmac(sm3)-cbc(sm4) algorithm to the nosva scenario. The following fields of the session setup need to be set: calg (WCRYPTO_CIPHER_SM4), cmode (WCRYPTO_CIPHER_CBC), dalg (WCRYPTO_SM3), and dmode (WCRYPTO_DIGEST_HMAC). Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

Currently, the algorithm name of the aead cbc mode is designed only for sha256, which is no longer suitable once other algorithms such as hmac(sm3)-cbc(aes) are added. A common name, authenc(generic,cbc(aes)), is now used; the actual algorithm and mode are still specified by dalg and dmode in the session setup. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

In stream encryption mode, a long file is encrypted block by block: the accelerator is invoked for each block and the per-block results are assembled. The assembled result is the same as the result of encrypting the entire file in one pass. For hisi_sec, the AAD is carried in the first message, and the plaintext is processed in the middle and end messages. In an encrypted stream, the first and end messages are unique and must be delivered to the hardware. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

gaozhangfei · 2025-01-03T06:53:50Z

Do you have data for CE alone?
Also, please post the test commands if convenient.

Liulongfang · 2025-01-03T07:03:01Z

The CE-only data is as follows:
SM4 1024B CE Performance(MB/s)
tds-------init1(CE)
1-----------2955.9
2-----------3446.6
4-----------5774.3
8-----------8399.2
16----------10035.8
32----------10638.9

SM3 1024B CE Performance(MB/s)
tds-------init1(CE)
1-----------436.2
2-----------824.9
4-----------1571.6
8-----------3107.2
16----------5571.2
32----------9071.6

gaozhangfei · 2025-01-06T01:23:31Z

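The hardware performance looks low. Have you tested with --thread 8 --ctxnum 8?
Is there any case where 1+1 > 2?
It should be possible to choose whether to enable the scheduling, right?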

In common digest stream mode, io_bytes and iv_bytes need to be set to 0 when the final BD is calculated. In the appending tag scenario, the values of io_bytes and iv_bytes therefore need to be restored to what they were before being set to 0. The hardware can then compute the overall hash value of the appended packet and the previously calculated packets, which avoids repeated calculation. Signed-off-by: Qi Tao <taoqi10@huawei.com>

Liulongfang · 2025-01-13T06:14:28Z

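1+1 > 2 cannot really be achieved; the best case is 1+1 ≈ 2. That is, without increasing CPU usage, hybrid computing that mixes software and hardware computation boosts business performance, so that the overall performance uses as much of the computing power of all devices as possible:

SM4 algorithm, 8 KB packet length: performance of hardware computing (HW), software computing (CE), and hybrid computing (HW+CE), and the achievement rate (hybrid throughput / (HW throughput + CE throughput))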
sync mode:
tds-----------HW------------CE-----------(HW+CE)-----achievement rate
1-----------1417.1---------5299.5----------3629.4-------54.04%
2------------2817----------7439.3----------6175.3-------60.21%
4-----------5438.4---------9680.8----------9854.2-------65.18%
8-----------9032.7--------11140.2---------11701.2------58.00%
16----------9143.3--------11837.1---------12495.5------59.56%
32----------9128.6--------12115.2---------13709.7------64.54%

async mode:
tds-----------HW------------CE-----------(HW+CE)-----achievement rate
1-----------9113.1---------5372.8----------7837.7-------54.11%
2-----------9139.1---------7211.1---------11365.7-------69.51%
4-----------9132.6---------9750.4---------13306.6-------70.47%
8-----------9144.1--------11145.6---------13948.9-------68.75%
16----------9139.3--------11727.8---------14644.3-------70.18%
32----------9124.8--------11951-----------13959.6-------66.24%

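SM3 algorithm, 8 KB packet length: performance of hardware computing (HW), software computing (CE), and hybrid computing (HW+CE), and the achievement rate (hybrid throughput / (HW throughput + CE throughput))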
sync mode:
tds-----------HW------------CE-----------(HW+CE)-----achievement rate
1------------962.2----------508.7-----------549.9--------37.39%
2-----------1905.4----------998.9----------1094.2-------37.68%
4-----------3810.9---------2000.1----------2163.1-------37.22%
8-----------5161.1---------3989.5----------4305.7-------47.05%
16----------5161.1---------7606.1----------8107.1-------63.50%
32----------5161.1--------13482.8---------14493.8------77.74%

async mode:
tds-----------HW------------CE-----------(HW+CE)-----achievement rate
1-----------5161.2----------508.5----------1419.7-------25.04%
2-----------5161.2---------1005.7----------2046.6-------33.19%
4-----------5161.1---------2014.1----------5683.1-------79.20%
8-----------5161.0---------4001.3----------8801.7-------96.06%
16----------5159.2---------7529.5---------12098.6-------95.35%
32----------5160.8--------12587.8---------17534.1-------98.79%
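
The achievement rate used in these tables is the hybrid throughput divided by the sum of the HW-only and CE-only throughputs. A minimal sketch of that calculation follows; the function name `achievement_rate` is chosen for this example only.

```c
/*
 * Achievement rate in percent: how much of the combined HW-only and
 * CE-only throughput the hybrid (HW+CE) run actually reaches.
 * Illustrative helper only.
 */
double achievement_rate(double hw, double ce, double hybrid)
{
	return hybrid / (hw + ce) * 100.0;
}
```

For the first SM4 sync row, `achievement_rate(1417.1, 5299.5, 3629.4)` comes out at roughly 54.04.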

The original timer in uadk_tool had to be re-triggered on expiration, which led to nested timing and potential inaccuracy; this update improves the timer mechanism. In addition, the random number generator was initialized with a fixed seed, so the output was not random enough; this has been updated. Finally, for non-aligned random length values the generated result could be empty; this has also been fixed. Signed-off-by: Longfang Liu <liulongfang@huawei.com>

Ignore the zip test, since the zip tool is not built with OpenSSL 3.0. Refer to uadk_tool/Makefile.am: if HAVE_CRYPTO uadk_tool_SOURCES+=test/comp_main.c endif Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Weili Qian <qianweili@huawei.com>

When looking up the corresponding driver by algorithm type, since the driver does not save the algorithm type, it cannot be directly obtained. Therefore, the algorithm type should be saved during algorithm registration. Signed-off-by: Weili Qian <qianweili@huawei.com>

uadk adds an API for obtaining the current bandwidth utilization of a device. When the device driver creates the "dev_usage" file, users can obtain the current bandwidth utilization of a given algorithm on the device by passing in the device and the algorithm name to be queried. Signed-off-by: Weili Qian <qianweili@huawei.com>

uadk supports obtaining the bandwidth utilization of specified devices and algorithms through user-space drivers. After hardware resources are initialized, the bandwidth utilization can be directly obtained through the hardware mmio space, replacing the method of reading sysfs files and reducing system calls. Signed-off-by: Weili Qian <qianweili@huawei.com>

Adjusted the rehash descriptors counta_vld, agg_col_bit_map, Agg_Oid, Agg_Out_Type, Col_Data_Type, and Col_Data_Info. These descriptors are consistent with those generated by the hash aggregation task. In addition, an extra 4 bytes are added when calculating the row size to ensure that each hash table contains 4 bytes of empty information. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>

Add an error warning when CRC errors occur. When the same ctx is reused, the context data of the previous, finished service flow needs to be cleared. An error message is added to report related information to the zip module. The minimum output length of the lz77_zstd_price algorithm should be 4096+16+800+insize. Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com> Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com>

When the sgl pool is busy, the hisi_qm_get_hw_sgl function returns an error, causing the operation to fail. Now, this function returns the code -WD_EBUSY to inform the user to wait until the sgl pool is available again. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com>
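
A caller might react to the -WD_EBUSY return described above by retrying a bounded number of times. The sketch below is only an illustration under stated assumptions: `WD_EBUSY` is redefined locally as a stand-in so the example compiles without the uadk headers, and `submit_fn`, `submit_with_retry`, and `demo_submit` are hypothetical names, not uadk APIs.

```c
#include <errno.h>

/* Stand-in so the sketch builds alone; the real value comes from the uadk headers. */
#define WD_EBUSY EBUSY

/* Hypothetical submit callback: returns 0 on success or a negative error code. */
typedef int (*submit_fn)(void *req);

/* Retry while the callee reports -WD_EBUSY, for at most max_retries extra attempts. */
int submit_with_retry(submit_fn submit, void *req, int max_retries)
{
	int ret;

	do {
		ret = submit(req);
	} while (ret == -WD_EBUSY && max_retries-- > 0);

	return ret;
}

/* Demo stand-in for a busy pool: reports -WD_EBUSY until the counter reaches 0. */
int demo_submit(void *req)
{
	int *busy_left = req;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -WD_EBUSY;
	}

	return 0;
}
```

A real caller would probably sleep or poll for completions between attempts rather than spin; that policy is left out here.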

1. The original approach, sched_getcpu() followed by numa_node_of_cpu(), requires two system calls and is inefficient. With the new getcpu() method only one system call is needed, and in some cases the information can be obtained directly from process data without any system call. 2. Use getcpu() to obtain the node id directly, instead of first obtaining the cpu id and then the node id, to reduce the number of system calls. Signed-off-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Weili Qian <qianweili@huawei.com>
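
As a rough illustration of the single-call approach described above, the sketch below uses the glibc `getcpu()` wrapper (declared in `<sched.h>` under `_GNU_SOURCE`, glibc 2.29 or later) to read the CPU and the NUMA node in one call. The helper name `current_numa_node` is made up for this example and is not the function uadk uses.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Return the NUMA node of the CPU the calling thread is running on,
 * or -1 if getcpu() fails. Illustrative helper, not uadk code.
 */
int current_numa_node(void)
{
	unsigned int cpu = 0;
	unsigned int node = 0;

	if (getcpu(&cpu, &node) != 0)
		return -1;

	return (int)node;
}
```

The CPU id is fetched as well but ignored, since only the node id is needed here.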

Liulongfang and others added 7 commits December 30, 2024 12:02

Merge pull request Linaro#652 from lin755/master

7770226

uadk: add some bugfix

uadk/v1: add assoc bytes check

4847bac

In GCM stream mode, the assoc bytes must not be 0; check this to avoid a hardware error. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk/v1: set aead msg state for the hardware v2

7b2738e

The hardware only uses the block mode, so set the aead message state to the block mode first. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk/v1: fix input length check for aead stream mode

9fb90e5

The hardware supports only 16-byte-aligned lengths for the aead middle messages, so a check for invalid lengths is added. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>
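
A length check of the kind this commit message describes could look like the sketch below. Only the 16-byte figure comes from the commit message; the macro and function names are hypothetical, and whether a zero length should be accepted is not stated in the source, so the sketch only tests for a multiple of 16.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed name; the 16-byte requirement is taken from the commit message. */
#define AEAD_MIDDLE_ALIGN 16U

/* True when a middle-message length is a multiple of 16 bytes. */
bool aead_middle_len_is_aligned(size_t len)
{
	return (len & (AEAD_MIDDLE_ALIGN - 1)) == 0;
}
```

Because 16 is a power of two, masking with `AEAD_MIDDLE_ALIGN - 1` is equivalent to `len % 16 == 0`.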

Liulongfang requested review from gaozhangfei, haofang111 and hzhuang1 January 3, 2025 06:33

Liulongfang force-pushed the master branch from 72c779c to 2747b9d Compare January 3, 2025 06:49

Qi Tao and others added 2 commits January 9, 2025 14:42

Merge pull request Linaro#661 from tq444/master

74ddc5d

uadk: support appending tag for digest stream model

Liulongfang force-pushed the master branch 3 times, most recently from 6bb4cdb to d78e979 Compare January 13, 2025 03:52

Liulongfang and others added 10 commits January 13, 2025 14:50

Merge pull request Linaro#657 from lin755/master

cecad35

uadk: support aead stream mode and sm4-sm3 alg

uadk: fix for aead authsize check

e030c07

When a combined algorithm is used, the authsize must not be 0, so add a check for it. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for aead auth key length

decf12f

According to the HMAC RFC, the auth key may be 0 bytes long, so remove the incorrect check. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for digest auth key length

b8a011a

According to the HMAC RFC, the auth key may be 0 bytes long, so remove the incorrect check. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for ce key_len check

55ce075

The auth key may be 0 bytes long, so remove the incorrect check. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for key address check

5e29122

The ctx key may be null if the user uses the normal mode, so return an error before copying data to the key. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for v1 aead authsize check

9b42876

First, move the algorithm check to the right level; then change the alignment from 16 bytes to 4 bytes according to the hardware specification. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for aead cbc mode authsize

29c8bd8

According to the hardware specification, the authsize alignment should be 4 bytes, not 16 bytes. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk_tool: print help when empty parameters

36693e6

Print help when dfx/benchmark/test is given empty parameters. Signed-off-by: Junchong Pan <panjunchong@h-partners.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

uadk: fix for aead soft compute

90af779

When soft computing is required, an invalid BD is used to ensure the integrity of the sending and receiving process, which is more efficient. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Qi Tao <taoqi10@huawei.com>

Liulongfang and others added 28 commits December 11, 2025 10:38

Merge pull request Linaro#729 from tomismyfriend/master

244851d

uadk: Fix static analysis warning

Merge pull request Linaro#731 from tomismyfriend/master

8bbcb02

uadk_tool: update component functionality in uadk_tool

uadk: Add CI configuration

258ff7a

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>

Merge pull request Linaro#732 from gaozhangfei/master-ci

f769d49

uadk: Add CI configuration

sanity_test: add set -x to print cmd

2e26dbd

When CI fails, it is very difficult to reproduce. Add set -x to the build script to make it easier to find which command fails. Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>

Merge pull request Linaro#735 from tomismyfriend/master

8d25393

sanity_test: add set -x to print cmd

Merge pull request Linaro#737 from gaozhangfei/master_sanity_test

d96e2d5

uadk: sanity_test ignore zip if OpenSSL 3.0

uadk: release 2.10

eb6af98

Release 2.10 in 2025.12 Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: Longfang Liu <liulongfang@huawei.com>

Merge pull request Linaro#739 from gaozhangfei/rel-2.10

d2e5932

uadk: release 2.10

uadk_tool: support get device usage

34c2e7c

Supports obtaining the device's bandwidth utilization. For usage details, refer to "uadk_tool dfx --help". Signed-off-by: Weili Qian <qianweili@huawei.com>

Merge pull request Linaro#742 from tomismyfriend/master

2b4a31a

uadk: support querying device bandwidth utilization

uadk: hash agg 8B type of the adaptation chip supports 8 columns

7415dea

Due to changes in chip specifications, the hash agg 8B and 16B operations have been reduced from 9 columns to 8 columns, requiring the driver to be adapted accordingly. Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>

Merge pull request Linaro#749 from tomismyfriend/master

2330d26

uadk: hash agg 8B type of the adaptation chip supports 8 columns

uadk: set the fd for soft ctx

6a87bdb

Set the fd for soft ctx to avoid requesting reserved memory. Signed-off-by: Weili Qian <qianweili@huawei.com>

Merge pull request Linaro#752 from tomismyfriend/master

e5d4467

uadk: add the empty size for the hash table row size

uadk: fix the compilation failure of wd_alg.h

644e604

Fix the compilation failure of wd_alg.h; the error log looks like: wd_alg.h:121:9: error: unknown type name ‘__u8’. Also improve code portability by including linux/types.h instead of asm/types.h. Signed-off-by: Weili Qian <qianweili@huawei.com>
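
For context, the kernel-style fixed-width types such as `__u8` are made available to user space by `<linux/types.h>`. The fragment below is a hedged illustration of the include this commit message refers to; the struct and its fields are hypothetical and are not taken from wd_alg.h.

```c
#include <linux/types.h>

/* Hypothetical struct, only to show __u8 and __u32 resolving after the include. */
struct example_alg_info {
	__u8 alg_type;
	__u32 flags;
};
```

Without an include that defines these types, a header using `__u8` fails with the "unknown type name" error quoted in the commit message.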

Merge pull request Linaro#753 from tomismyfriend/master

35cdb6b

uadk: fix the compilation failure of wd_alg.h

Update README

cdeb858

Update the README document of the UADK project to make it more concise and understandable.

uadk: update the formate of the README

d4d14b5

To ensure the clarity and completeness of the README document, it is necessary to reformat it as Markdown and refine its content. Signed-off-by: Liulongfang <liulongfang@huawei.com>

Liulongfang force-pushed the master branch from d78e979 to d4d14b5 Compare April 1, 2026 02:59

Liulongfang closed this Apr 8, 2026

Liulongfang wants to merge 201 commits into Linaro:develop from Liulongfang:master

9 participants
