Add RISC-V RVV SIMD optimization for rte_hash_k16_cmp_eq() in hash library (Rvv hash #115)

Open
P1erreCashon wants to merge 29 commits into DPDK:main from P1erreCashon:rvv_hash
@P1erreCashon
Adds RISC-V RVV (RISC-V Vector Extension) SIMD optimization for DPDK hash key comparison, primarily accelerating 16-byte key equality checks used in hash lookup fast paths:

- Implements an RVV-optimized 16-byte key compare routine for rte_hash_k16_cmp_eq()
- Uses RVV vector load + compare + mask reduction to accelerate equality detection
- Adds a RISC-V RVV implementation file under the hash library SIMD path
- Updates the build system to enable the RVV code path when __riscv_vector is available

This optimization improves hash lookup performance on RISC-V platforms supporting RVV, providing a scalable SIMD alternative to scalar comparisons.
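
The load + compare + mask reduction sequence described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name k16_cmp_eq_sketch is hypothetical, the return convention (0 when the keys are equal, non-zero otherwise) is an assumption, and the strip-mining loop is only there so the sketch stays correct if the hardware vector length is shorter than 16 bytes. When __riscv_vector is not defined it falls back to memcmp().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/*
 * Hypothetical sketch of a 16-byte key comparison.
 * Assumed convention: returns 0 if the keys are equal, 1 otherwise.
 */
static inline int
k16_cmp_eq_sketch(const void *key1, const void *key2)
{
#if defined(__riscv_vector)
	const uint8_t *a = key1;
	const uint8_t *b = key2;
	size_t left = 16;

	while (left > 0) {
		/* vl is the number of bytes handled in this iteration */
		size_t vl = __riscv_vsetvl_e8m1(left);
		vuint8m1_t va = __riscv_vle8_v_u8m1(a, vl);
		vuint8m1_t vb = __riscv_vle8_v_u8m1(b, vl);
		/* mask bit i is set where a[i] != b[i] */
		vbool8_t ne = __riscv_vmsne_vv_u8m1_b8(va, vb, vl);

		/* vfirst returns -1 when no mask bit is set */
		if (__riscv_vfirst_m_b8(ne, vl) >= 0)
			return 1;
		a += vl;
		b += vl;
		left -= vl;
	}
	return 0;
#else
	/* Scalar fallback for targets without RVV */
	return memcmp(key1, key2, 16) != 0;
#endif
}
```

On a target where a single e8m1 register holds at least 16 bytes, the loop body runs once, which is the case the description has in mind.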

tomzawadzki and others added 29 commits May 16, 2024 08:50
SPDK provides isa-l submodule with -I and -L.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21092 (spdk-23.11)

(cherry picked from commit 68108f1886be82bd8c2daea0e05237292c8fa222)
Change-Id: I99924fc161a876ef017b9cdeeee52e2aed30d8ec
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/22689
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
The mlx5 common library checks whether several symbols/definitions
are present in system header files. If some are not
present, they are enabled by the mlx5_glue library.
The problem appears with clang and '-Werror': the code
generated by meson fails to compile because of an unused expression result:

Code:

        #include <infiniband/mlx5dv.h>
        int main(void) {
            /* If it's not defined as a macro, try to use as a symbol */
            #ifndef mlx5dv_create_flow_action_packet_reformat
                mlx5dv_create_flow_action_packet_reformat;
            #endif
            return 0;
        }
Compiler stdout:

Compiler stderr:
 /hpc/local/work/alexeymar/repo/spdk/dpdk/build-tmp/meson-private/tmp5obnak86/testfile.c:6:17: error: expression result unused [-Werror,-Wunused-value]
                mlx5dv_create_flow_action_packet_reformat;
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a result, almost all symbols are enabled in mlx5_glue even though
they exist in the system headers, and we get multiple
symbol redefinitions when we compile mlx5_common.
As a solution for this problem we can suppress
-Wunused-value (the warning shown in the compiler output above) using a pragma.

DPDK 23.11 note:
Starting with the commit below, all cflags are passed to has_header_symbol().
(33d6694) build: use C11 standard
To make sure that the symbol is properly detected, the pedantic flags need to
be removed.
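
For illustration only, the pragma approach could look like the sketch below. It is not the actual meson-generated test file: example_symbol is a hypothetical stand-in for a header symbol such as the one probed above, and symbol_probe is a hypothetical wrapper so the snippet is self-contained. The diagnostic pragmas are the GCC-style ones, which clang also accepts.

```c
/* Hypothetical stand-in for a symbol provided by a system header. */
int example_symbol(void)
{
	return 0;
}

/* Mirrors the shape of the generated probe: name the symbol, do nothing. */
int symbol_probe(void)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-value"
	/* Bare expression statement; normally rejected under -Werror. */
	example_symbol;
#pragma GCC diagnostic pop
	return 0;
}
```

With the warning suppressed around the bare expression, the probe compiles under -Werror and the symbol is reported as present.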

Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21093 (spdk-23.11)

(cherry picked from commit 02e24e008bf2241f9f01360c2e0580700219c374)
Change-Id: I03ba5d03f7e53d8e593a9de1deace5140c67d21d
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/22688
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Allocation would fail with ASan enabled if the size and alignment was
equal to half of the page size, e.g.:

size_t pg_sz = 2 * (1 << 20);
rte_malloc(NULL, pg_sz / 2, pg_sz / 2);

In such case, try_expand_heap_primary() only allocated one page but it
is not enough to fit this allocation with such alignment and
MALLOC_ELEM_TRAILER_LEN > 0, as correctly checked by
malloc_elem_can_hold().
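
The arithmetic behind this failure can be sketched with a simplified model of the fit check. This is an assumption-laden illustration, not the real malloc_elem_can_hold(): region_can_hold is a hypothetical helper, offsets are taken relative to a region start that is assumed to be aligned to the requested alignment, and the header and trailer lengths passed in are placeholder values.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model: place the data as close to the end of the region as
 * possible, round its start down to the alignment, then check that the
 * element header still fits in front of it.
 * align must be a power of two.
 */
static bool
region_can_hold(size_t region_len, size_t size, size_t align,
		size_t header_len, size_t trailer_len)
{
	if (region_len < trailer_len + size)
		return false;

	size_t end_pt = region_len - trailer_len;
	size_t data_start = (end_pt - size) & ~(align - 1);

	return data_start >= header_len;
}
```

With a 2 MiB page, size and alignment of 1 MiB, and a zero-length trailer, the data lands at offset 1 MiB and fits. With any non-zero trailer, the data start rounds down to offset 0, leaving no room for the header, so a second page is required.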

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21096 (spdk-23.11)

(cherry picked from commit 5dd0c0388764244e7d1cafd29784be41bafd97d2)
Change-Id: I50e51ed25ad9760260e50599405a0ed766a274c7
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/23150
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
In SPDK Jenkins CI the QAT devices only support 16VFs.
Per DPDK QAT documentation this could exceed the value of
RTE_CRYPTO_MAX_DEVS.

Ideally this should be configured by SPDK when building submodule,
but for now workaround #2258.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21071 (spdk-23.11)

(cherry picked from commit a211408a784ec318d1b2e61962343eddce1a7a35)
Change-Id: Ic9e22155564b2d1e6f685bac0f958836da4fc13b
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/23184
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Tested-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Multi-process applications operate on shared hugepage memory but each
process has its own ASan shadow region which is not synchronized with
the other processes. This causes issues when different processes try to
use the same memory because they have their own view of which addresses
are valid.

Fix it by mapping the shadow regions for memseg lists as shared memory.
The primary process is responsible for creating and removing the shared
memory objects.

Disable ASan instrumentation for triggering the page fault in
alloc_seg() because if the segment is already allocated by another
process and is marked as free in the shadow, accessing this address will
cause an ASan error.
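
As background for which addresses the shared mappings must cover, the relation between a memory range and its shadow range can be sketched as below. The constants assume the usual ASan layout on Linux x86_64 (one shadow byte per 8 bytes of memory, shadow offset 0x7fff8000); other architectures use different offsets, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed ASan parameters for Linux x86_64. */
#define SKETCH_SHADOW_SCALE  3
#define SKETCH_SHADOW_OFFSET UINT64_C(0x00007fff8000)

/* Address of the shadow byte describing the 8-byte granule at addr. */
static inline uint64_t
sketch_mem_to_shadow(uint64_t addr)
{
	return (addr >> SKETCH_SHADOW_SCALE) + SKETCH_SHADOW_OFFSET;
}

/* Number of shadow bytes needed to describe len bytes of memory. */
static inline uint64_t
sketch_shadow_len(uint64_t len)
{
	return len >> SKETCH_SHADOW_SCALE;
}
```

Under these assumptions a 2 MiB hugepage segment corresponds to 256 KiB of shadow, which is the part that has to be visible to every process sharing that segment.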

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21097 (spdk-23.11)

(cherry picked from commit f2cd1fb8eec58d190af52dac47ff9013ce084a9f)
Change-Id: I2bb6ae100d080aad30ee44d7e5d200962f74d1a8
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/23151
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Very few libraries in DPDK are marked as optional.
For SPDK when most of the drivers are disabled,
the requirements are much lower.

By removing the check for optional libraries,
it is possible to pass a narrow set of actually required
libraries.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21095 (spdk-23.11)

(cherry picked from commit 792dc9824d8d15c9f9fcfbe87c0b05242306a26b)
Change-Id: I60fbc7307a4f33482025a3b3c00948c091d236ff
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/22687
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Modified the Configuration file to use the latest ARM Cross-Compiler.

Fixed the linker errors for the undefined references to the APIs
isal_deflate_init, isal_deflate, isal_inflate_init, isal_inflate,
isal_inflate_stateless, isal_deflate_stateless,
isal_deflate_set_hufftables in the case of ARM Cross-Compilation.

Signed-off-by: Krishna Kanth Reddy <krish.reddy@samsung.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21094 (spdk-23.11)

(cherry picked from commit 26bb8ea9748890596904a90c6d1df9ff501975e9)
Change-Id: I0ba89e5640760276646d6b9211585ad116ebf446
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/22686
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Starting with Clang 17 the list of pmds could
contain an empty string. Please see:
https://bugs.dpdk.org/show_bug.cgi?id=1313

This is a fix proposed by alialnu@nvidia.com in the
issue above.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: Ic797fb39b6676d27aab0acdfdf79056ec03bbb35
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/21135/ (spdk-23.11)
Signed-off-by: Sebastian Brzezinka <sebastian.brzezinka@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/23194
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Due to a change in ASan behavior[1] the mapped shadow shared memory
regions are remapped later, when segments are mapped. So instead of
mapping the whole shadow region when reserving the memseg list memory,
map only the fragments corresponding to the segments after they are
mapped.

[1] llvm/llvm-project@a34e702

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Change-Id: Ia9881639ddeb158da6e6590f3fef95e314e2a33d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/23659
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
msl->shm_fd was in shared memory and a secondary process could change
it, causing the primary process to map wrong files into the shadow
region. Fix it by keeping the file descriptors in a private array in
each process.

Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Change-Id: Iae2a13b3f054bdf52b1ff1c3e24ea155972f8caf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/24763
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Issue:
Two threads:

- A, executing rte_eal_alarm_cancel,
- B, executing eal_alarm_callback.

This case can starve thread B. Thread A releases and re-acquires the
lock in a tight loop, so thread B can obtain the lock only if it is
scheduled within the very small window in which the lock is free.

The solution is to use sched_yield(), which puts the current thread
(A) at the end of the thread execution priority queue and allows thread B to
execute.

The issue can be observed e.g. on the hot-pluggable device detach path.
On that path, rte_alarm can be used to check whether DPDK has completed
the detachment. While waiting for completion, rte_eal_alarm_cancel
is called while another thread periodically calls eal_alarm_callback,
causing the issue to occur.
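
The retry pattern with sched_yield() can be sketched as follows. This is a reduced model rather than the real rte_eal_alarm_cancel(): struct alarm_sketch, its fields, and alarm_cancel_sketch are hypothetical, and a single flag stands in for the per-alarm "executing" state.

```c
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical, reduced alarm state protected by a mutex. */
struct alarm_sketch {
	pthread_mutex_t lock;
	bool executing;  /* set while the callback thread (B) is running */
	bool cancelled;  /* set by the cancelling thread (A) */
};

/* Returns the number of attempts needed to cancel the alarm. */
static unsigned int
alarm_cancel_sketch(struct alarm_sketch *a)
{
	unsigned int attempts = 0;
	bool busy;

	do {
		attempts++;
		pthread_mutex_lock(&a->lock);
		busy = a->executing;
		if (!busy)
			a->cancelled = true;
		pthread_mutex_unlock(&a->lock);

		/* Give the callback thread a chance to take the lock. */
		if (busy)
			sched_yield();
	} while (busy);

	return attempts;
}
```

Without the yield, thread A would immediately re-take the lock on each retry; yielding after the unlock is what gives thread B a realistic chance to run.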

Change-Id: I00256e0d29fd507443fcc1784bfa916f1af7d213
Signed-off-by: Wojciech Panfil <wojciech.panfil@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/dpdk/+/24275
Reviewed-by: Jacek Kalwas <jacek.kalwas@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <jim.harris@samsung.com>
Reviewed-by: Konrad Sztyber <konrad.sztyber@intel.com>
Set the physical address for digest buffer.

Fixes: a785af1 ("pdcp: add pre and post process for UL")
Cc: stable@dpdk.org

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Kai Ji <kai.ji@intel.com>
Acked-by: Anoob Joseph <anoobj@marvell.com>
Align the vector address rather than computed source address to
make sure the alignment is properly propagated.

Fixes: 2531743 ("crypto/qat: fix source buffer alignment")
Cc: stable@dpdk.org

Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Acked-by: Kai Ji <kai.ji@intel.com>
Added note for QAT driver information and device
configuration for services.

Signed-off-by: Emma Finn <emma.finn@intel.com>
Acked-by: Kai Ji <kai.ji@intel.com>
Added conditional definition for cache line size:
 - For CN10K and CN9k platform, set cache line size to 128 bytes.
 - For others, default to 256 bytes.
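
A conditional definition of this shape might look like the sketch below. The macro names are hypothetical placeholders, not the identifiers used by the driver; only the 128-byte and 256-byte values come from the description above.

```c
/*
 * Hypothetical platform macros; the real build would define its own.
 * CN10K and CN9K get 128 bytes, everything else defaults to 256 bytes.
 */
#if defined(EXAMPLE_PLATFORM_CN10K) || defined(EXAMPLE_PLATFORM_CN9K)
#define EXAMPLE_CACHE_LINE_SIZE 128
#else
#define EXAMPLE_CACHE_LINE_SIZE 256
#endif

/* Round len up to a multiple of the selected cache line size. */
static inline unsigned long
example_cache_line_roundup(unsigned long len)
{
	return (len + EXAMPLE_CACHE_LINE_SIZE - 1) &
	       ~((unsigned long)EXAMPLE_CACHE_LINE_SIZE - 1);
}
```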

Signed-off-by: Nithinsen Kaithakadan <nkaithakadan@marvell.com>
Aligning CPTR to 256B for TLS cases.

Signed-off-by: Tejasree Kondoj <ktejasree@marvell.com>
Fix mbuf sanity check failures by updating nb_segs
field after mbuf allocation. Without this update, the
append function fails due to incorrect segment count.

Fixes: dcdd016 ("test/crypto: add GMAC SGL")
Fixes: 4322009 ("test/crypto: add PDCP cases for scatter gather")
Fixes: f3dbf94 ("app/test: check SGL on QAT")
Cc: stable@dpdk.org

Signed-off-by: Nithinsen Kaithakadan <nkaithakadan@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
This patch fixes RSA sign data length assignment to correct value.
The length was previously altered during a test scenario and is
now restored to the proper value.

Fixes: 9682e82 ("test/crypto: add negative case for RSA verification")
Cc: stable@dpdk.org

Signed-off-by: Nithinsen Kaithakadan <nkaithakadan@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
For RSA verify operations with RTE_CRYPTO_RSA_PADDING_NONE, the driver
cannot determine which padding algorithm the application is using.
As per the API specification in rte_crypto_asym.h, when
RTE_CRYPTO_RSA_PADDING_NONE and RTE_CRYPTO_ASYM_OP_VERIFY are selected,
the decrypted signature should be returned to the application in the
cipher output buffer.

Fixes: dfd038b ("crypto/cnxk: refactor RSA verification")
Cc: stable@dpdk.org

Signed-off-by: Garvit Varshney <gvarshney@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
Fixes bugs with casting and checksum calculation for
UDC checksum

Fixes: 0dc314d ("compress/zlib: support dictionaries and PDCP checksum")
Cc: stable@dpdk.org

Signed-off-by: Sameer Vaze <svaze@qti.qualcomm.com>
Acked-by: Ashish Gupta <ashishg@marvell.com>
dpaa2_sec_dev_init() sets the crypto device name again after
it has been set by rte_cryptodev_pmd_create/allocate().
Overwriting its value could end up as a bug if the cryptodev
library changes the way it calls cryptodev objects.

Besides, there is no need to generate a name for the crypto device
different than the bus device, as there is a 1:1 relation between
those objects.

Reuse the bus device name directly, iow: dpseci.XXX instead of dpsec-XXX.

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Hemant Agrawal <hemant.agrawal@nxp.com>
Extend openssl crypto PMD to support AES XTS operations.

Signed-off-by: Shaokai Zhang <felix.zhang@jaguarmicro.com>
Reviewed-by: Joey Xing <joey.xing@jaguarmicro.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
OpenSSL 3.X has support for SHAKE, hence adding
SHAKE-128 and SHAKE-256 support to the OpenSSL PMD.

Signed-off-by: Emma Finn <emma.finn@intel.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
For the out-of-place (OOP) inline ingress test, some
hardware supports an ESP-specific flow rule instead of
the default flow rule.
The OOP test case first tries the ESP-specific flow rule
with the SPI specified in the flow pattern. If the ESP rule is not
supported, it retries with the default flow rule.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
This patch adds support for Chinese cryptographic algorithms in the
IPsec security gateway example application:

- Add SM4-CBC cipher algorithm support with 16-byte IV and key;
- Add SM3-HMAC authentication algorithm support with 20-byte key;
- Update SA configuration parsing to recognize "sm4-cbc" and "sm3-hmac"
keywords;
- Implement proper IV handling and authentication offset/length
configuration.

These additions enable the IPsec security gateway to use Chinese
national cryptographic standards for secure communications.
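
The keyword recognition described above can be pictured as a small lookup table. The sketch below is illustrative only: struct algo_sketch, algo_table, and find_algo are hypothetical names rather than the example application's data structures, and only the keywords and the key and IV lengths are taken from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: SA keyword plus the lengths it implies. */
struct algo_sketch {
	const char *keyword;
	size_t key_len;  /* bytes */
	size_t iv_len;   /* bytes, 0 when no IV applies */
};

static const struct algo_sketch algo_table[] = {
	{ "sm4-cbc",  16, 16 },  /* cipher: 16-byte key and IV */
	{ "sm3-hmac", 20, 0 },   /* auth: 20-byte key */
};

/* Returns the matching entry, or NULL for an unknown keyword. */
static const struct algo_sketch *
find_algo(const char *keyword)
{
	size_t i;

	for (i = 0; i < sizeof(algo_table) / sizeof(algo_table[0]); i++) {
		if (strcmp(algo_table[i].keyword, keyword) == 0)
			return &algo_table[i];
	}
	return NULL;
}
```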

Signed-off-by: Sunyang Wu <sunyang.wu@jaguarmicro.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>
The documentation had combined two unrelated drivers together.
Use AI to split it into two separate files:
pcap.rst for the pcap PMD and ring.rst for the ring PMD.

Changes to pcap.rst:
- Use "pcap" consistently instead of mixed "libpcap/pcap/PCAP" naming
- Remove Linux-specific references; document support for Linux, FreeBSD,
  and Windows
- Add reference to upstream libpcap documentation
- Add multi-queue support section explaining queue count determination
  and file handle limitations
- Use ``--vdev=net_pcap0`` format consistently
- Remove deprecated rte_eth_from_pcaps() API section
- Improve technical documentation style throughout

Changes to ring.rst:
- Use ``--vdev=net_ring0`` format consistently
- Fix inconsistent "Rings-based/Ring-based" naming
- Retain rte_eth_from_rings() API section with usage examples
- Improve technical documentation style throughout

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
It was unclear if mbuf fast release could support segmented packets, or if
mbuf fast release required non-segmented packets.
This has now been investigated in detail, and it was concluded that
segmented packets can be supported with mbuf fast release still achieving
the enhanced performance.
So the description of the mbuf fast release Tx offload flag was fixed.

Furthermore, the general descriptions of the Rx and Tx offloads were
improved, to reflect that they are not only for device capability
reporting, but also for device and queue configuration purposes.

NB: If a driver does not support segmented packets with mbuf fast release,
it can check the multi segment send flag when selecting transmit function.

Fixes: 5562417 ("mbuf: add raw free and alloc bulk functions")
Cc: stable@dpdk.org

Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>