
Conversation

@Polaris-911
Contributor

Zicclsm: Main memory supports misaligned loads/stores
According to the RVA20U64 profile specification, the Zicclsm extension is mandatory; GCC recognizes it from version 14.1 onward.
References
GCC Zicclsm
RVA20U64 specification

Performance Test Results: zstd with Different MEM_FORCE_MEMORY_ACCESS Settings

Test Environment

[root@r2044-r1-s1 ~]# dmidecode -t processor | grep "Version"
        Version: SG2044
[root@r2044-r1-s1 ~]# uname -a
Linux r2044-r1-s1 6.12.47-25.09.16.17.riscv64 #1 SMP Tue Sep 16 17:47:24 CST 2025 riscv64 riscv64 riscv64 GNU/Linux
Compressor      Metric               MEM_FORCE_MEMORY_ACCESS=2  MEM_FORCE_MEMORY_ACCESS=1  Improvement
zstd 1.5.7 -1   Compression speed    72.0 MB/s                  57.5 MB/s                  ~25.2%
zstd 1.5.7 -1   Decompression speed  93.7 MB/s                  93.4 MB/s                  ~0.3%
zstd 1.5.7 -22  Compression speed    0.24 MB/s                  0.22 MB/s                  ~9.1%
zstd 1.5.7 -22  Decompression speed  71.0 MB/s                  66.8 MB/s                  ~6.3%
  1. MEM_FORCE_MEMORY_ACCESS=1
[root@r2044-r1-s1 lzbench-master]# ./lzbench -t0,0 -i5,5 -ezstd,1,22 ../silesia.tar
lzbench 2.1 | GCC 12.3.1 | 64-bit Linux |

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
zstd 1.5.7 -1            57.5 MB/s  93.4 MB/s    73216302  34.54 ../silesia.tar
zstd 1.5.7 -22           0.22 MB/s  66.8 MB/s    52222248  24.64 ../silesia.tar
[Params] cIters=5 dIters=5 cTime=0.0 dTime=0.0 chunkSize=0KB cSpeed=0MB
  2. MEM_FORCE_MEMORY_ACCESS=2
[root@r2044-r1-s1 lzbench-master]# ./lzbench -t0,0 -i5,5 -ezstd,1,22 ../silesia.tar
lzbench 2.1 | GCC 12.3.1 | 64-bit Linux |

Compressor name         Compress. Decompress. Compr. size  Ratio Filename
zstd 1.5.7 -1            72.0 MB/s  93.7 MB/s    73216302  34.54 ../silesia.tar
zstd 1.5.7 -22           0.24 MB/s  71.0 MB/s    52222248  24.64 ../silesia.tar
[Params] cIters=5 dIters=5 cTime=0.0 dTime=0.0 chunkSize=0KB cSpeed=0MB

@meta-cla meta-cla bot added the CLA Signed label Nov 3, 2025
@Polaris-911
Contributor Author

Hi @Cyan4973, I know you're busy, but I wanted to check whether you could spare a moment to review this PR. Thanks in advance!

@Cyan4973 Cyan4973 self-assigned this Dec 2, 2025
Contributor

@Cyan4973 Cyan4973 left a comment


We recommend retaining Method 2 solely as a "last resort" to force enable unaligned memory access on a local system.

However, we do not endorse its use "in general".

Method 2 essentially misleads the C abstract machine by asserting that memory addresses are aligned when, in reality, they are not. This constitutes undefined behavior (UB), and as such, we cannot guarantee reliable or predictable results.

Consequently, we are unable to approve this pull request in its current form.

The preferred and correct approach is Method 0, which is fully portable.

If the compiler recognizes that the target CPU supports unaligned memory access, it should optimize memcpy(d, s, 8) into a single read or write instruction. If this optimization does not occur, the issue lies with the compiler's optimization capabilities.

If that's the current situation regarding RISC-V, I would recommend pursuing improvements in compiler optimization to achieve a more robust and future-proof solution.

@Polaris-911
Contributor Author

Polaris-911 commented Dec 20, 2025

Hi @Cyan4973, thank you for the detailed explanation regarding undefined behavior. I completely understand the project's policy of avoiding UB to ensure portability and correctness.

However, I have benchmarked all three methods on the target hardware (RISC-V with zicclsm, GCC 12.3.1), and the performance gap is substantial.

Benchmark Data (Compression Speed)

Method    zstd -1 speed   vs Method 0
Method 2  74.2 MB/s       +74%
Method 1  59.2 MB/s       +39%
Method 0  42.5 MB/s       baseline

My Suggestion

Method 0 is effectively unusable for high performance: as seen above, relying on memcpy results in a ~42% performance drop compared to direct access (Method 2), and a ~28% drop compared to the packed-struct approach (Method 1). The current compiler unfortunately does not optimize memcpy into single instructions on this platform.

Request regarding Method 2: Method 2 yields the peak performance (74.2 MB/s). While I strictly acknowledge the UB concerns, unaligned access is natively supported on this hardware, and the current compiler limitations are the main bottleneck.

Could we consider allowing Method 2 as a temporary optimization strategy strictly under the __riscv_zicclsm guard?

We view this as a stopgap solution to bridge the massive performance gap while RISC-V compiler support matures. We are fully open to deprecating this path and reverting to the standard approach in the future, once GCC/Clang demonstrates the capability to optimize memcpy correctly for this target.

[Data screenshot omitted]

