crc32: Add implementation for loongarch64#511
Conversation
|
Testing this requires rust-lang/libz-sys#269 |
| // CRC32 instructions are part of the basic integer operations and therefore | ||
| // always available. | ||
| mod asm { | ||
| /// CRC32 single round checksum for bytes (8 bits). | ||
| /// | ||
| /// [Loongson's documentation](https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#crc-check-instructions) | ||
| pub fn crc_w_b_w(data: i8, mut crc: i32) -> i32 { | ||
| unsafe { | ||
| core::arch::asm!("crc.w.b.w {crc}, {data}, {crc}", crc = inout(reg) crc, data = in(reg) data); | ||
| crc | ||
| } | ||
| } |
There was a problem hiding this comment.
So here is an idea: rust-lang/miri#4899 added crc32 support for aarch, you could add similar support for loongarch? A requirement for new shims is often that you actually check the results on real hardware, but that is something you can actually do.
With miri support we can actually test this in CI today if instead of asm you use the intrinsics here. We can add some __internal feature for it if needed.
Separately we could actually stabilize the crc intrinsics on loongarch if the target maintainer is OK with that. For signatures we follow the clang headers by default (but it's ultimately up to the target maintainer), hence the i32 instead of more accurate widths.
There was a problem hiding this comment.
Sure, adding support for these in Miri sounds good. I'll try to put together a PR this evening.
With miri support we can actually test this in CI today if instead of asm you use the intrinsics here. We can add some
__internalfeature for it if needed.
So you would be okay with requiring nightly behind a feature flag?
Separately we could actually stabilize the crc intrinsics on loongarch if the target maintainer is OK with that. For signatures we follow the clang headers by default (but it's ultimately up to the target maintainer), hence the i32 instead of more accurate widths.
Funny enough, the clang headers do have different (in my view, better) signatures (see https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/larchintrin.h#L60) and use i8 and i16 where appropriate. GCC also seems to do this better (https://github.com/gcc-mirror/gcc/blob/master/gcc/config/loongarch/larchintrin.h#L152). I'll submit a patch to align the LLVM intrinsics with clang and GCC. Afterwards I think stabilization would be reasonable, though I've also noticed that LLVM completely prohibits using the CRC intrinsics on 32-bit, while Rust currently exposes them on both 32-bit and 64-bit, which should cause compile errors. I'm not sure whether that's a hardware limitation or not and what is actually correct in that case.
There was a problem hiding this comment.
So you would be okay with requiring nightly behind a feature flag?
We already do for gzprintf, which will just become a default-enabled feature when our msrv reaches whatever version c-variadic will be stable in. We're conservative with bumping the msrv but c-variadic and custom allocators are enticing reasons to bump.
For now, given that I don't think there is a really urgent use case I'd say add it as an __internal_loongarch_crc32 feature, we can give it a proper name once we have a bit more clarity on the testing and stabilization situation.
Funny enough, the clang headers do have different (in my view, better) signatures
cc @heiher on the crc questions
There was a problem hiding this comment.
Hmmm actually, now that I'm looking at it (after I'm hitting operand promotion LLVM crashes when changing the LLVM intrinsics), the aarch64 intrinsics are also defined with i32 in LLVM. The stdarch wrappers around them simply perform the cast, similar to how clang does it. I guess this doesn't need changes in LLVM, only in stdarch.
I run https://github.com/trifectatechfoundation/zlib-rs/blob/main/test-libz-rs-sys/examples/crc32_bench.rs with https://github.com/andrewrk/poop for the extra counters (cycles, instructions)
At the current level of complexity I think the duplication (perhaps with a comment pointing this out) is fine. |
|
Taking a random 10GB file of bytes from |
|
FYI you can give poop 2 (or more) argument strings and it will do the math and statistical significance and all that. Not really needed here though, it's pretty obvious. Nice! |
Promised @folkertdev I'd start working on this at RustWeek (hi!), so here's the very low hanging fruit: LoongArch has CRC32 instructions that are extremely similar to those on aarch64, but they're part of the base integer instruction set and therefore unconditionally available: https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#crc-check-instructions.
stdarch has intrinsics for these, see e.g. here, but they're nightly-only, so inline assembly is necessary to make this work on stable.
I think the function signatures in stdarch are terrible, they do match LLVM's, but a parameter where only the lower 8 bits are used should really just be a u8/i8 instead of an i32. I went with that approach for the re-implementation here and will see if I can get this changed in LLVM and stdarch. The choice of signed integers makes this uglier than the aarch64 implementation, but it matches stdarch, so I went with that. This also won't work in miri yet.
I'd love to add CI for loongarch64 here, but the cross compilers are only available in Ubuntu starting with 25.10 and 26.04, so we'll have to wait until GitHub adds a runner image based on that.
cargo nextest run --features gzpasses for me on a Loongson 3B6000. Are there existing benchmarks for these implementations or do you always just copy the relevant parts out and run benchmarks separately when adding new implementations?zlib-ng has managed to somewhat abstract over the aarch64 and loongarch64 intrinsics to share the implementation. Ideally we'd do something similar here, I guess we could wrap the intrinsics to have the same signatures?