
rng: Add AVX2 + NEON asm. #229

Open

klauspost wants to merge 2 commits into minio:main from klauspost:add-avx2-neon



@klauspost
Contributor

Merge after minio#228 to include the arm64 checks.

Benchmarks before (first group) and after (second group) this change:

```
pkg: github.com/minio/pkg/v3/rng
cpu: AMD Ryzen 9 9950X 16-Core Processor
BenchmarkReader
BenchmarkReader/1000-32         	46546988	        25.88 ns/op	38635.30 MB/s	       0 B/op	       0 allocs/op
BenchmarkReader/1024-32         	70920727	        17.14 ns/op	59755.33 MB/s	       0 B/op	       0 allocs/op
BenchmarkReader/16384-32        	 5805674	       204.9 ns/op	79950.02 MB/s	       0 B/op	       0 allocs/op
BenchmarkReader/1048576-32      	   92539	     14080 ns/op	74470.24 MB/s	       0 B/op	       0 allocs/op

BenchmarkReader/1000-32         	52974752	        22.57 ns/op	44300.70 MB/s	       0 B/op	       0 allocs/op
BenchmarkReader/1024-32         	100000000	        11.37 ns/op	90096.95 MB/s	       0 B/op	       0 allocs/op
BenchmarkReader/16384-32        	14598060	        81.69 ns/op	200552.58 MB/s	       0 B/op	       0 allocs/op
BenchmarkReader/1048576-32      	  174301	      6384 ns/op	164256.53 MB/s	       0 B/op	       0 allocs/op
```
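The relative gain can be read straight off the ns/op columns, since for a fixed buffer size MB/s is inversely proportional to ns/op. A small Go sketch of that arithmetic, using only the numbers from the two runs above (the `benchResult` type and `speedup` helper are illustrative and not part of the package):

```go
package main

import "fmt"

// benchResult holds one BenchmarkReader row from the runs above:
// buffer size in bytes and the reported ns/op before and after the change.
type benchResult struct {
	size              int
	beforeNs, afterNs float64
}

// speedup returns how many times faster the "after" run is.
// For a fixed size, throughput is size/time, so the ns/op ratio matches
// the MB/s ratio up to rounding in the reported numbers.
func speedup(r benchResult) float64 {
	return r.beforeNs / r.afterNs
}

func main() {
	results := []benchResult{
		{1000, 25.88, 22.57},
		{1024, 17.14, 11.37},
		{16384, 204.9, 81.69},
		{1048576, 14080, 6384},
	}
	for _, r := range results {
		fmt.Printf("%8d bytes: %.2fx\n", r.size, speedup(r))
	}
}
```

With these numbers the gain ranges from about 1.15x at 1000 bytes to about 2.5x at 16 KiB, with roughly 2.2x at 1 MiB.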

Copilot AI left a comment


Pull request overview

This PR adds architecture-specific assembly implementations for the rng package’s XOR routine to improve throughput on amd64 (AVX2) and arm64 (NEON), while expanding tests to validate correctness across sizes and key patterns.
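To make the "XOR routine" concrete, here is a minimal portable reference in Go. It assumes, purely for illustration, that the routine XORs a buffer against a 32-byte key block (four little-endian `uint64` words) repeated across the buffer; the name `xorKeyGo`, its signature, and the key layout are hypothetical and may not match the rng package. The SSE2, AVX2, and NEON versions would do the same work on 16- or 32-byte vectors instead of one word at a time.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// xorKeyGo is a hypothetical reference implementation: it XORs src with a
// 32-byte key (four little-endian uint64 words) repeated across the buffer
// and writes the result to dst. Not the rng package's actual API.
func xorKeyGo(dst, src []byte, key *[4]uint64) {
	if len(dst) < len(src) {
		panic("dst too short")
	}
	i := 0
	// Whole 32-byte blocks, one uint64 lane at a time.
	for ; i+32 <= len(src); i += 32 {
		for w := 0; w < 4; w++ {
			off := i + 8*w
			v := binary.LittleEndian.Uint64(src[off:])
			binary.LittleEndian.PutUint64(dst[off:], v^key[w])
		}
	}
	// Tail of fewer than 32 bytes: i is a multiple of 32 here, so the tail
	// lines up with the start of the key block and is XORed byte by byte.
	var kb [32]byte
	for w := 0; w < 4; w++ {
		binary.LittleEndian.PutUint64(kb[8*w:], key[w])
	}
	for j := 0; i < len(src); i, j = i+1, j+1 {
		dst[i] = src[i] ^ kb[j]
	}
}

func main() {
	key := [4]uint64{1, 2, 3, 4}
	src := []byte("hello, rng xor")
	enc := make([]byte, len(src))
	xorKeyGo(enc, src, &key)
	// Applying the same key twice restores the input.
	dec := make([]byte, len(src))
	xorKeyGo(dec, enc, &key)
	fmt.Println(string(dec)) // prints "hello, rng xor"
}
```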

Changes:

  • Add arm64 NEON assembly implementation and wiring (xorSliceNEON).
  • Split amd64 assembly into SSE2 + AVX2 implementations and dispatch at runtime via CPU feature detection.
  • Add additional XOR correctness tests (zero key, double-apply property, all aligned sizes, known vectors).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| rng/xor_noasm.go | Update build constraints so the Go fallback is used on architectures other than amd64/arm64, and whenever the noasm, appengine, or gccgo tag is set. |
| rng/xor_arm64.s | Add NEON assembly implementation for arm64 XOR. |
| rng/xor_arm64.go | Add arm64 Go wrapper that calls the NEON routine (guarded by build tags). |
| rng/xor_amd64.s | Rename existing amd64 routine to SSE2 and add AVX2 implementation. |
| rng/xor_amd64.go | Add runtime dispatch between AVX2 and SSE2 using cpuid. |
| rng/reader_test.go | Add broader correctness tests comparing asm vs Go and validating properties/known vectors. |


klauspost requested a review from rraulinio on April 7, 2026 12:25
