salsa20: performance optimizations (e.g. SIMD)

There are two big optimizations we could do on both the `chacha20` and `salsa20` crates.

## Avoid recomputing initial state

*EDIT: both crates now have a `new` method to compute the initial state, and separate `apply_keystream` / `generate` methods to compute a block*

- [x] `chacha20` crate
- [x] `salsa20` crate

[RFC 8439 Section 3](https://tools.ietf.org/html/rfc8439#section-3) describes caching the initial block state once computed as a performance optimization:

```
   Each block of ChaCha20 involves 16 move operations and one increment
   operation for loading the state, 80 each of XOR, addition and roll
   operations for the rounds, 16 more add operations and 16 XOR
   operations for protecting the plaintext.  Section 2.3 describes the
   ChaCha block function as "adding the original input words".  This
   implies that before starting the rounds on the ChaCha state, we copy
   it aside, only to add it in later.  This is correct, but we can save
   a few operations if we instead copy the state and do the work on the
   copy.  This way, for the next block you don't need to recreate the
   state, but only to increment the block counter.  This saves
   approximately 5.5% of the cycles.
```

## SIMD support

Both ChaCha20 and Salsa20 are amenable to SIMD optimizations. We should add SIMD optimizations on `x86`/`x86_64` at the very least.

### `x86`/`x86_64`
 
- `chacha20`
    - [x] SSE2 (#61)
    - [x] AVX2 (#83, #87)
- [ ] `salsa20`
    - [ ] SSE2
     - [ ] AVX2

### Other CPU architectures

- ARM?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

salsa20: performance optimizations (e.g. SIMD) #50

Avoid recomputing initial state

SIMD support

`x86`/`x86_64`

Other CPU architectures

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

salsa20: performance optimizations (e.g. SIMD) #50

Description

Avoid recomputing initial state

SIMD support

x86/x86_64

Other CPU architectures

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`x86`/`x86_64`