feat: Add Snappy support#466

Open
Erouan50 wants to merge 13 commits into Nullus157:main from Erouan50:feat/snappy

Conversation

@Erouan50

Hello there!

This pull request introduces support for a framed Snappy decoder and encoder. I'm not an expert in compression/decompression, but I feel Snappy is quite different from the other codecs. My work was inspired by the gzip decoder (for the state machine), the framing reader and writer in the snap crate, and of course the actual spec. One big difference is that the whole frame must be buffered before compressing or decompressing it.

I have a few questions/notes:

  • The "buffering" can be skipped if the input and/or output buffers are the right size. Should I implement that in this pull request?
  • One test is failing (snappy::tokio::bufread::decompress::trailer), but I don't know how to fix it. If I understand the test correctly, it checks that the decoder can stop decoding with trailing data left in the buffer. With the framing format, I don't know how to handle that. Should I stop decoding when the chunk type is invalid, or should I stop and fail (the current behavior)?
  • Should I add more tests? Because the test data ([1, 2, 3...]) is not compressible with Snappy, the encoder and decoder never actually compress or decompress anything. Or is it the job of the proptests to create random test data?
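For context on the framing format discussed above: per the Snappy framing spec, every chunk starts with a 1-byte chunk type followed by a 3-byte little-endian length. A minimal sketch of a header parser (`parse_chunk_header` is a hypothetical helper for illustration, not code from this PR):

```rust
// Parse one Snappy framing-format chunk header: a 1-byte chunk type
// followed by a 3-byte little-endian chunk length (per the framing spec).
// Hypothetical helper, shown only to illustrate the byte layout.
fn parse_chunk_header(header: [u8; 4]) -> (u8, usize) {
    let chunk_type = header[0];
    let len = usize::from(header[1])
        | usize::from(header[2]) << 8
        | usize::from(header[3]) << 16;
    (chunk_type, len)
}

fn main() {
    // A stream identifier chunk header: type 0xFF, length 6.
    let (ty, len) = parse_chunk_header([0xFF, 0x06, 0x00, 0x00]);
    println!("{ty} {len}"); // prints: 255 6
}
```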

Erouan50 and others added 4 commits March 27, 2026 09:53
Implement the Snappy framing format as a new compression algorithm,
following the same pattern as the existing codecs (zstd, lz4, etc.).

The encoder produces Snappy frames with stream identifiers,
compressed/uncompressed chunks (falling back to uncompressed when
compression ratio is poor), and masked CRC32C checksums. The decoder
is a state machine that parses frame headers, buffers chunk data, and
verifies checksums before emitting decompressed output.

New dependencies: `snap` for raw Snappy compression and `crc32c` for
checksum computation. Both are optional behind the `snappy` feature
flag.
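As a concrete illustration of the stream identifier mentioned above (a sketch based on the Snappy framing spec, not code lifted from the PR), the first chunk the encoder emits is a fixed 10-byte sequence: chunk type 0xFF, a 3-byte little-endian length of 6, then the magic payload "sNaPpY":

```rust
// Snappy framing stream identifier, per the framing spec:
// chunk type 0xFF, 3-byte little-endian length (6), payload "sNaPpY".
const STREAM_IDENTIFIER: [u8; 10] =
    [0xFF, 0x06, 0x00, 0x00, b's', b'N', b'a', b'P', b'p', b'Y'];

fn main() {
    assert_eq!(STREAM_IDENTIFIER[0], 0xFF); // stream identifier chunk type
    assert_eq!(&STREAM_IDENTIFIER[4..], b"sNaPpY"); // magic bytes
    println!("{:02X?}", STREAM_IDENTIFIER);
}
```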
@NobodyXu (Collaborator)

> The "buffering" can be skipped if the input and/or output buffers are the right size. Should I implement that in this pull request?

I'm ok with either, though based on the size of the PR and that its CI is failing, I think getting a baseline implementation merged is a more sensible decision.

@Erouan50 (Author)

> > The "buffering" can be skipped if the input and/or output buffers are the right size. Should I implement that in this pull request?
>
> I'm ok with either, though based on the size of the PR and that its CI is failing, I think getting a baseline implementation merged is a more sensible decision.

That's fair. Let me fix the proptests; I forgot them...

@Erouan50 (Author)

Erouan50 commented Mar 27, 2026

> > The "buffering" can be skipped if the input and/or output buffers are the right size. Should I implement that in this pull request?
>
> I'm ok with either, though based on the size of the PR and that its CI is failing, I think getting a baseline implementation merged is a more sensible decision.

@NobodyXu I agree to implement that in another PR. I've fixed the tests, except the one I mentioned in the description:

     Summary [  41.303s] 884 tests run: 882 passed, 2 failed, 0 skipped
        FAIL [   0.013s] async-compression::snappy snappy::futures::bufread::decompress::trailer
        FAIL [   0.008s] async-compression::snappy snappy::tokio::bufread::decompress::trailer

Do you have any opinion on how I should fix it? Or should I just skip this test for Snappy?

@NobodyXu (Collaborator) left a comment


Thank you! I'm really unhappy with all the repeated Vec creation, which results in repeated copying and heap allocation.

Comment on lines +30 to +35

```rust
0xFF => Self::Stream,
0x00 => Self::Compressed,
0x01 => Self::Uncompressed,
0xFE => Self::Padding,
0x02..=0x7f => Self::ReservedUnskippable(value),
0x80..=0xFD => Self::ReservedSkippable(value),
```
@NobodyXu (Collaborator)

Do we have to hardcode this here?

Ideally, ChunkType should live in snap.

@Erouan50 (Author)

It is, but the enum is not public.

@NobodyXu (Collaborator)

I think it'd be better if snap made it public under the raw module.

@Erouan50 (Author)

I agree. I'll open a pull request to the repository, but I can't guarantee they'll accept the changes.

@NobodyXu (Collaborator)

It's ok! We should at least try, and I believe the motivation for exposing this is reasonable.

```rust
}

fn mask_crc(crc: u32) -> u32 {
    (crc.wrapping_shr(15) | crc.wrapping_shl(17)).wrapping_add(0xA282EAD8)
}
```
@NobodyXu (Collaborator)

Another magic value that would be better in snap; I assume this is duplicated code?

@Erouan50 (Author)

Yes, it's duplicated code. The function is not public.

@NobodyXu (Collaborator)

Can we have it public under the snap::raw module?

Hardcoding it here really feels wrong.

@Erouan50 (Author)

I'll open a pull request.
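For reference, the masking scheme in the snippet above (rotate the CRC right by 15 bits, then add a magic constant, per the Snappy framing spec) is reversible. A sketch pairing mask_crc with a hypothetical unmask_crc, shown only to demonstrate the round trip:

```rust
// CRC masking from the Snappy framing spec: rotate right by 15 bits,
// then add the magic constant 0xA282EAD8 (wrapping arithmetic).
fn mask_crc(crc: u32) -> u32 {
    (crc.wrapping_shr(15) | crc.wrapping_shl(17)).wrapping_add(0xA282EAD8)
}

// Inverse of mask_crc (hypothetical helper, for illustration only):
// subtract the constant, then rotate left by 15 bits.
fn unmask_crc(masked: u32) -> u32 {
    let rotated = masked.wrapping_sub(0xA282EAD8);
    rotated.wrapping_shl(15) | rotated.wrapping_shr(17)
}

fn main() {
    for crc in [0u32, 1, 0xDEADBEEF, u32::MAX] {
        assert_eq!(unmask_crc(mask_crc(crc)), crc);
    }
    println!("round trip ok");
}
```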

@NobodyXu (Collaborator) left a comment

Question regarding why the encoder is designed with extra buffering.

I can understand that the decoder needs it, as we cannot control how the chunks are created.

@NobodyXu (Collaborator)

> Do you have any opinion on how I should fix it? Or should I just skip this test for Snappy?

Does Snappy have a "file header" that records the size of the entire stream?

If it does, we can configure the decoder to stop decoding once that limit is reached; the only way to continue would be to reinitialise it.

Otherwise we cannot fix it.

@Erouan50 (Author)

> > Do you have any opinion on how I should fix it? Or should I just skip this test for Snappy?
>
> Does Snappy have a "file header" that records the size of the entire stream?
>
> If it does, we can configure the decoder to stop decoding once that limit is reached; the only way to continue would be to reinitialise it.
>
> Otherwise we cannot fix it.

No, the first stream identifier chunk just marks the start of the file; it doesn't include the length of the whole stream.

@Erouan50 (Author)

> Thank you! I'm really unhappy with all the repeated Vec creation, which results in repeated copying and heap allocation.

You're right, I apologize for that... I've tried to apply your suggestions. It's not perfect (I don't like the reset() + clear() + resize() sequence in the decoder, for example), but I hope I'm going in the right direction?

@NobodyXu

This comment has been minimized.

@NobodyXu (Collaborator) left a comment

Thanks!

My feedback is that copying the uncompressed sections around is unnecessary; they should be copied directly into the output.

Also, snap::raw should expose more public items to prevent duplication, and it needs to support uninitialised buffers for maximum efficiency.

Comment on lines +43 to +45

```rust
out_buf.resize(uncompress_length, 0);
let mut decoder = snap::raw::Decoder::new();
decoder.decompress(data, out_buf)?;
```
@NobodyXu (Collaborator)

It's a shame that snap does not support writing to an uninitialised buffer; zeroing the whole Vec does take quite some time.

@Erouan50 (Author)

I'll try to support it. It could be a good challenge to tackle :).

Comment on lines +70 to +72

```rust
Skipping(usize),
Buffering {
    remaining: usize,
```
@NobodyXu (Collaborator)

Suggested change

```diff
-Skipping(usize),
-Buffering {
-    remaining: usize,
+Skipping(NonZeroUsize),
+Buffering {
+    remaining: NonZeroUsize,
```

Using NonZeroUsize would make the invariant of the state clear.

@Erouan50 (Author)

I don't think I can do that, because remaining and the skip count can be 0.

@NobodyXu (Collaborator)

That's the neat part: if it's 0, then we just skip that stage and go to the next stage

@Erouan50 (Author)

Indeed! That's better that way.
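The exchange above can be sketched like this (state names are illustrative, not the PR's exact enum): NonZeroUsize makes the "something left to skip" invariant explicit, and a zero count simply bypasses the stage:

```rust
use std::num::NonZeroUsize;

// Illustrative state enum: NonZeroUsize encodes the invariant that we
// never sit in Skipping with nothing left to skip.
#[derive(Debug, PartialEq)]
enum State {
    Skipping(NonZeroUsize),
    ReadingHeader,
}

// Enter the skipping stage, or jump straight past it if len == 0.
fn start_skip(len: usize) -> State {
    match NonZeroUsize::new(len) {
        Some(n) => State::Skipping(n),
        None => State::ReadingHeader, // nothing to skip: go to next stage
    }
}

fn main() {
    assert_eq!(start_skip(0), State::ReadingHeader);
    assert!(matches!(start_skip(8), State::Skipping(_)));
    println!("ok");
}
```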

```rust
out_buf.resize(max_compress_size + 8, 0);

let mut encoder = snap::raw::Encoder::new();
let compress_data = encoder.compress(in_buffer, &mut out_buf[8..])?;
```
@NobodyXu (Collaborator)

It's a shame snap::raw doesn't support uninitialised buffer

@Erouan50 (Author)

I'll try to support it as well.

@NobodyXu (Collaborator)

> It's not perfect (I don't like the reset() + clear() + resize() sequence in the decoder, for example), but I hope I'm going in the right direction?

Yeah, it's the right direction. It looks ugly because snap::raw does not support uninitialised buffers; speaking of which, {de,en}code_vec is also very inefficient because of that.

@Erouan50 (Author)

Erouan50 commented Mar 30, 2026

> > It's not perfect (I don't like the reset() + clear() + resize() sequence in the decoder, for example), but I hope I'm going in the right direction?
>
> Yeah, it's the right direction. It looks ugly because snap::raw does not support uninitialised buffers; speaking of which, {de,en}code_vec is also very inefficient because of that.

I agree. Even the whole encoding/decoding of the frames should arguably live in the snap crate; I only did it here because the crate supports only io::Read and io::Write.

If you want, I can take a shot at contributing all the changes you mentioned to that crate, and we pause/close this pull request?

@NobodyXu (Collaborator)

NobodyXu commented Mar 30, 2026

> If you want, I can take a shot at contributing all the changes you mentioned to that crate, and we pause/close this pull request?

Yeah, it'd be great to get them upstreamed, thank you!

If you can share the PR/tracking issue so that I can stay in the loop, that'd be even better.

@NobodyXu (Collaborator) left a comment

Thanks, just some minor nits.

@NobodyXu (Collaborator) left a comment

I think it looks good to me now, except for the NonZeroUsize change and the required upstream changes (more pub items and uninitialised buffer support).

cc @robjtede
