
solidity: byte-swap + single mstore for uint64/uint128 (de)serialization #95

Draft
deuszx wants to merge 1 commit into zefchain:main from deuszx:solidity-pr-4-uint64-128-mstore

deuszx commented May 14, 2026

Summary

Replace the byte-by-byte little-endian construction loop in
bcs_serialize_uint64 / bcs_serialize_uint128 and the mirror
construction in bcs_deserialize_offset_uint64 / bcs_deserialize_offset_uint128
with a constant-time byte-swap chain followed by a single mstore /
mload.

The swap chain reverses byte order (BCS is little-endian, EVM mstore
is big-endian). The assembly block then places the swapped value in
the top N bytes of a 32-byte word (a left shift by 256 − 8·N bits,
with N = 8 or 16), so a single mstore writes the BCS bytes to
result[0..N]. The deserialize path mirrors this with mload + shr +
the same swap chain.
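The Rust model below makes the uint64 serialize layout concrete. It is a minimal sketch that assumes a three-step mask-and-shift chain; the PR's actual Solidity chain is not reproduced. The names `swap64`, `word_for_mstore` and `serialize_u64` are hypothetical. The 32-byte EVM word is modelled as a `[u8; 32]` in the big-endian order that `mstore` writes.

```rust
/// Reverse the byte order of a u64 with mask-and-shift steps, since the EVM
/// has no byte-swap opcode. Hypothetical mirror of a Solidity swap chain.
fn swap64(mut v: u64) -> u64 {
    // Swap adjacent bytes:         0x0102030405060708 -> 0x0201040306050807
    v = ((v & 0xFF00_FF00_FF00_FF00) >> 8) | ((v & 0x00FF_00FF_00FF_00FF) << 8);
    // Swap adjacent 2-byte groups: 0x0201040306050807 -> 0x0403020108070605
    v = ((v & 0xFFFF_0000_FFFF_0000) >> 16) | ((v & 0x0000_FFFF_0000_FFFF) << 16);
    // Swap the two 4-byte halves:  0x0403020108070605 -> 0x0807060504030201
    (v >> 32) | (v << 32)
}

/// The 32-byte word `shl(192, swapped)` as `mstore` would write it (big-endian):
/// the swapped value fills bytes 0..8 and the low 24 bytes are zero.
fn word_for_mstore(swapped: u64) -> [u8; 32] {
    let mut word = [0u8; 32];
    word[..8].copy_from_slice(&swapped.to_be_bytes());
    word
}

/// Hypothetical model of serializing a u64: one 32-byte store into the
/// (32-byte-rounded) data area of `result`, of which the first 8 bytes are kept.
fn serialize_u64(v: u64) -> Vec<u8> {
    let word = word_for_mstore(swap64(v));
    word[..8].to_vec()
}
```

The swapped value occupies the top 8 bytes of the word, so its big-endian bytes are exactly the little-endian (BCS) bytes of the original value. The uint128 path follows the same idea, with a four-step chain and a 16-byte slot.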

Benchmarks

Measured with forge test --gas-report, via_ir = true,
optimizer_runs = 200, solc 0.8.33.

Function   Old gas   New gas   Savings
ser64      1838      712       −1126 (61%)
deser64    2138      875       −1263 (59%)
ser128     3214      888       −2326 (72%)
deser128   3889      1089      −2800 (72%)

The via-IR (Yul) optimizer does not reduce the old byte-by-byte loop
to anything close to the unrolled form. These savings therefore carry
over to hot paths (e.g. light-client certificate verification, which
decodes many u64 fields per call).

Deployed bytecode (harness contract that inlines the library, both
forms):

Form   Bytecode size (bytes)   Deployment gas
Old    1,080                   280,973
New    1,390                   347,859
Δ      +310                    +66,886

Break-even: the extra 66,886 deployment gas is repaid after roughly
45 mixed uint64 / uint128 (de)serialize calls. A single certificate
verification makes orders of magnitude more calls than that.
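The break-even point is the deployment-gas delta divided by the average per-call saving. The PR does not state the exact call mix behind the ~45 figure. The hypothetical helper `break_even_mix` therefore only shows the arithmetic, using the numbers from the two tables above.

```rust
/// Calls needed before per-call gas savings repay a one-off deployment-gas
/// increase, for a repeating mix of calls. Each `mix` entry is
/// (gas saved per call, calls of that kind per round). Hypothetical helper.
fn break_even_mix(extra_deploy_gas: u64, mix: &[(u64, u64)]) -> u64 {
    let saved: u64 = mix.iter().map(|(s, n)| s * n).sum();
    let calls: u64 = mix.iter().map(|(_, n)| n).sum();
    assert!(saved > 0, "the mix must save some gas");
    // Average saving per call is saved / calls, so the break-even call count is
    // ceil(extra_deploy_gas / (saved / calls)) = ceil(extra_deploy_gas * calls / saved).
    (extra_deploy_gas * calls + saved - 1) / saved
}
```

With the table's figures, uint64-only traffic (one ser64 plus one deser64 per round) breaks even after 56 calls. An even mix of all four functions breaks even after 36 calls. These two results bracket the ~45 estimate.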

Reproduce the benchmark

  1. Save the two library versions as Old.sol and New.sol. Each file
    also exports a small harness contract (OldHarness / NewHarness)
    whose ser* / deser* methods delegate to the library, so Foundry
    can measure them.

    Old.sol is the form before this PR (the byte-by-byte loop).
    New.sol is the form after this PR.

  2. Create foundry.toml:

    [profile.default]
    src = "."
    out = "out"
    test = "test"
    via_ir = true
    optimizer = true
    optimizer_runs = 200
  3. Save the test harness as test/Bench.t.sol:

    // SPDX-License-Identifier: UNLICENSED
    pragma solidity ^0.8.0;
    
    import "../Old.sol";
    import "../New.sol";
    
    contract BenchTest {
        OldHarness o = new OldHarness();
        NewHarness n = new NewHarness();
        bytes payload8  = hex"0102030405060708";
        bytes payload16 = hex"0102030405060708090a0b0c0d0e0f10";
    
        function test_old_ser64()   public view { o.ser64(0x0102030405060708); }
        function test_new_ser64()   public view { n.ser64(0x0102030405060708); }
        function test_old_deser64() public view { o.deser64(payload8); }
        function test_new_deser64() public view { n.deser64(payload8); }
        function test_old_ser128()   public view { o.ser128(0x0102030405060708090a0b0c0d0e0f10); }
        function test_new_ser128()   public view { n.ser128(0x0102030405060708090a0b0c0d0e0f10); }
        function test_old_deser128() public view { o.deser128(payload16); }
        function test_new_deser128() public view { n.deser128(payload16); }
    }
  4. Run forge test --gas-report and inspect the per-function gas
    table for OldHarness / NewHarness. Runtime-bytecode size of
    each harness comes from solc --via-ir --optimize --bin-runtime
    (or forge inspect <name> deployedBytecode).

Test Plan

  • cargo test -p serde-generate --features solidity --test integration_tests solidity
    — full solidity test suite, including:
    • test_uint64_endian_boundaries / test_uint128_endian_boundaries
      round-trip byte-distinct values (0, 1, 0xff, 0x100,
      0x0102…, type(uintN).max) to catch endian bugs in the swap
      formula.
    • test_uint_deserialize_truncated_input_reverts calls
      bcs_deserialize_offset_uint64 with 7 bytes and
      bcs_deserialize_offset_uint128 with 15 bytes and asserts that
      each reverts rather than returning a garbage value past the input.
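Both checks can be mirrored in a small Rust model, which helps when reasoning about a swap formula before touching the Solidity. This is a sketch, not the project's test code. The names `swap64`, `ser64`, `deser64` and `check_boundaries` are hypothetical, and `None` stands in for a Solidity revert. The boundary list follows the test plan, with 0x0102030405060708 borrowed from the benchmark harness.

```rust
/// Hand-written u64 byte swap (mask-and-shift): the kind of formula the
/// endian-boundary tests are meant to catch bugs in.
fn swap64(mut v: u64) -> u64 {
    v = ((v & 0xFF00_FF00_FF00_FF00) >> 8) | ((v & 0x00FF_00FF_00FF_00FF) << 8);
    v = ((v & 0xFFFF_0000_FFFF_0000) >> 16) | ((v & 0x0000_FFFF_0000_FFFF) << 16);
    (v >> 32) | (v << 32)
}

/// Serialize model: the big-endian bytes of the swapped value (what a
/// top-of-word mstore writes) are the BCS little-endian bytes.
fn ser64(v: u64) -> [u8; 8] {
    swap64(v).to_be_bytes()
}

/// Deserialize model: bounds check first (`None` plays the role of a revert),
/// then read 8 bytes big-endian, as mload + shr(192) would, and swap back.
fn deser64(input: &[u8], pos: usize) -> Option<u64> {
    let end = pos.checked_add(8)?; // pos + 8, without wrapping
    if end > input.len() {
        return None; // require(pos + 8 <= input.length, ...)
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&input[pos..end]);
    Some(swap64(u64::from_be_bytes(buf)))
}

/// Byte-distinct boundary values mirroring the test plan's list.
const BOUNDARIES: [u64; 6] = [0, 1, 0xff, 0x100, 0x0102_0304_0506_0708, u64::MAX];

/// True if every boundary value serializes little-endian and round-trips.
fn check_boundaries() -> bool {
    BOUNDARIES.iter().all(|&v| {
        let bytes = ser64(v);
        bytes == v.to_le_bytes() && deser64(&bytes, 0) == Some(v)
    })
}
```

A 7-byte input makes `deser64` return `None`, which corresponds to the truncated-input revert. Non-zero offsets are covered by the same bounds arithmetic.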

deuszx force-pushed the solidity-pr-4-uint64-128-mstore branch 2 times, most recently from 579dead to 1c4f2e9 (May 14, 2026 12:03)
Replace the byte-by-byte little-endian construction loop with a
constant-time byte-swap chain followed by a single mstore/mload.
The swap chain reverses byte order (BCS is little-endian, EVM mstore
is big-endian); the assembly block then deposits the swapped value
at the top of a 32-byte word so mstore writes the BCS bytes at
result[0..N] in one operation. The deserialize path mirrors this with
mload + shr + the same swap chain.

The deserializers reintroduce the bounds check that the old
byte-by-byte path got for free from Solidity's `input[pos + i]`
indexing:

  require(pos + 8 <= input.length,  "uint64 deserialize: out of bounds");
  require(pos + 16 <= input.length, "uint128 deserialize: out of bounds");

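Without these checks, a short payload would make the result include
bytes from past the end of `input`, silently returning a garbage
value. With them, all 8 (resp. 16) bytes that feed the result lie
inside `input`. The 32-byte mload can still touch up to 24 (resp. 16)
bytes beyond them, but shr discards those. The checks are also meant
to keep the assembly read legal under the `memory-safe` contract,
since `bytes memory` data areas are allocated rounded up to 32 bytes.
That rounding does not cover every case, though: reading the last 8
bytes of a 32-byte input still touches 24 bytes after its data area.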
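The following Rust sketch models the guarded uint128 read. It assumes that the memory after `input` holds arbitrary bytes. `mload` returns the 32-byte word as two big-endian `u128` halves, so `shr(128, word)` is simply the high half. All names are hypothetical, and `u128::swap_bytes` stands in for the Solidity swap chain.

```rust
/// Model of EVM `mload(p)`: the 32 bytes at `mem[p..p + 32]` as a big-endian
/// word, returned as (high 16 bytes, low 16 bytes) since Rust has no u256.
fn mload(mem: &[u8], p: usize) -> (u128, u128) {
    let mut hi = [0u8; 16];
    let mut lo = [0u8; 16];
    hi.copy_from_slice(&mem[p..p + 16]);
    lo.copy_from_slice(&mem[p + 16..p + 32]);
    (u128::from_be_bytes(hi), u128::from_be_bytes(lo))
}

/// Hypothetical model of a guarded uint128 read. `mem` is `input` followed by
/// whatever memory comes after it; `input_len` is input.length. `None` = revert.
fn deser128(mem: &[u8], input_len: usize, pos: usize) -> Option<u128> {
    // require(pos + 16 <= input.length, "uint128 deserialize: out of bounds");
    let end = pos.checked_add(16)?;
    if end > input_len {
        return None;
    }
    // shr(128, word) keeps the high half; the low 16 bytes are discarded.
    let (hi, _discarded) = mload(mem, pos);
    // u128::swap_bytes stands in for the Solidity swap chain.
    Some(hi.swap_bytes())
}

/// Append 32 filler bytes to model memory past the end of `input`.
fn with_trailing_memory(input: &[u8], filler: u8) -> Vec<u8> {
    let mut mem = input.to_vec();
    mem.resize(input.len() + 32, filler);
    mem
}
```

Filling the trailing memory with 0xAA shows that, once the bounds check passes, bytes beyond the value never reach the result. This holds for unaligned offsets as well.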

Each unrolled swap chain carries a comment noting that EVM has no
native byte-swap and describing what each term does, to make future
edits less error-prone.
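For illustration, here is a commented uint128 chain in Rust. It assumes the usual four mask-and-shift steps; the exact Solidity terms in the PR are not shown here. `swap128` is a hypothetical name. Comparing such a chain against `u128::swap_bytes` is a cheap way to validate edits to it.

```rust
/// Reverse the 16 bytes of a u128 using only AND, OR and shifts. Each step
/// swaps neighbouring groups of 1, 2, 4, then 8 bytes; four steps reverse all 16.
fn swap128(mut v: u128) -> u128 {
    // Step 1: swap each byte with its neighbour (even/odd byte lanes).
    v = ((v & 0xFF00_FF00_FF00_FF00_FF00_FF00_FF00_FF00) >> 8)
        | ((v & 0x00FF_00FF_00FF_00FF_00FF_00FF_00FF_00FF) << 8);
    // Step 2: swap adjacent 2-byte groups.
    v = ((v & 0xFFFF_0000_FFFF_0000_FFFF_0000_FFFF_0000) >> 16)
        | ((v & 0x0000_FFFF_0000_FFFF_0000_FFFF_0000_FFFF) << 16);
    // Step 3: swap adjacent 4-byte groups.
    v = ((v & 0xFFFF_FFFF_0000_0000_FFFF_FFFF_0000_0000) >> 32)
        | ((v & 0x0000_0000_FFFF_FFFF_0000_0000_FFFF_FFFF) << 32);
    // Step 4: swap the two 8-byte halves; bits shifted out of 128 are dropped,
    // so no mask is needed here.
    (v >> 64) | (v << 64)
}
```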

Coverage:
* `test_uint64_endian_boundaries` / `test_uint128_endian_boundaries`
  round-trip byte-distinct values (0, 1, 0xff, 0x100, 0x0102…, max)
  to catch endian bugs in the swap formula.
* `test_uint_deserialize_truncated_input_reverts` calls
  `bcs_deserialize_offset_uint64` with 7 bytes and
  `bcs_deserialize_offset_uint128` with 15 bytes and asserts that
  each reverts rather than returning a garbage value past the input.

deuszx force-pushed the solidity-pr-4-uint64-128-mstore branch from 1c4f2e9 to 338e7ff (May 14, 2026 12:57)