solidity: bulk memcpy for bcs_deserialize_offset_{bytes,string}#96
Draft
deuszx wants to merge 1 commit into
Add EvmVersion config (Shanghai default / Cancun / Latest) to
CodeGeneratorConfig and route it through to the Solidity backend. The
byte-by-byte copy loop in bcs_deserialize_offset_bytes and the
corresponding loop in bcs_deserialize_offset_string are now replaced
with either:
* Cancun / Latest: a single `MCOPY` (EIP-5656).
* Shanghai: a word-by-word assembly memcpy (mload/mstore in 32-byte
chunks). The trailing partial word writes into padding inside the
`new bytes(len)` allocation (which rounds up to 32 bytes), so the
write stays within bounds.
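The Shanghai path described above can be sketched roughly as follows. This is a minimal illustration, not the generated code from this PR: the library name `WordCopySketch` and the function `slice` are hypothetical, and the bounds check is an assumption about how a caller would validate `pos` and `len`.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical sketch; names are illustrative, not the generated library.
library WordCopySketch {
    /// Copies input[pos .. pos + len) into a fresh array, 32 bytes at a time.
    function slice(bytes memory input, uint256 pos, uint256 len)
        internal
        pure
        returns (bytes memory out)
    {
        require(pos + len <= input.length, "slice out of range");
        out = new bytes(len); // allocation is rounded up to a 32-byte multiple
        if (len > 0) {
            assembly {
                let src := add(add(input, 0x20), pos) // skip length word, then offset
                let dst := add(out, 0x20)             // skip length word
                let dstEnd := add(dst, len)
                for { } lt(dst, dstEnd) { } {
                    // The last iteration may copy up to 31 extra bytes; they land
                    // in the padding of `out`, and `out.length` is unchanged.
                    mstore(dst, mload(src))
                    src := add(src, 0x20)
                    dst := add(dst, 0x20)
                }
            }
        }
    }
}
```

Only `mload` and `mstore` are used, so the function can stay `pure`. The padding of `out` may hold stray bytes from the source after the final word is written.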
The identity precompile (0x04) was the original target for Shanghai
but `staticcall` makes the enclosing function non-pure. Solidity then
rejects every `bcs_deserialize_*` declared `pure`. The word-loop keeps
the existing `pure` API and is still far cheaper than the byte loop.
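For comparison, a copy through the identity precompile might look like the sketch below. The names are hypothetical. The point is the mutability: because the assembly block contains `staticcall`, the compiler accepts `view` at best, so a `pure` declaration would not compile.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Hypothetical sketch; not part of the generated library.
library IdentityCopySketch {
    // Must be `view`: solc rejects `pure` here because of `staticcall`.
    function slice(bytes memory input, uint256 pos, uint256 len)
        internal
        view
        returns (bytes memory out)
    {
        require(pos + len <= input.length, "slice out of range");
        out = new bytes(len);
        assembly ("memory-safe") {
            // Precompile 0x04 returns its input unchanged.
            let ok := staticcall(
                gas(),
                0x04,
                add(add(input, 0x20), pos), // source: input data + pos
                len,
                add(out, 0x20),             // destination: data area of out
                len
            )
            if iszero(ok) { revert(0, 0) }
        }
    }
}
```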
Both paths are gated on `len > 0` so empty copies skip the assembly
block entirely.
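The Cancun path with its `len > 0` guard could look roughly like this. Again a sketch under assumptions: the library and function names are hypothetical, and the file needs solc 0.8.24 or later with a Cancun (or newer) EVM target for the `mcopy` builtin to be available.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24; // compile with evm_version = "cancun" or later

// Hypothetical sketch; names are illustrative, not the generated library.
library McopySketch {
    /// Copies input[pos .. pos + len) into a fresh array with one MCOPY.
    function slice(bytes memory input, uint256 pos, uint256 len)
        internal
        pure
        returns (bytes memory out)
    {
        require(pos + len <= input.length, "slice out of range");
        out = new bytes(len);
        if (len > 0) {
            // Empty copies skip the assembly block entirely.
            assembly ("memory-safe") {
                // mcopy(destination, source, length)
                mcopy(add(out, 0x20), add(add(input, 0x20), pos), len)
            }
        }
    }
}
```

`MCOPY` writes exactly `len` bytes, so unlike the word loop it never touches the padding of `out`.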
Cover both paths with round-trip tests at lengths 0, 1, 31, 32, 33,
and 1024 (boundaries where the word-loop and MCOPY diverge from a
naive byte loop).
Benchmarks
Measured with `forge test --gas-report`, `via_ir = true`, `optimizer_runs = 200`, `evm_version = "cancun"` (so `MCOPY` is available), `solc 0.8.33`. Numbers are total transaction gas for an external call to a harness that delegates to the library. Payload prep is the same across forms, so the relative Δ between columns is the copy-implementation cost.
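The settings listed above would correspond to a `foundry.toml` along these lines. This is reconstructed from the values quoted in the text, not copied from the PR; `optimizer = true` is an assumption implied by `optimizer_runs`.

```toml
[profile.default]
solc = "0.8.33"
evm_version = "cancun"   # needed so MCOPY is available
via_ir = true
optimizer = true         # assumed; implied by optimizer_runs
optimizer_runs = 200
```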
Per-byte cost (1024-byte payload, divide by `len`):

So the new forms are ~2.7× cheaper per byte than the byte-by-byte loop. Shanghai and Cancun are within ~1% of each other: the word loop is essentially as fast as the native `MCOPY` opcode at typical sizes, which justifies keeping it as the default and only switching on `MCOPY` when the user explicitly opts into Cancun.

Empty (`len = 0`) and 1-byte payloads are a wash (within ±300 gas of the old form). The `len > 0` guard prevents the assembly setup cost from hurting the empty case.
Deployed bytecode of the harness contract (which inlines the library helpers via `via_ir`):

Both new forms are smaller than the old byte-by-byte loop, despite the inline assembly. Yul lays out the assembly memcpy more compactly than the indexed byte-by-byte loop with its per-iteration bounds check.
Practical impact for `linera-bridge`: every `bytes`/`string` field inside a certificate decodes ~280 gas per byte cheaper. A 2 KiB certificate saves ~570K gas just on the copy side, before counting field-by-field decode savings.
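The ~570K figure follows directly from the per-byte saving, assuming the full 2 KiB (2048 bytes) passes through the copy path:

```math
2048\ \text{bytes} \times 280\ \tfrac{\text{gas}}{\text{byte}} = 573{,}440\ \text{gas} \approx 570\text{K gas}
```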
Reproduce the benchmark
Save the three libraries below as `Old.sol`, `Shanghai.sol`, and `Cancun.sol`. Each file exports a small harness contract whose `deser(bytes calldata)` method delegates to the library so Foundry can measure it.

Create `foundry.toml`:

Save the test harness as `test/Bench.t.sol` (builds an LEB128-prefixed payload of the requested length, fills bytes with `i & 0xff`, then calls each harness).

Run `forge test --gas-report`. The per-function gas table for each harness shows the per-call cost; the `test_{old,sha,can}_<len>` results show the per-length breakdown. Runtime-bytecode size comes from `solc --via-ir --optimize --bin-runtime --evm-version cancun` (or `forge inspect <name> deployedBytecode`).

Test Plan
Cover both paths with round-trip tests at lengths `0, 1, 31, 32, 33, 1024`, the boundaries where the word loop and `MCOPY` diverge from a naive byte loop:
* `test_bytes_copy_shanghai` exercises the word-loop path.
* `test_bytes_copy_cancun` exercises the `MCOPY` path.
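A round-trip check in this spirit could be sketched as a Foundry test like the one below. It is not one of the PR's tests: the contract, `copyWords`, and `checkLength` are hypothetical names, and the copy helper is a stand-alone word loop rather than the generated `bcs_deserialize_offset_bytes`. The fill pattern `i & 0xff` mirrors the benchmark harness description.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

import {Test} from "forge-std/Test.sol";

// Hypothetical test sketch; names are illustrative, not the PR's tests.
contract CopyRoundTripSketch is Test {
    // Word-by-word copy of a whole array, same shape as the Shanghai path.
    function copyWords(bytes memory input) internal pure returns (bytes memory out) {
        uint256 len = input.length;
        out = new bytes(len);
        if (len > 0) {
            assembly {
                let src := add(input, 0x20)
                let dst := add(out, 0x20)
                let dstEnd := add(dst, len)
                for { } lt(dst, dstEnd) { } {
                    mstore(dst, mload(src))
                    src := add(src, 0x20)
                    dst := add(dst, 0x20)
                }
            }
        }
    }

    // Builds a payload of `len` bytes, copies it, and compares the result.
    function checkLength(uint256 len) internal {
        bytes memory input = new bytes(len);
        for (uint256 i = 0; i < len; i++) {
            input[i] = bytes1(uint8(i & 0xff));
        }
        bytes memory out = copyWords(input);
        assertEq(out, input);
    }

    function test_roundtrip_boundaries() public {
        // Lengths around the 32-byte word boundary, plus empty and large.
        uint256[6] memory lens = [uint256(0), 1, 31, 32, 33, 1024];
        for (uint256 i = 0; i < lens.length; i++) {
            checkLength(lens[i]);
        }
    }
}
```

Lengths 31, 32, and 33 are where the trailing partial word appears, disappears, and reappears, so they are the cases most likely to expose an off-by-one in the loop bound.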