Skip to content

feat(storage): publish primitive storage API#1366

Draft
ukint-vs wants to merge 15 commits into
masterfrom
work/sails-vft-storage-profiling
Draft

feat(storage): publish primitive storage API#1366
ukint-vs wants to merge 15 commits into
masterfrom
work/sails-vft-storage-profiling

Conversation

@ukint-vs
Copy link
Copy Markdown
Member

@ukint-vs ukint-vs commented May 20, 2026

Summary

This PR narrows sails-storage to the first publishable public surface: reusable storage primitives and build helpers, not benchmark harnesses or token-contract-specific storage.

Storage primitives

  • Adds the sails-storage crate.
  • Publishes root-level FixedOpenAddressMap, StaticOpenAddressTable, StaticLayout, StaticActorIdU256Map, and StaticActorPairU256Map as the stable consumer-facing surface.
  • Documents the unusual static-memory and actor-map semantics in crate-level rustdoc with compiling examples.
  • Keeps token/VFT-specific and research-only storage shapes out of the stable public namespace.

Build integration

  • Adds sails_rs::build::StaticMemoryLayout and static-memory build helpers.
  • Exposes reserve_actor_u256_map and reserve_actor_pair_u256_map for generated static-memory constants.

Explicitly Not Included

  • No docs/ files.
  • No benchmark crates or generated benchmark clients.
  • No benchmark IDL files.
  • No example app changes.
  • No workflow or script changes.

Verification

  • cargo test -p sails-storage — 24 passed
  • cargo test -p sails-storage --doc — 3 passed
  • cargo test -p sails-rs --no-default-features --features wasm-builder build::tests — 9 passed, 14 filtered
  • git diff --check — passed for the current worktree diff
  • live PR diff checked for docs/, benchmarks/, examples/, scripts/, .github/, and .idl paths

Test plan

  • Storage public API tests cover root fixed/static map usage, actor map operations, pair map operations, and zero actor rejection.
  • Crate rustdoc examples cover fixed maps, generic static tables, and actor/U256 static maps.
  • Build helper tests cover generated static-memory layout constants.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the sails-storage crate, which provides allocator-light storage primitives, and adds extensive benchmarking support for storage and VFT operations, including new benchmark crates and analysis tools. It also refactors message handling in rs/gstd/macros.rs to use with_read_on_stack_or_heap for improved memory management. The reviewer identified a potential DoS vulnerability in the message tracker within the aggregator example, recommending a bounded-state pattern with explicit eviction to prevent gas exhaustion.

Comment thread examples/aggregator/app/src/lib.rs Outdated
#[export]
pub fn prepare_tracker(&mut self, len: u32) -> TrackerBenchResult {
let mut tracker = msg_tracker().borrow_mut();
tracker.clear();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To prevent gas exhaustion and DoS, avoid patterns that can lead to panics when collections reach their limit. Instead of an assertion, ensure the message tracker uses a bounded-state pattern with explicit eviction of the oldest entries when the capacity (FIXED_TRACKER_CAPACITY) is reached. Additionally, include regression tests to verify the cap is enforced.

References
  1. Avoid storing unbounded collections in service state to prevent gas exhaustion and DoS. Use a bounded-state pattern, such as a fixed-size cap with explicit eviction of the oldest entries, and include regression tests to verify the cap is enforced.

@ukint-vs
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new sails-storage crate providing allocator-light storage primitives, including fixed-capacity and static-memory open-addressed maps. The review identified critical issues regarding unbounded collection growth in the static actor maps, which could lead to denial-of-service vulnerabilities, as well as platform-dependent hash and read operations that could cause cross-architecture inconsistencies. The reviewer provided actionable suggestions to improve hash distribution and ensure correct endianness handling.

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +821 to +831
&self,
actor: ActorId,
value: U256,
) -> Result<Option<U256>, TableError> {
let key = actor_key(actor);
reject_zero_key(&key)?;
if value.is_zero() {
return self
.remove_key(&key)
.map(|previous| previous.map(u256_from_value));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The StaticActorIdU256Map implementation has a potential capacity leak. When a zero value is written to 'remove' an entry, the value is zeroed but the key remains in the slot. Because this map uses linear probing and identifies vacant slots only by an all-zero key, a slot that was once occupied can never be reused for a different key. This leads to unbounded growth of unique keys in the state, which can cause gas exhaustion or DoS. Per repository rules, use a bounded-state pattern (e.g., a fixed-size cap with explicit eviction) and include regression tests to verify the cap is enforced.

References
  1. Avoid storing unbounded collections in service state to prevent gas exhaustion and DoS. Use a bounded-state pattern, such as a fixed-size cap with explicit eviction of the oldest entries, and include regression tests to verify the cap is enforced.

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +1038 to +1042
if value.is_zero() {
return self
.remove_key(&left, &right)
.map(|previous| previous.map(u256_from_value));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

Similar to StaticActorIdU256Map, StaticActorPairU256Map does not free slots when a zero value is written. This prevents slot reuse and leads to an unbounded collection of unique actor pairs in the service state, risking gas exhaustion and DoS. This should be addressed by implementing a bounded-state pattern with explicit eviction and regression tests, as required by repository rules.

References
  1. Avoid storing unbounded collections in service state to prevent gas exhaustion and DoS. Use a bounded-state pattern, such as a fixed-size cap with explicit eviction of the oldest entries, and include regression tests to verify the cap is enforced.

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +161 to +168
let mut hash = 0x811c_9dc5u32;

for byte in bytes {
hash ^= u32::from(*byte);
hash = hash.wrapping_mul(0x0100_0193);
}

hash as usize
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The hash_bytes function implements a 32-bit FNV-1a hash but returns a usize. On 64-bit platforms, the resulting hash will only occupy the lower 32 bits. If the table size (slots) exceeds 2^32, the hash will only ever map to the first 4GB of the table, leading to poor distribution and potential performance issues. For a general-purpose storage library, it is better to use the full width of usize for the hash calculation by using platform-appropriate constants.

    let mut hash = if core::mem::size_of::<usize>() == 8 {
        0xcbf2_9ce4_8422_2325usize
    } else {
        0x811c_9dc5usize
    };

    let prime = if core::mem::size_of::<usize>() == 8 {
        0x0000_0100_0000_01b3usize
    } else {
        0x0100_0193usize
    };

    for byte in bytes {
        hash ^= usize::from(*byte);
        hash = hash.wrapping_mul(prime);
    }

    hash

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +691 to +699
unsafe fn read_words_32(ptr: *const u8) -> [u64; 4] {
unsafe {
[
ptr::read_unaligned(ptr.cast::<u64>()),
ptr::read_unaligned(ptr.add(8).cast::<u64>()),
ptr::read_unaligned(ptr.add(16).cast::<u64>()),
ptr::read_unaligned(ptr.add(24).cast::<u64>()),
]
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The read_words_32 function uses ptr::read_unaligned to read u64 values directly from memory, which is platform-dependent as it uses the native endianness of the CPU. However, words_32 (used for comparison) uses u64::from_le_bytes. On a big-endian system, read_words_32 will return big-endian words while words_32 returns little-endian words, causing key comparisons to fail. To ensure platform-independent behavior and correctness on all architectures, these values should be explicitly converted from little-endian.

Suggested change
unsafe fn read_words_32(ptr: *const u8) -> [u64; 4] {
unsafe {
[
ptr::read_unaligned(ptr.cast::<u64>()),
ptr::read_unaligned(ptr.add(8).cast::<u64>()),
ptr::read_unaligned(ptr.add(16).cast::<u64>()),
ptr::read_unaligned(ptr.add(24).cast::<u64>()),
]
}
unsafe fn read_words_32(ptr: *const u8) -> [u64; 4] {
unsafe {
[
u64::from_le(ptr::read_unaligned(ptr.cast::<u64>())),
u64::from_le(ptr::read_unaligned(ptr.add(8).cast::<u64>())),
u64::from_le(ptr::read_unaligned(ptr.add(16).cast::<u64>())),
u64::from_le(ptr::read_unaligned(ptr.add(24).cast::<u64>())),
]
}
}

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +727 to +739
fn fold_words(bytes: &[u8; 32]) -> u32 {
let ptr = bytes.as_ptr();
unsafe {
ptr::read_unaligned(ptr.cast::<u32>())
^ ptr::read_unaligned(ptr.add(4).cast::<u32>())
^ ptr::read_unaligned(ptr.add(8).cast::<u32>())
^ ptr::read_unaligned(ptr.add(12).cast::<u32>())
^ ptr::read_unaligned(ptr.add(16).cast::<u32>())
^ ptr::read_unaligned(ptr.add(20).cast::<u32>())
^ ptr::read_unaligned(ptr.add(24).cast::<u32>())
^ ptr::read_unaligned(ptr.add(28).cast::<u32>())
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The fold_words function uses ptr::read_unaligned to read u32 values in native endianness. This makes the hash calculation platform-dependent. If program state is analyzed or shared across different architectures (e.g., a native tool reading WASM state), the hash distribution will differ, leading to incorrect lookups. It is safer to explicitly read these values as little-endian.

Suggested change
fn fold_words(bytes: &[u8; 32]) -> u32 {
let ptr = bytes.as_ptr();
unsafe {
ptr::read_unaligned(ptr.cast::<u32>())
^ ptr::read_unaligned(ptr.add(4).cast::<u32>())
^ ptr::read_unaligned(ptr.add(8).cast::<u32>())
^ ptr::read_unaligned(ptr.add(12).cast::<u32>())
^ ptr::read_unaligned(ptr.add(16).cast::<u32>())
^ ptr::read_unaligned(ptr.add(20).cast::<u32>())
^ ptr::read_unaligned(ptr.add(24).cast::<u32>())
^ ptr::read_unaligned(ptr.add(28).cast::<u32>())
}
}
fn fold_words(bytes: &[u8; 32]) -> u32 {
let ptr = bytes.as_ptr();
unsafe {
u32::from_le(ptr::read_unaligned(ptr.cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(4).cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(8).cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(12).cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(16).cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(20).cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(24).cast::<u32>()))
^ u32::from_le(ptr::read_unaligned(ptr.add(28).cast::<u32>()))
}
}

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the sails-storage crate, which provides allocator-light storage primitives such as fixed and static open-addressed maps, alongside build-script utilities for managing static memory layouts in WASM. Feedback identifies a critical resource exhaustion vulnerability in the static map's lookup logic where tombstone slots are not reused, potentially leading to capacity leaks. Additionally, improvements were suggested to ensure the Rust keyword validation list is comprehensive to prevent compilation errors in generated code and to optimize utility functions by reusing existing word-wise logic.

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +883 to +900
fn lookup(&self, key: &[u8; 32]) -> StaticLookup {
let key_words = words_32(key);
let mut index = static_map_index(hash_actor_key(key), LOG2_SLOTS);

for _ in 0..self.slots {
let stored_key = unsafe { read_words_32(self.key_ptr(index)) };
if stored_key == key_words {
return StaticLookup::Found(index);
}
if words_are_zero(stored_key) {
return StaticLookup::Vacant(index);
}

index = (index + 1) & self.mask;
}

StaticLookup::Full
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The current lookup implementation doesn't correctly handle tombstones (slots with a non-zero key but a zero value) for insertion. When a key is "removed" by setting its value to zero, the slot is not made available for new keys. This can lead to a situation where the map runs out of capacity for new insertions, even if many entries have been removed. This is a form of capacity leak and could lead to DoS via resource exhaustion. To fix this, the lookup function should be modified to identify and allow reuse of these tombstone slots for new insertions. Additionally, please include a regression test to verify that the capacity is correctly managed and the cap is enforced, as per the repository's bounded-state pattern rules.

    fn lookup(&self, key: &[u8; 32]) -> StaticLookup {
        let key_words = words_32(key);
        let mut index = static_map_index(hash_actor_key(key), LOG2_SLOTS);
        let mut first_tombstone = None;

        for _ in 0..self.slots {
            let stored_key_ptr = self.key_ptr(index);
            let stored_key = unsafe { read_words_32(stored_key_ptr) };

            if stored_key == key_words {
                return StaticLookup::Found(index);
            }

            if words_are_zero(stored_key) {
                return StaticLookup::Vacant(first_tombstone.unwrap_or(index));
            }

            if first_tombstone.is_none() {
                let value_ptr = self.value_ptr(index);
                if unsafe { words_at_zero(value_ptr) } {
                    first_tombstone = Some(index);
                }
            }

            index = (index + 1) & self.mask;
        }

        first_tombstone.map(StaticLookup::Vacant).unwrap_or(StaticLookup::Full)
    }
References
  1. Avoid storing unbounded collections in service state to prevent gas exhaustion and DoS. Use a bounded-state pattern, such as a fixed-size cap with explicit eviction of the oldest entries, and include regression tests to verify the cap is enforced.

Comment thread rs/src/build.rs
Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +670 to +672
fn is_zero_32(bytes: &[u8; 32]) -> bool {
bytes.iter().all(|byte| *byte == 0)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The is_zero_32 function can be optimized by reusing the existing words_32 and words_are_zero functions. This would also improve consistency with other parts of the code that perform similar checks using word-wise operations instead of byte-wise iteration.

Suggested change
fn is_zero_32(bytes: &[u8; 32]) -> bool {
bytes.iter().all(|byte| *byte == 0)
}
fn is_zero_32(bytes: &[u8; 32]) -> bool {
words_are_zero(words_32(bytes))
}

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the sails-storage crate, which provides allocator-light storage primitives such as fixed-capacity and static-memory open-addressed maps, alongside build-script helpers for managing static memory layouts and patching WASM binaries. Feedback highlights a critical security concern regarding capacity leaks in the actor map remove_key implementation, which could lead to Denial of Service if not addressed with tombstones or cluster re-hashing. Additionally, suggestions were made to optimize the find_slot function for power-of-two capacities and to improve the robustness of the WASM patching logic to avoid fragility with future parser versions.

Comment thread rs/storage/src/lib.rs Outdated
Comment on lines +869 to +881
fn remove_key(&self, key: &[u8; 32]) -> Result<Option<[u8; 32]>, TableError> {
let StaticLookup::Found(index) = self.lookup(key) else {
return Ok(None);
};

let previous = self.visible_value(index);
if previous.is_some() {
unsafe {
self.write_value(index, &[0; 32]);
}
}
Ok(previous)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The remove_key implementation for StaticActorIdU256Map (and similarly for StaticActorPairU256Map) zeros the value but leaves the key in the slot. In an open-addressed map using linear probing without tombstones, this results in a capacity leak. Slots occupied by 'removed' keys will never be considered Vacant for new keys, eventually causing CapacityOverflow and potential DoS. To prevent gas exhaustion and ensure correct state management, implement a bounded-state pattern (e.g., using tombstones or cluster re-hashing). Additionally, include regression tests to verify that the capacity is correctly managed and the cap is enforced.

References
  1. Avoid storing unbounded collections in service state to prevent gas exhaustion and DoS. Use a bounded-state pattern, such as a fixed-size cap with explicit eviction of the oldest entries, and include regression tests to verify the cap is enforced.

Comment thread rs/storage/src/lib.rs
Comment thread rs/src/build.rs Outdated
Comment on lines +423 to +434
if let Some((id, range)) = other.as_section() {
module.section(&wasm_encoder::RawSection {
id,
data: &wasm[range],
});
} else if !matches!(
other,
wasmparser::Payload::CodeSectionStart { .. }
| wasmparser::Payload::CodeSectionEntry { .. }
) {
return Err(Error::UnsupportedWasmPayload);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The patch_imported_memory function uses an aggressive UnsupportedWasmPayload error for any payload that isn't a section or a code section entry. While this works for simple modules, it might fail on valid WebAssembly modules containing other non-section payloads yielded by wasmparser (e.g., if future versions yield data segments item-by-item similar to code entries). More importantly, the current logic relies on as_section() returning the full range for CodeSectionStart. While this is true for current wasmparser versions, it's a fragile pattern for round-tripping. A more robust approach would be to explicitly handle all core WASM section types or use a higher-level re-encoding library if possible.

@ukint-vs ukint-vs force-pushed the work/sails-vft-storage-profiling branch from b570541 to 16f1e21 Compare May 20, 2026 17:19
@ukint-vs ukint-vs marked this pull request as draft May 22, 2026 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant