Open
Labels: enhancement (New feature or request)
Description
Goal
Expand the Evolve simulator into a comprehensive testing and validation tool that runs in CI on every PR. The finished state: CI catches non-determinism, fuzzes critical paths, and tracks performance regressions automatically.
Current State
- Simulator (`evolve_simulator`): Seed-based determinism, fault injection, time simulation, basic metrics/reporting. Used in testapp integration tests.
- Fuzzing: Limited to tx encoding (`fuzz_decode`, `fuzz_roundtrip`, `fuzz_structured`) in `crates/app/tx/fuzz/`. Requires `cargo +nightly fuzz`, so it is not integrated into CI.
- CI (`rust.yml`): Runs `cargo test --workspace` on every PR. Long simulation tests exist but are manual-only (`workflow_dispatch`).
- No non-determinism detection: No automated check that the same seed produces identical state across runs.
Scope
1. Non-Determinism Detection
- Dual-execution oracle: Run the same simulation seed twice and assert identical state hashes at every block. This is the core invariant — if it ever fails, consensus is broken.
- Cross-platform determinism check: Ensure state hashes match between macOS and Linux (CI runs Linux, devs run macOS). May require pinning float ops or auditing platform-dependent behavior.
- Iteration order audit: Automated check that no `HashMap`/`HashSet` usage leaks into STF execution paths (runtime verification, beyond the existing clippy lint).
- Time source audit: Verify no `SystemTime`/`Instant` usage reaches STF execution. Simulator's `SimulatedTime` should be the only time source during execution.
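The dual-execution oracle can be sketched with a toy simulator. This is a minimal illustration under stated assumptions, not the `evolve_simulator` API: `ToySim`, `run`, `first_divergence`, and `dual_execution_check` are hypothetical names, and the state hash is a hand-rolled FNV-1a so the result does not depend on the standard library's unspecified default hasher.

```rust
/// Toy stand-in for the simulator. All names here are hypothetical; the real
/// `evolve_simulator` API is not shown in this issue.
struct ToySim {
    rng: u64,   // PRNG state, initialised from the seed
    state: u64, // stand-in for application state
}

impl ToySim {
    fn new(seed: u64) -> Self {
        ToySim { rng: seed, state: 0 }
    }

    /// SplitMix64 step: deterministic, integer-only, no platform dependence.
    fn next_u64(&mut self) -> u64 {
        self.rng = self.rng.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.rng;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Apply one pseudo-random "block" and return the resulting state hash.
    fn apply_block(&mut self) -> u64 {
        let tx = self.next_u64();
        self.state = self.state.rotate_left(5) ^ tx;
        fnv1a(&self.state.to_le_bytes())
    }
}

/// 64-bit FNV-1a over a byte slice.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Run `blocks` blocks from `seed` and collect the state hash after each one.
fn run(seed: u64, blocks: usize) -> Vec<u64> {
    let mut sim = ToySim::new(seed);
    (0..blocks).map(|_| sim.apply_block()).collect()
}

/// Index of the first block whose hashes differ, or the shorter length if one
/// run produced fewer blocks. `None` means the two runs agree everywhere.
fn first_divergence(a: &[u64], b: &[u64]) -> Option<usize> {
    let common = a.len().min(b.len());
    (0..common)
        .find(|&i| a[i] != b[i])
        .or(if a.len() != b.len() { Some(common) } else { None })
}

/// Run the same seed twice; `Err` carries the first diverging block height.
fn dual_execution_check(seed: u64, blocks: usize) -> Result<(), usize> {
    match first_divergence(&run(seed, blocks), &run(seed, blocks)) {
        Some(height) => Err(height),
        None => Ok(()),
    }
}
```

Comparing per-block hashes rather than only the final hash means a failure reports the first block at which the two runs diverged, which narrows the search for the non-deterministic code path.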
2. Expanded Fuzzing
- STF fuzzing: Fuzz the full `apply_block` path with randomly generated blocks (random tx ordering, random payloads, malformed inputs). Assert no panics, no state corruption.
- Storage layer fuzzing: Fuzz the storage backend with random key/value operations. Verify state hash consistency after commit.
- Account execution fuzzing: Generate random sequences of exec/query calls against accounts. Verify error handling (no panics, proper error codes).
- Mempool fuzzing: Fuzz transaction insertion, eviction, and ordering under concurrent load.
- Corpus management: Maintain and expand fuzz corpus with interesting inputs found during runs. Store corpus in CI cache.
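As one possible shape for the storage-layer target, the sketch below decodes raw fuzz bytes into set/delete/commit operations, applies them to a buffered store and to a plain `BTreeMap` reference model, and compares state hashes at every commit. `ToyStore`, `hash_map`, and `fuzz_one` are hypothetical names; the real storage backend and its hash are assumed to differ.

```rust
use std::collections::BTreeMap;

type Map = BTreeMap<u8, u8>;

/// FNV-1a over the entries in key order. BTreeMap iteration is sorted, so the
/// hash is independent of insertion order.
fn hash_map(m: &Map) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for (k, v) in m {
        for b in [*k, *v] {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

/// Hypothetical store: writes are buffered in `pending` until `commit`.
struct ToyStore {
    committed: Map,
    pending: Vec<(u8, Option<u8>)>, // None means delete
}

impl ToyStore {
    /// Apply buffered writes in order and return the new state hash.
    fn commit(&mut self) -> u64 {
        for (k, v) in self.pending.drain(..) {
            match v {
                Some(v) => {
                    self.committed.insert(k, v);
                }
                None => {
                    self.committed.remove(&k);
                }
            }
        }
        hash_map(&self.committed)
    }
}

/// One fuzz iteration: every 3 input bytes decode to (op, key, value).
fn fuzz_one(data: &[u8]) -> Result<(), String> {
    let mut store = ToyStore { committed: Map::new(), pending: Vec::new() };
    let mut model = Map::new();
    for (i, chunk) in data.chunks_exact(3).enumerate() {
        let (op, key, value) = (chunk[0], chunk[1], chunk[2]);
        match op % 3 {
            0 => {
                store.pending.push((key, Some(value)));
                model.insert(key, value);
            }
            1 => {
                store.pending.push((key, None));
                model.remove(&key);
            }
            _ => {
                if store.commit() != hash_map(&model) {
                    return Err(format!("hash mismatch at op {}", i));
                }
            }
        }
    }
    // Final commit must leave the store identical to the model.
    if store.commit() != hash_map(&model) || store.committed != model {
        return Err("final state mismatch".to_string());
    }
    Ok(())
}
```

Because `fuzz_one` takes a plain byte slice, the same function could be driven by a coverage-guided harness or by a seeded loop inside an ordinary test; which harness to use is left open here.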
3. Performance Testing & Regression Detection
- Benchmark baseline: Establish criterion benchmarks for key paths (block execution, tx processing, storage read/write, state hashing).
- Simulator performance report: Extend `PerformanceReport` with p50/p95/p99 latencies, throughput (tx/s, blocks/s), and memory high-water mark.
- Regression detection: Compare benchmark results against a baseline (stored as artifact or in-repo). Fail CI or post a warning comment if performance degrades beyond threshold (e.g., >10%).
- Stress test profile: Standardized stress test config (high block count, high tx volume, fault injection) that runs on a schedule (nightly or weekly).
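The percentile and threshold arithmetic could look roughly like the following. It is a sketch only: `LatencySummary`, `percentile`, `summarize`, and `is_regression` are illustrative names rather than the existing `PerformanceReport` layout, percentiles use the nearest-rank method, and the comparison is done in integers so that exactly 10% slower does not count as "beyond" a 10% threshold.

```rust
/// Latency percentiles in nanoseconds. Field names are illustrative and are
/// not taken from the actual `PerformanceReport` type.
#[derive(Debug, PartialEq)]
struct LatencySummary {
    p50: u64,
    p95: u64,
    p99: u64,
}

/// Nearest-rank percentile over a sorted slice; `p` must be in 1..=100.
fn percentile(sorted: &[u64], p: u64) -> Option<u64> {
    if sorted.is_empty() || p == 0 || p > 100 {
        return None;
    }
    let n = sorted.len() as u64;
    let rank = (p * n + 99) / 100; // ceil(p * n / 100), 1-based
    Some(sorted[(rank - 1) as usize])
}

/// Sort a copy of the samples and extract p50/p95/p99.
/// Returns `None` when there are no samples.
fn summarize(samples: &[u64]) -> Option<LatencySummary> {
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    Some(LatencySummary {
        p50: percentile(&sorted, 50)?,
        p95: percentile(&sorted, 95)?,
        p99: percentile(&sorted, 99)?,
    })
}

/// True when `current_ns` is more than `threshold_percent` slower than
/// `baseline_ns`. Uses u128 so the multiplication cannot overflow.
fn is_regression(baseline_ns: u64, current_ns: u64, threshold_percent: u64) -> bool {
    let lhs = u128::from(current_ns) * 100;
    let rhs = u128::from(baseline_ns) * (100 + u128::from(threshold_percent));
    lhs > rhs
}
```

For example, with a baseline of 1000 ns and a 10% threshold, a current value of 1100 ns passes and 1101 ns is flagged.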
4. CI Integration
- Simulation tests on every PR: Run a short simulation suite (e.g., 100 blocks, 3 seeds) as part of the standard test job. Must complete in <5 min.
- Non-determinism check on every PR: Dual-execution with at least 2 seeds. Fast — just comparing hashes.
- Nightly fuzzing job: Run cargo-fuzz (or bolero/proptest long runs) for extended duration (30–60 min). Report findings as issues.
- Nightly performance run: Run benchmarks + long simulation, store results as artifacts, compare against baseline.
- Seed rotation: Each CI run uses a mix of fixed seeds (regression) and random seeds (exploration). Failed random seeds are logged for reproduction.
- Failure reproduction: On any simulation failure, CI output includes the exact `just sim-seed <seed>` command to reproduce locally.
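Seed rotation and failure reproduction can be combined in a small helper. The sketch below assumes the random seeds are derived from a single per-run entropy value (for instance a CI run number) so that the whole seed list is itself reproducible; `seeds_for_run`, `repro_command`, and `failure_report` are hypothetical names, and only the `just sim-seed <seed>` command comes from this issue.

```rust
/// SplitMix64 step, used here only to expand one entropy value into several
/// exploration seeds.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Fixed regression seeds first, then `random_count` exploration seeds
/// derived from `entropy` (hypothetical: e.g. the CI run number).
fn seeds_for_run(fixed: &[u64], random_count: usize, entropy: u64) -> Vec<u64> {
    let mut state = entropy;
    let mut seeds = fixed.to_vec();
    seeds.extend((0..random_count).map(|_| splitmix64(&mut state)));
    seeds
}

/// The local reproduction command for a given seed.
fn repro_command(seed: u64) -> String {
    format!("just sim-seed {}", seed)
}

/// One line per failed seed, each carrying its reproduction command.
fn failure_report(failed: &[u64]) -> String {
    failed
        .iter()
        .map(|s| format!("FAILED seed {}: reproduce with `{}`", s, repro_command(*s)))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Logging the entropy value alongside the failed seeds would let a whole CI run be replayed, not just the individual failing seed.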
5. Simulator Enhancements
- Transaction generators: Configurable random transaction generators for simulation (valid txs, invalid txs, edge-case txs). Currently manual — should be built into the simulator.
- Scenario DSL or config: Define simulation scenarios (e.g., "normal load for 50 blocks, then spike to 10x, then fault injection") as config rather than code.
- Shrinking on failure: When a simulation fails, automatically try to find the minimal reproducing seed/block sequence (inspired by proptest shrinking).
- Coverage tracking: Integrate with coverage tools to measure what % of STF/account code paths are exercised by simulation.
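Shrinking on failure could start from something as simple as greedy element removal. The sketch below is not proptest's shrinking algorithm; it is a hypothetical `shrink` function that repeatedly drops single items (blocks, transactions) from a failing sequence while the caller-supplied `fails` predicate still reports a failure, and stops when no single removal keeps the failure.

```rust
/// Greedy shrinker: returns a subsequence of `input` that still fails and from
/// which no single element can be removed without the failure disappearing.
/// If `input` does not fail at all, it is returned unchanged.
fn shrink<T: Clone>(input: &[T], fails: impl Fn(&[T]) -> bool) -> Vec<T> {
    let mut current = input.to_vec();
    if !fails(&current) {
        return current;
    }
    loop {
        let mut changed = false;
        let mut i = 0;
        while i < current.len() {
            let mut candidate = current.clone();
            candidate.remove(i);
            if fails(&candidate) {
                // Still failing without element i: keep the smaller case and
                // retry the same index, which now holds the next element.
                current = candidate;
                changed = true;
            } else {
                i += 1;
            }
        }
        // Repeat full passes until one makes no progress.
        if !changed {
            return current;
        }
    }
}
```

Each candidate costs one re-run of the simulation, so the number of re-runs grows roughly quadratically with the sequence length in the worst case; for long simulations a chunk-wise strategy that removes halves or quarters first would likely be needed.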
Success Criteria
- Every PR runs simulation tests (short suite, <5 min) + non-determinism dual-execution check.
- Nightly CI job fuzzes STF, storage, and mempool for 30+ min and files issues on findings.
- Performance benchmarks run nightly with regression detection — degradations >10% are flagged.
- A failing simulation always prints its reproduction command.
- Zero known non-determinism sources in the STF execution path.
Implementation Notes
- Start with non-determinism detection (highest value, lowest effort) — dual-execution is just "run twice, compare hashes."
- For fuzzing, consider `bolero` as it works with both libfuzzer and proptest backends, avoiding the nightly-only `cargo-fuzz` limitation.
- Performance baselines can use GitHub Actions artifacts or `git notes` for storage.
- Keep CI wall time in check: simulation and fuzzing are useless if they make PRs slow. Short suite on PR, long suite on nightly.