test: Parquet merge pipeline verification suite#6369
Open
Conversation
Property test verifies that no merge task ever contains a split that should have been filtered: mature by ops (>= max_merge_ops), mature by size (>= target), time-matured (created_at + maturation_period <= now), or missing a window. Generates random splits across the maturity boundary and tests at the actor level. Also makes test helpers pub(super) so sibling test modules (sketch, crash, multi-round) can reuse them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
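The filtering property can be sketched as plain Rust. All names here (`Split`, `is_mature`, `plan_merge`, the threshold constants) are hypothetical stand-ins for the real pipeline types, and the exhaustive boundary loop stands in for the randomized generator; this is a sketch of the property, not the production test.

```rust
#[derive(Clone, Debug)]
struct Split {
    num_merge_ops: u32,
    size_bytes: u64,
    created_at: i64,            // unix seconds
    window: Option<(i64, i64)>, // compaction window, if any
}

const MAX_MERGE_OPS: u32 = 2;
const TARGET_SIZE: u64 = 1_000;
const MATURATION_PERIOD: i64 = 3_600;

/// A split is mature (and must be excluded from merge planning) if any
/// of the four conditions from the commit message holds.
fn is_mature(split: &Split, now: i64) -> bool {
    split.num_merge_ops >= MAX_MERGE_OPS
        || split.size_bytes >= TARGET_SIZE
        || split.created_at + MATURATION_PERIOD <= now
        || split.window.is_none()
}

fn plan_merge(candidates: &[Split], now: i64) -> Vec<Split> {
    candidates.iter().filter(|s| !is_mature(s, now)).cloned().collect()
}

fn main() {
    let now = 10_000;
    // Enumerate splits on both sides of every maturity boundary.
    for ops in [0, MAX_MERGE_OPS - 1, MAX_MERGE_OPS] {
        for size in [0, TARGET_SIZE - 1, TARGET_SIZE] {
            for age in [0, MATURATION_PERIOD - 1, MATURATION_PERIOD] {
                for window in [Some((0, 100)), None] {
                    let split = Split {
                        num_merge_ops: ops,
                        size_bytes: size,
                        created_at: now - age,
                        window,
                    };
                    let task = plan_merge(std::slice::from_ref(&split), now);
                    // Property: no planned merge task contains a mature split.
                    assert!(task.iter().all(|s| !is_mature(s, now)));
                }
            }
        }
    }
}
```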
Verifies that sketch splits dispatch to the correct metastore RPCs (stage_sketch_splits, publish_sketch_splits) and that the merged output has ParquetSplitKind::Sketches with correct metadata. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Crash/restart test:
- Injects publish failure on 2nd call → pipeline detects failure, kills actors, respawns
- Verifies list_metrics_splits called on respawn (re-seeding)
- Verifies pipeline generation >= 2 (respawn occurred)
- Verifies original splits eventually replaced (no data loss)

Multi-round merge test:
- 4 input splits → 2 round-1 merges → 1 round-2 merge
- Verifies num_merge_ops=2 on final output
- Verifies all original + intermediate splits replaced
- Verifies MC-1: total rows preserved across lifecycle

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
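The multi-round merge arithmetic above can be sketched in a few lines: merging level-homogeneous inputs produces a split one level up, and rows sum. The `Split` and `merge` shapes are invented for illustration, not the production types.

```rust
#[derive(Clone)]
struct Split {
    num_merge_ops: u32,
    rows: u64,
}

/// Merge a batch of level-homogeneous splits into one split at the next level.
fn merge(inputs: &[Split]) -> Split {
    // MP-1: inputs must share the same num_merge_ops level.
    let level = inputs[0].num_merge_ops;
    assert!(inputs.iter().all(|s| s.num_merge_ops == level));
    Split {
        num_merge_ops: level + 1,
        rows: inputs.iter().map(|s| s.rows).sum(),
    }
}

fn main() {
    // 4 input splits → 2 round-1 merges → 1 round-2 merge.
    let inputs: Vec<Split> = (0..4).map(|_| Split { num_merge_ops: 0, rows: 25 }).collect();
    let r1a = merge(&inputs[0..2]);
    let r1b = merge(&inputs[2..4]);
    let final_split = merge(&[r1a, r1b]);
    assert_eq!(final_split.num_merge_ops, 2); // two rounds of merging
    assert_eq!(final_split.rows, 100);        // MC-1: total rows preserved
}
```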
Models the two-phase shutdown: DisconnectMergePlanner breaks feedback, RunFinalizeMergePolicyAndQuit drains cold windows, then in-flight merges complete.

Safety invariants:
- NoSplitLoss: every merge input is published or in-flight
- NoDuplicateMerge: no split in multiple concurrent merges
- FinalizeWithinBound: at most MaxFinalizeOps finalize operations
- ShutdownOnlyWhenDrained: shutdown requires empty in-flight set

Liveness:
- ShutdownEventuallyCompletes under weak fairness

Two configs: _small (hundreds of states) and full (larger space).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
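The control skeleton of the modeled shutdown can be sketched as a Rust state machine. The real spec is TLA+; the names below mirror its actions, but `Pipeline`, `Phase`, and `MAX_FINALIZE_OPS` are invented here, and the asserts encode the invariants the model checks.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Phase {
    Running,          // planner connected, feedback loop live
    Disconnected,     // DisconnectMergePlanner: feedback broken
    Finalizing,       // RunFinalizeMergePolicyAndQuit: cold windows draining
    ShutdownComplete, // reachable only once in-flight merges are drained
}

struct Pipeline {
    phase: Phase,
    in_flight_merges: usize,
    finalize_ops: usize,
}

const MAX_FINALIZE_OPS: usize = 3; // FinalizeWithinBound bound

impl Pipeline {
    fn disconnect(&mut self) {
        assert_eq!(self.phase, Phase::Running);
        self.phase = Phase::Disconnected;
    }
    fn run_finalize(&mut self) {
        assert_eq!(self.phase, Phase::Disconnected);
        // FinalizeWithinBound: at most MAX_FINALIZE_OPS finalize operations.
        assert!(self.finalize_ops < MAX_FINALIZE_OPS);
        self.finalize_ops += 1;
        self.phase = Phase::Finalizing;
    }
    fn complete_merge(&mut self) {
        // In-flight merges drain one at a time; they never vanish.
        assert!(self.in_flight_merges > 0);
        self.in_flight_merges -= 1;
    }
    fn shutdown(&mut self) {
        // ShutdownOnlyWhenDrained: shutdown requires an empty in-flight set.
        assert_eq!(self.in_flight_merges, 0);
        assert_eq!(self.phase, Phase::Finalizing);
        self.phase = Phase::ShutdownComplete;
    }
}

fn main() {
    let mut p = Pipeline { phase: Phase::Running, in_flight_merges: 2, finalize_ops: 0 };
    p.disconnect();
    p.run_finalize();
    p.complete_merge();
    p.complete_merge();
    p.shutdown();
    assert_eq!(p.phase, Phase::ShutdownComplete);
}
```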
Model checks 5 properties across ingest, merge, crash, and restart:
- MC-1 lifecycle: total rows preserved (no loss/duplication)
- Bounded WA: merge_ops never exceeds max_merge_ops
- No duplicate merge: no split in multiple concurrent merges
- No orphan after restart: all immature splits re-seeded
- MP-1: level homogeneity (by construction)

Small model (~instant), full model (~seconds, gated behind #[ignore]).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Document the actual tla2tools.jar path from the brew cask tla+-toolbox
- Add a "Run All Specs" one-liner for quick verification
- Add a spec catalog with state counts and invariant names
- Remove stale references to the brew formula (it doesn't exist)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI clippy lint (`clippy::question-mark`, -D warnings) rejected the explicit match-and-return-None pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ifetime

Phase A of the verification-gap closure (PR #6369). The original spec only modeled orderly two-phase shutdown — no Crash action, no Restart, no row-conservation claim. The four invariants were either trivially true by construction or static set-membership, so the spec proved nothing about the failure modes that motivated it.

This rewrite:
- Adds Crash and Restart actions. Crash invalidates in-memory state at any point in the pipeline; uploaded-but-unpublished merge outputs become orphan_outputs. Restart re-seeds cold_windows from durable published_splits (models fetch_immature_splits).
- Splits CompleteMerge into UploadMergeOutput then PublishMergeAndFeedback so a Crash between the two phases is reachable. This is what catches the "leaked but invisible" failure mode — output blob in S3 with no metastore reference. The model proves these are object-store-only garbage (LeakIsObjectStoreOnly), not durable data loss.
- Adds split_rows / split_merge_ops / split_window ghost functions so RowsConserved expresses the strong "no data loss, no duplication" property: total ingested rows equal the sum of rows in published_splits in every reachable state.
- Adds NoOrphanWhenConnected (state invariant), RestartReSeedsAllImmature (action property), and NoPersistentOrphan (liveness leads-to). Together these capture: while connected, the planner sees every immature split; every Restart correctly recovers all immature splits; in any run with restart budget remaining, orphans always eventually clear.
- Extends Restart to fire after graceful shutdown too (multi-lifetime), so the cross-process recovery claim is explicit. Bounded by MaxRestarts.

The replaced NoOrphanAfterRestart invariant fired during initial TLC runs — the failing trace was a *legitimate* state in production (publish during shutdown disconnect leaves the output untracked by the planner until the next process invocation). The fix wasn't to remove the invariant but to split the safety claim from the recovery claim: NoOrphanWhenConnected for the steady state, RestartReSeedsAllImmature for the recovery transition, NoPersistentOrphan for cross-lifetime liveness. Recorded as a sesh-mode rule: never silently weaken an invariant that produced a counter-example.

A single config drops the small/full split — the full config now runs in ~12s on a workstation (217,854 distinct states, 11 invariants + 3 temporal properties verified). The states/ directory dropped by TLC is now gitignored.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The primary `MergePipelineShutdown.cfg` is now optimized for multi-lifetime exercise (MaxIngests=3, MaxRestarts=2) — 15,732 states in ~1s — covering the cross-process recovery claim added in the previous commit. The new `MergePipelineShutdown_chains.cfg` keeps MaxIngests=4 with MaxRestarts=1 for deeper merge-chain coverage (217,854 states, ~10s): exercises level-0 → level-1 → level-2-mature compaction so BoundedWriteAmp is checked across the full chain and concurrent in-flight merges interact. Combined run is ~11s. Both share the same invariant + property set; only the constants differ. The TLA+ run-all loop in CLAUDE.md now globs every .cfg and resolves the matching .tla, supporting any number of configs per spec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ariants

Phase B of the verification-gap closure (PR #6369). Eliminates drift between the merge-pipeline model and any production-side checks by making them evaluate the *same* Rust functions on the *same* state.

quickwit-dst/src/invariants/merge_pipeline.rs (new):
- MergePipelineState struct mirroring the TLA+ VARIABLES block (planner_alive, in_flight_merges, cold_windows, published_splits, splits, orphan_outputs, lifecycle counters, total_ingested_rows ghost)
- Pure-function predicates corresponding 1:1 to TLA+ invariants: rows_conserved, bounded_write_amp, no_split_loss, no_duplicate_merge, no_orphan_in_planner, no_orphan_when_connected, leak_is_object_store_only, mp1_level_homogeneity, restart_re_seeds_all_immature
- Helper orphan_set mirrors the TLA+ OrphanSet operator
- 17 unit tests covering passing and failing paths for each predicate

quickwit-dst/src/invariants/registry.rs:
- Adds MP4..MP11 invariant IDs (preserving the short-code naming convention)
- Each maps to a TLA+ invariant from MergePipelineShutdown.tla

quickwit-dst/src/models/merge_pipeline.rs:
- The state type is now invariants::merge_pipeline::MergePipelineState (literal sharing — no parallel definition, no conversion layer)
- CompleteMerge split into UploadMergeOutput + PublishMergeAndFeedback to match the TLA+ multi-phase merge completion. A Crash between phases orphans the upload (orphan_outputs) without losing data.
- Restart re-seeds cold_windows from durable published_splits (mirrors fetch_immature_splits) and resets shutdown_complete / finalize state, modeling a fresh process invocation. Bounded by max_restarts.
- DrainComplete added so graceful shutdown is a distinct terminal action (matching TLA+).
- properties() calls the shared predicate functions instead of inline closures — model and runtime checks evaluate identical Rust code.
- Three configs: small (fast iteration), multi_lifetime (3 lifetimes, matches MergePipelineShutdown.cfg), deep_chains (level 0→1→2 mature, matches MergePipelineShutdown_chains.cfg).
- Restart's next_state debug_asserts MP-11 (restart_re_seeds_all_immature) on its post-state, encoding the action property as a runtime check.

All 50 tests pass under `cargo nextest run -p quickwit-dst --features model-checking` (33 invariant unit tests + 4 merge-pipeline model configs + 13 other models). Multi-lifetime checks in 0.1s, deep-chains in 0.6s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
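Two of the shared predicates can be sketched as pure functions over a simplified state. The `MergePipelineState` shape below is an invented stand-in (the real struct in quickwit-dst/src/invariants/merge_pipeline.rs has more fields); the point is the pattern: predicates take only the state and return bool, so the model checker and runtime checks can call the identical code.

```rust
use std::collections::HashSet;

struct Split {
    rows: u64,
}

struct MergePipelineState {
    published_splits: Vec<Split>,
    in_flight_merges: Vec<Vec<u32>>, // input split ids per in-flight merge
    total_ingested_rows: u64,        // ghost variable, never read by production logic
}

/// RowsConserved (MP-4 analogue): every ingested row is accounted for
/// exactly once among published splits.
fn rows_conserved(state: &MergePipelineState) -> bool {
    state.published_splits.iter().map(|s| s.rows).sum::<u64>()
        == state.total_ingested_rows
}

/// NoDuplicateMerge analogue: no split id appears in two concurrent merges.
fn no_duplicate_merge(state: &MergePipelineState) -> bool {
    let mut seen = HashSet::new();
    state.in_flight_merges.iter().flatten().all(|id| seen.insert(*id))
}

fn main() {
    let ok = MergePipelineState {
        published_splits: vec![Split { rows: 10 }, Split { rows: 5 }],
        in_flight_merges: vec![vec![1], vec![2]],
        total_ingested_rows: 15,
    };
    assert!(rows_conserved(&ok));
    assert!(no_duplicate_merge(&ok));

    let dup = MergePipelineState {
        published_splits: vec![Split { rows: 15 }],
        in_flight_merges: vec![vec![1], vec![1, 2]], // split 1 in two merges
        total_ingested_rows: 15,
    };
    assert!(rows_conserved(&dup));
    assert!(!no_duplicate_merge(&dup));
}
```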
Phase C of the verification-gap closure (PR #6369). Production code now evaluates the *same* invariant predicates that the Stateright model and TLA+ spec verify, with results forwarded to the existing DogStatsD recorder (already wired in quickwit-cli/src/logger.rs). Every check emits `pomsky.invariant.checked{invariant=MP-X}` and, on failure, `pomsky.invariant.violated{invariant=MP-X}` — usable directly as Datadog alert criteria.

ParquetMergeExecutor (post-merge, pre-uploader):
- MP-2 (HasMinimumSplits): merge has >= 2 inputs
- MP-1 (LevelHomogeneity): all inputs share num_merge_ops level
- MP-3 (ScopeCompatibility): all inputs share sort_fields and window
- MP-4 (RowsConserved): sum of input num_rows == sum of output num_rows. The empty-output path checks that input rows are all zero (otherwise data was silently dropped). The non-empty path checks the strong row-preservation property — the same MC-1 / RowsConserved invariant from the TLA+ spec.

ParquetMergePlanner::new (post-restart re-seed):
- MP-11 (RestartReSeedsAllImmature): every split the merge policy classifies as still-immature (and which has a window for compaction) is recorded in scoped_young_splits and known_split_ids. Filters out mature splits, time-expired splits, and pre-Phase-31 splits before comparing — these are intentional drops, not the failure mode MP-11 protects against.

Cargo.toml: quickwit-indexing now depends on quickwit-dst for the shared invariant module and the check_invariant! macro.

Also fixes a pre-existing rustdoc warning: the parquet_merge_planner docs referenced `[`MergePlanner`]` as an intra-doc link, but the type lives in another crate and can't resolve. Changed to plain backticks.

All 52 metrics_pipeline tests pass; clippy + nightly fmt + machete + cargo doc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
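The MP-4 check at the executor can be sketched as follows. The helper names `record_checked` / `record_violated` are invented stand-ins for the DogStatsD recorder hookup (the real code routes through `check_invariant!`), but the branch structure — empty-output vs. non-empty-output — matches the commit description.

```rust
// Stand-ins for the metric recorder; the real code emits
// pomsky.invariant.checked{invariant=MP-4} / pomsky.invariant.violated{...}.
fn record_checked(_invariant: &str) {}
fn record_violated(_invariant: &str) {}

/// MP-4 (RowsConserved) at the merge executor: input rows must equal
/// output rows. An empty output is acceptable only if every input was
/// empty too; otherwise rows were silently dropped.
fn check_mp4_rows_conserved(input_rows: &[u64], output_rows: &[u64]) -> bool {
    record_checked("MP-4");
    let input: u64 = input_rows.iter().sum();
    let output: u64 = output_rows.iter().sum();
    let ok = if output_rows.is_empty() {
        input == 0
    } else {
        input == output
    };
    if !ok {
        record_violated("MP-4");
    }
    ok
}

fn main() {
    assert!(check_mp4_rows_conserved(&[100, 50], &[150]));
    assert!(check_mp4_rows_conserved(&[0, 0], &[]));
    assert!(!check_mp4_rows_conserved(&[100], &[]));   // silent drop
    assert!(!check_mp4_rows_conserved(&[100], &[90])); // row drift
}
```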
… model

Phase D of the verification-gap closure (PR #6369). The TLA+ spec and Stateright model are now backed by *production code that actually emits events at every modeled transition*, and a test replays those events through the same predicates the model verifies. A divergence between production and the model surfaces here as a predicate failure on real production state. This closes the loop: TLA+ → Stateright (Phase A+B) → production checks (Phase C) → trace conformance (Phase D).

quickwit-dst/src/events/merge_pipeline.rs (new):
- MergePipelineEvent enum with one variant per modeled action, using production types (String split IDs, Range<i64> windows). Notably no Crash variant — process death cannot emit anything; crashes are inferred during replay by the absence of events between actions and a subsequent Restart.
- Pluggable observer (fn pointer) following the existing set_invariant_recorder pattern. No-op single-atomic-load when no observer is installed; the production hot path stays cheap.

Production emission sites (always-on, no feature flag):
- ParquetMergePipeline::spawn_pipeline → Restart, after fetch_immature_splits
- ParquetMergePipeline::FinishPendingMergesAndShutdownPipeline → DisconnectMergePlanner and RunFinalizeAndQuit
- ParquetMergePlanner::send_merge_ops → PlanMerge
- ParquetUploader::handle (post-upload, pre-publish) → UploadMergeOutput for merge outputs
- Publisher::handle (Parquet publish path) → IngestSplit (when no replaced_split_ids) or PublishMergeAndFeedback (when replacing)

quickwit-indexing/.../parquet_merge_pipeline_trace_conformance_test.rs (new):
- Two scenarios:
  1. Normal happy path: 4 splits → 2 merges → 1 mature output. Replay verifies MP-1, MP-4..MP-10 hold at every state. ~0.6s.
  2. Crash mid-cascade: publish failure injected on the 2nd merge forces respawn. Trace covers Restart → PlanMerge → Upload → crash → Restart → re-seed → PlanMerge → Upload → Publish. ~1.1s.
- StateMirror reconstructs MergePipelineState from events using a production-id → model-u32 interner, then applies the same predicate functions used by the Stateright model.
- The mock metastore tracks `staged` and `published` separately so list_metrics_splits returns only what real metastores would (not staged-but-failed). An earlier test-side bug here would have masked divergences; the trace test caught it immediately.

What this catches that prior tests do not:
- The crash test only verified re-seeding *happened*. The trace test verifies that the post-restart state preserves all formal safety invariants (no orphans the planner can never reclaim, no row drift, no level mixing on re-merge, etc.).
- A divergence between production order/atomicity and the model's atomic actions surfaces as either a state-mirror error ("event can't be applied") or a predicate failure with the offending event in the panic message.

Both tests pass with the current production code: production behavior matches the TLA+ spec for the scenarios exercised. The test infrastructure is now ready to surface real divergences if they exist, and to be extended with adversarial scenarios (multiple back-to-back crashes, finalize-during-crash, concurrency stress).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
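The replay idea can be sketched in miniature: fold recorded events into a mirror state and assert the shared predicate after every step. `Event` and `Mirror` below are invented simplifications of the real MergePipelineEvent / StateMirror; the pattern (apply event, then check invariants on the intermediate state) is what the trace test does.

```rust
use std::collections::HashMap;

enum Event {
    IngestSplit { id: String, rows: u64 },
    PublishMergeAndFeedback { output: String, rows: u64, replaced: Vec<String> },
}

#[derive(Default)]
struct Mirror {
    published: HashMap<String, u64>, // split id -> rows
    total_ingested_rows: u64,        // ghost counter
}

impl Mirror {
    fn apply(&mut self, event: &Event) {
        match event {
            Event::IngestSplit { id, rows } => {
                self.published.insert(id.clone(), *rows);
                self.total_ingested_rows += rows;
            }
            Event::PublishMergeAndFeedback { output, rows, replaced } => {
                for id in replaced {
                    // A failure here means production emitted an event the
                    // mirror cannot apply — itself a divergence signal.
                    self.published.remove(id).expect("replaced split unknown");
                }
                self.published.insert(output.clone(), *rows);
            }
        }
        // RowsConserved analogue, checked at every intermediate state.
        assert_eq!(self.published.values().sum::<u64>(), self.total_ingested_rows);
    }
}

fn main() {
    let trace = vec![
        Event::IngestSplit { id: "a".into(), rows: 10 },
        Event::IngestSplit { id: "b".into(), rows: 20 },
        Event::PublishMergeAndFeedback {
            output: "m".into(),
            rows: 30,
            replaced: vec!["a".into(), "b".into()],
        },
    ];
    let mut mirror = Mirror::default();
    for event in &trace {
        mirror.apply(event);
    }
    assert_eq!(mirror.published.len(), 1);
    assert_eq!(mirror.total_ingested_rows, 30);
}
```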
CS-3 ("splits before compaction_start_time are never compacted") is
defined in the TLA+ model and the InvariantId registry, but the
production `ParquetMergePolicyConfig` does not yet expose a
`compaction_start_time` field. Adding a runtime check now would have
nothing to filter against.
Records the gap as a planner-level TODO so the check lands alongside
the feature when it's implemented (filter splits in
`record_splits_if_necessary` + verify via `check_invariant!`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
End-to-end verification stack for the Parquet merge pipeline. Closes the
loop from formal spec → exhaustive model → production runtime checks →
trace-conformance proof, so a divergence between production and the
verified model surfaces as a test failure or a Datadog metric, not a
silent corruption.
Verification pyramid (single source of truth)
The Stateright model and production code evaluate the same Rust predicate functions on the same state struct, so drift is impossible by construction.
What landed
TLA+ — `MergePipelineShutdown.tla`

Extended from the original orderly-shutdown spec to cover crash and restart at every transition. CompleteMerge is split into UploadMergeOutput + PublishMergeAndFeedback so a crash between upload and metastore replace is reachable (the orphan_outputs failure mode). Multi-lifetime support (Restart can fire after graceful shutdown, modeling fresh process invocations). New invariants:
- RowsConserved (MP-4): sum of rows in published splits = total ingested rows
- LeakIsObjectStoreOnly (MP-10): orphans never become durable data
- BoundedWriteAmp (MP-5), MP1_LevelHomogeneity, NoOrphanInPlanner (MP-8), NoOrphanWhenConnected (MP-9)
- RestartReSeedsAllImmature (MP-11): every Restart re-seeds correctly
- NoPersistentOrphan: orphans always eventually clear

Two TLC configs: primary (multi-lifetime, ~16K states, 1s) and chains (deep merge chain, ~218K states, 12s). 11 invariants + 3 temporal properties verified.
Shared invariants — `quickwit-dst/src/invariants/merge_pipeline.rs`

The new MergePipelineState struct mirrors the TLA+ VARIABLES block literally; it is used directly by the Stateright model (no parallel definition). 9 pure-function predicates (MP-1, MP-4..MP-11) with unit tests for passing and failing paths.
Stateright — `quickwit-dst/src/models/merge_pipeline.rs`

Refactored to use the shared state and predicates. Mirrors the TLA+ multi-phase merge and multi-lifetime semantics. Three configs (small, multi_lifetime, deep_chains) all pass.
Production runtime checks — `quickwit-indexing/.../metrics_pipeline/`

Eight new check_invariant! call sites, all forwarding to the existing DogStatsD recorder (`pomsky.invariant.{checked,violated}{invariant=MP-X}`):
- ParquetMergeExecutor — MP-1 (level homogeneity), MP-2 (≥2 inputs), MP-3 (scope), MP-4 (input rows == output rows)
- ParquetMergePlanner::new — MP-11 (post-restart re-seed completeness, the formal "fetch_immature_splits is correct" claim)

Trace conformance — `parquet_merge_pipeline_trace_conformance_test.rs`

Production emits a MergePipelineEvent stream at every modeled transition (Restart, PlanMerge, UploadMergeOutput, PublishMergeAndFeedback, DisconnectMergePlanner, RunFinalizeAndQuit, etc.). The test replays the captured events through the same predicates Stateright verifies. Two scenarios pass (the happy path and a crash mid-cascade): invariants hold cleanly across the crash boundary; rows are conserved end-to-end. A divergence between production and the verified model would surface here as a predicate failure with the offending event in the panic message.
Original verification suite (still here)
Property test, sketch integration test, crash/restart integration test,
multi-round merge integration test — unchanged.
What's deferred
- CS-3: production ParquetMergePolicyConfig doesn't expose compaction_start_time yet. A TODO in parquet_merge_planner::record_splits_if_necessary records the gap so the check lands alongside the feature.
- Some invariants are exercised only in the trace test; checking them at runtime would need a periodic background task to materialize the full MergePipelineState for evaluation. Not added because the trace-conformance proof already covers them.
Coverage table (23 invariants in registry)
Added a strong rule to `.claude/skills/sesh-mode/SKILL.md`: when a verification check fails, never silently weaken or remove the invariant.
Almost always either (a) the implementation has a real bug, or (b) the
property is over-strong — and the failing trace is revealing the
weaker safety property the design does guarantee.
Test plan
- `MergePipelineShutdown.cfg` passes (~16K states, 1s); `_chains.cfg` passes (~218K states, 12s); 11 invariants + 3 temporal properties
- `cargo nextest run -p quickwit-dst --features model-checking` — 49 tests pass + 1 ignored (deep_chains passes when run)
- `cargo nextest run -p quickwit-indexing --features metrics metrics_pipeline` — 54 tests pass (52 prior + 2 new trace conformance)
- clippy clean (`-D warnings`)
- `cargo doc --no-deps -p quickwit-indexing --features metrics` clean
- `cargo machete` clean

🤖 Generated with Claude Code