Expand the current Validation Test Suite (VTS) so it can detect recently discovered TLC soundness bugs that may cause TLC to miss a real violation of a safety property (most severe), or report a bogus counterexample (less severe, but still qualification-relevant). These bugs are subtle and are not exposed by the current VTS strategy, which combines TLA+ language features pairwise. In practice, soundness bugs appear to require interactions among 3, 4, or more features. Exhaustively testing such higher-order combinations is generally infeasible due to combinatorial explosion.
This means the current VTS approach, while useful, is insufficient on its own for covering certain soundness-critical TLC behaviors relevant to safety-critical qualification (e.g., ISO 26262).
Critical soundness issues to drive coverage expansion:
Required work
Characterize the bug-triggering interactions
- Identify which combinations of language features / semantic conditions are implicated in each issue.
- Document why pairwise composition is insufficient in each case.
Design targeted, non-exhaustive test strategies
- Develop test-generation strategies that go beyond pairwise coverage without requiring full n-wise enumeration.
- Candidate approaches may include:
- bug-class regression tests
- targeted higher-order feature combinations
- metamorphic tests
- constraint-guided test generation
- risk-based prioritization of combinations
Promote critical issues into permanent regression coverage
- Add minimized reproducer specs (where possible) to VTS.
- Ensure each reproducer is tied to a documented bug class and expected behavior.
Update qualification claims and coverage rationale
- Explicitly describe what kinds of soundness risks are now covered.
- Document residual risks from untested higher-order feature interactions.
Outcome needed: a VTS expansion plan that prioritizes detection of soundness-critical TLC failures beyond pairwise feature interaction testing, starting from the known issue set above.
Expand the current Validation Test Suite (VTS) so it can detect recently discovered TLC soundness bugs that may cause TLC to miss a real violation of a safety property (most severe), or report a bogus counterexample (less severe, but still qualification-relevant). These bugs are subtle and are not exposed by the current VTS strategy, which combines TLA+ language features pairwise. In practice, soundness bugs appear to require interactions among 3, 4, or more features. Exhaustively testing such higher-order combinations is generally infeasible due to combinatorial explosion.
This means the current VTS approach, while useful, is insufficient on its own for covering certain soundness-critical TLC behaviors relevant to safety-critical qualification (e.g., ISO 26262).
Critical soundness issues to drive coverage expansion:
FcnLambdaValuecauses invalidEXCEPTsemantics, violating soundness and completeness tlaplus#1302FcnLambdaValue.toTuple()ignoresthis.exceptstlaplus#1145Required work
Characterize the bug-triggering interactions
Design targeted, non-exhaustive test strategies
Promote critical issues into permanent regression coverage
Update qualification claims and coverage rationale
Outcome needed: a VTS expansion plan that prioritizes detection of soundness-critical TLC failures beyond pairwise feature interaction testing, starting from the known issue set above.