mutflow - Design Document

Overview

mutflow is a Kotlin compiler plugin for lightweight, low-overhead mutation testing. It targets developers and teams who currently do no mutation testing due to the high cost and complexity of traditional tools.

The Problem

Traditional mutation testing (e.g., Pitest) works by:

Generating mutants (modified versions of code)
Compiling each mutant separately
Running tests against each mutant
Reporting which mutants survived

This is thorough but expensive: many compilation cycles, long execution times, complex tooling setup. Most teams skip mutation testing entirely.

The Approach: Mutant Schemata

mutflow uses the "mutant schemata" (or "meta-mutant") technique:

Compile once: The compiler plugin injects ALL mutation variants into the code at compile time, guarded by conditional switches
Runtime selection: At test runtime, a control mechanism activates exactly one mutation per run
Multiple runs: Tests execute multiple times - once as baseline, then with different single mutations
Fail on survivors: If a mutant survives (tests pass when they shouldn't), the test fails with actionable feedback

Example Transformation

Test code:

@MutFlowTest
class CalculatorTest {
    @Test
    fun testIsPositive() {
        val result = MutFlow.underTest {  // parameterless with JUnit extension
            isPositive(5)
        }
        assertTrue(result)
    }
}

Production code - before:

@MutationTarget
class Calculator {
    fun isPositive(x: Int): Boolean {
        return x > 0
    }
}

Production code - after compiler plugin:

@MutationTarget
class Calculator {
    fun isPositive(x: Int): Boolean {
        // Compiler injects nested when expressions for multiple mutation types
        return when (MutationRegistry.check(
            pointId = "sample.Calculator_0",
            variantCount = 2,
            sourceLocation = "Calculator.kt:4",
            originalOperator = ">",
            variantOperators = ">=,<",
            occurrenceOnLine = 1
        )) {
            0 -> x >= 0  // operator mutation: include equality
            1 -> x < 0   // operator mutation: direction flip
            else -> when (MutationRegistry.check(
                pointId = "sample.Calculator_1",
                variantCount = 2,
                sourceLocation = "Calculator.kt:4",
                originalOperator = "0",
                variantOperators = "1,-1",
                occurrenceOnLine = 1
            )) {
                0 -> x > 1   // constant mutation: increment
                1 -> x > -1  // constant mutation: decrement
                else -> x > 0  // original
            }
        }
    }
}

This nested structure is generated recursively by the compiler plugin. Each matching MutationOperator wraps the expression, with the else branch feeding into the next operator. Since only one mutation is active at runtime, there's no complexity - the active mutation's branch executes, all others fall through to original.

Runtime Discovery Model

Mutation points are discovered dynamically at runtime, not statically at class load:

Discovery run: Code executes normally (no activeMutation). Each MutationRegistry.check() call registers "I exist with these variants" along with display metadata (source location, operator descriptions), and returns null (use original). After execution, the registry returns: "discovered 5 mutation points with their variant counts".

Note: Point IDs use the format ClassName_N (e.g., sample.Calculator_0), but display names show source location and operator (e.g., (Calculator.kt:7) > → >=). When the same operator appears multiple times on the same line (e.g., if (a > b && c > d)), an occurrence suffix disambiguates: the first stays > → >=, the second becomes > → >= #2. A future improvement is switching to IR-hash based IDs for stability across refactoring.

Mutation runs: The caller specifies which mutation to activate via ActiveMutation(pointId, variantIndex). When that point calls check(), it returns the active variant index instead of null.

This dynamic discovery matters because:

Different underTest blocks exercise different code paths
Only mutations actually reached by the test are counted
Same class called from different tests may hit different mutation points

Key Features

1. Explicit Test Scoping with `MutFlow.underTest`

Tests explicitly mark the action under test using BDD-style structure:

@MutFlowTest
class CalculatorTest {
    @Test
    fun testIsPositive() {
        // given
        val x = 5

        // when
        val result = MutFlow.underTest {  // parameterless when using @MutFlowTest
            isPositive(x)
        }

        // then
        assertTrue(result)
    }
}

The MutFlow.underTest block:

Wraps only the action under test (the "when" in given/when/then)
Returns the result for assertions outside the block
Assertions stay outside - they should fail when mutations change behavior
When using @MutFlowTest, the JUnit extension manages session lifecycle internally

2. Global Baseline and Run Model

Mutation testing operates at the test class level with a global registry:

Run 0 (baseline): ALL test cases in the class execute first, discovering mutation points
Run 1+: ALL test cases execute with the same mutation active

// With @MutFlowTest, the JUnit extension orchestrates all runs automatically:
// - Run 0 (baseline): All tests execute, mutation points discovered
// - Run 1+: All tests execute with same mutation active

@MutFlowTest
class CalculatorTest {
    @Test fun testIsPositive() {
        val result = MutFlow.underTest { calculator.isPositive(5) }
        assertTrue(result)
    }

    @Test fun testIsNegative() {
        val result = MutFlow.underTest { calculator.isPositive(-5) }
        assertFalse(result)
    }
}

// If ANY test fails during a mutation run, the mutation is killed

Key principles:

Same mutation for all tests: A run activates one mutation across the entire test suite
Global discovery: Mutation points from all tests are merged into a single registry
Touch counting: During baseline, we count how many tests touch each mutation point
Run limit: Tests run up to N times (configured), or until all mutations are exhausted

This means:

We can determine if a mutation survives the entire test suite
Mutations touched by fewer tests are identified as higher risk
Precise feedback: when a mutant survives, you know exactly which one

3. Selection and Shuffle Modes (Internal)

These are internal implementation details used by the mutation selection engine. The @MutFlowTest annotation uses MostLikelyStable + PerChange by default and runs all mutations. These parameters are only exposed through the manual MutFlow.underTest(run, selection, shuffle) API.

Mutation selection is controlled by two orthogonal parameters:

enum class Selection {
    PureRandom,       // Uniform random selection
    MostLikelyRandom, // Weighted random favoring least-touched points
    MostLikelyStable  // Deterministic: always pick least-touched point
}

enum class Shuffle {
    PerRun,    // Different seed each CI build/JVM run
    PerChange  // Same seed until discovered points change
}

Selection strategies (which mutation to pick):

Selection	Behavior
`PureRandom`	Uniform random selection among untested mutations
`MostLikelyRandom`	Random but weighted toward mutations touched by fewer tests
`MostLikelyStable`	Deterministically pick the mutation touched by fewest tests

The "touch count" is calculated during baseline (run 0): each time a test executes a mutation point, that point's touch count increments. Mutations touched by fewer tests are considered higher risk and prioritized by MostLikely* strategies.

Shuffle modes (when to change the seed):

Shuffle	Behavior
`PerRun`	New random seed each JVM/CI run - exploratory
`PerChange`	Seed based on `hash(discoveredPoints)` - stable until code changes

Typical workflow:

During development: use MostLikelyRandom + PerRun to explore high-risk mutations
For merge requests: use MostLikelyStable + PerChange for reproducible results
Over time: cover all mutations across many builds

4. Global Mutation Registry

The mutation registry is a global in-memory state shared across all tests:

GlobalRegistry {
    // From baseline (run 0): which points exist and their variant counts
    discoveredPoints: Map<PointId, VariantCount>

    // From baseline (run 0): how many tests touched each point
    touchCounts: Map<PointId, Int>

    // Updated each mutation run: which mutations have been tested
    testedMutations: Set<Mutation>  // Mutation = (pointId, variantIndex)
}

Lifecycle:

Run 0 (all tests): Baseline discovery - mutation points merged globally, touch counts accumulated
Run 1+: For each run:
- Select a mutation point (using Selection strategy + touch counts)
- Pick a variant for that point (excluding already-tested variants)
- Add to testedMutations, activate, execute lambda
Exhaustion: If no untested mutations remain → throw MutationsExhaustedException

This ensures:

No mutation is tested twice within an execution
Touch counts guide selection toward under-tested mutation points
Natural termination when all mutations are covered
Run count (configured in JUnit) is the normal limit; exception is early-exit for small codebases

5. Trapping Surviving Mutants

When a mutant survives, the build fails with a display name like:

MUTANT SURVIVED: (Calculator.kt:8) > → >=

This display name can be copied into the @MutFlowTest annotation to trap the mutant - ensuring it runs first every time while you fix the test gap:

@MutFlowTest(
    traps = ["(Calculator.kt:8) > → >="]
)
class CalculatorTest {
    @Test
    fun testIsPositive() {
        val result = MutFlow.underTest { calculator.isPositive(5) }
        assertTrue(result)
    }
}

Traps are a temporary debugging aid:

Mutation survives → you get its display name
Copy display name into traps array to pin it
Fix your test until it catches the mutation
Remove the trap once fixed

Trap behavior:

Trapped mutations run first, before normal selection (regardless of selection strategy)
Multiple traps run in the order provided
After all traps are exhausted, normal selection continues
Invalid traps (e.g., code moved) print a warning with available mutations

Why display names instead of internal IDs? The display name format (FileName.kt:line) original → variant (or … variant #N when disambiguating) is:

Human-readable and self-documenting in test code
Directly copy-pasteable from survivor output
Stable enough for temporary debugging (users typically change tests, not impl)
Easy to update if code moves (the warning shows available mutations)
Unambiguous even when the same operator appears multiple times on one line

6. Scoped Mutations via Annotations and Gradle Config

The compiler plugin injects mutations into classes that are either annotated with @MutationTarget or matched by target patterns configured in the Gradle plugin:

// Option 1: Annotation on production code
@MutationTarget
class Calculator {
    // mutations injected here
}

// Option 2: Gradle config (no annotation needed on production code)
// build.gradle.kts
mutflow {
    targets = listOf(
        "com.example.Calculator",       // exact class
        "com.example.service.*",        // all classes in a package
        "com.example.service.**",       // package + all subpackages
        "com.example.*Service"          // glob pattern
    )
}

Both mechanisms can be combined freely - a class is mutated if it matches either @MutationTarget or a Gradle target pattern. If both match the same class, it is simply mutated once (no duplication).

Why two mechanisms?

@MutationTarget is convenient for small projects and makes mutation scope visible in the code
Gradle config addresses a common concern: annotating production code with test-related annotations from an external library feels invasive (see GitHub issue #2). The Gradle config keeps all mutation testing configuration in the test/build layer

Pattern matching: Target patterns support glob-style matching:

. matches literal dots in package/class names
* matches a single name segment (does not cross dots)
** matches any number of segments (crosses dots)

Patterns are compiled to regexes once at plugin initialization and matched against each class's fully qualified name during IR transformation.

This limits bytecode bloat and keeps mutations relevant.

Additionally, you can suppress mutations on specific functions using @SuppressMutations:

@MutationTarget
class Calculator {
    fun isPositive(x: Int) = x > 0  // mutations injected

    @SuppressMutations
    fun debugLog(x: Int): Boolean {
        // no mutations here - logging code doesn't need mutation testing
        return x > 100
    }
}

The @SuppressMutations annotation can be applied to:

Classes: Skip all mutations in the entire class
Functions: Skip mutations in specific functions only

Line-Level Suppression via Comments

For finer granularity, individual lines can be suppressed using comments - similar to SonarQube's // NOSONAR. Two keywords are supported with the same technical effect but different semantic intent:

mutflow:ignore - the code is not worth testing (logging, debug utilities, heuristics)
mutflow:falsePositive - the mutation is an equivalent mutant or not meaningful to test

Free-form text after the keyword serves as documentation for reviewers.

Inline comment - suppresses mutations on the same line:

val threshold = x > 100 // mutflow:ignore this is just a heuristic threshold

Standalone comment - suppresses mutations on the next line:

// mutflow:falsePositive equivalent mutant, >= and > both valid here
if (retryCount > MAX_RETRIES) { ... }

How it works:

The IR transformer reads the source file when entering a @MutationTarget class
Lines containing mutflow:ignore or mutflow:falsePositive after // are parsed
A set of suppressed line numbers is built (inline = same line, standalone = next line)
Mutation operators skip IR nodes whose source line falls in the suppressed set
Source file reads are cached per file path (no re-reading for multiple classes in the same file)

Zero production overhead: Comments are stripped by the Kotlin compiler - nothing appears in production bytecode. The suppression logic runs entirely during compilation.

Defensive behavior: If the source file cannot be read (e.g., generated sources, unusual build setups), a warning is printed and compilation continues without comment-based suppression:

[mutflow] WARNING: Could not read source file Calculator.kt - comment-based suppression (mutflow:ignore / mutflow:falsePositive) unavailable for this file

Why not a function call or annotation? A function call like ignoreMutationsForNextLine() would leave a no-op call in production bytecode (the compiler plugin only runs during test compilation in the dual-build setup). Annotations cannot target individual expressions/lines in Kotlin. Comments are zero-cost and familiar from tools like SonarQube and PMD.

7. Target Filtering for Integration Tests

In integration tests, MutFlow.underTest {} blocks often exercise multiple @MutationTarget classes, but you may only care about mutations in the class you're actually testing. Target filtering lets you scope which classes produce active mutations:

// Only test mutations from Calculator (ignore Logger, AuditService, etc.)
@MutFlowTest(includeTargets = [Calculator::class])
class CalculatorIntegrationTest { ... }

// Test mutations from everything except infrastructure classes
@MutFlowTest(excludeTargets = [AuditLogger::class, MetricsService::class])
class PaymentServiceTest { ... }

How it works:

includeTargets: Only mutations from these @MutationTarget classes are selected. Empty (default) = all classes included.
excludeTargets: Mutations from these classes are skipped. Empty (default) = no classes excluded.
Both can be combined: include narrows the set first, then exclude removes from it.

Key design decisions:

Discovery is unfiltered: All mutation points are still discovered during baseline (touch counts remain accurate for selection weighting)
Filtering applies at selection time: Only when picking which mutation to activate next
Summary reflects the filter: Total/untested counts only show filtered mutations, so "all mutations tested" means "all mutations you care about"
Exhaustion respects the filter: The session exhausts when all filtered mutations are tested, not all discovered mutations

Why class-level filtering? Point IDs encode the fully qualified class name (e.g., com.example.Calculator_0), so class-based matching is natural. The annotation uses KClass<*> references, which are type-safe and refactoring-friendly.

8. Partial Run Detection

When running a single test method from an IDE (e.g., IntelliJ's "Run Test" on one method), mutation testing is automatically skipped. This prevents false positives - mutations that would be killed by other tests in the class would incorrectly appear as survivors.

How it works:

At session creation, the extension counts @Test methods in the class
During baseline, each executed test is tracked
After baseline, if executedTests < expectedTests, mutation runs are skipped

Example output when running a single test:

[mutflow] Starting baseline run (discovery)
[mutflow] Discovered mutation point: (Calculator.kt:7) > with 2 variants
[mutflow] Partial test run detected (1/3 tests) - skipping mutation testing

The baseline still runs normally (tests execute, mutations are discovered), but no mutation runs occur. This ensures you get your test results quickly without misleading mutation feedback.

Rationale: Mutation testing evaluates the entire test suite's ability to catch mutations. Running it with a subset produces meaningless results - better to skip and provide a clear message.

9. Timeout Detection for Infinite Loops

Certain mutations can cause infinite loops - most commonly when flipping a relational operator in a loop condition (e.g., < → > in while (i < n)). Without protection, these mutations would hang the test run indefinitely.

mutflow detects this at the compiler level by injecting MutationRegistry.checkTimeout() at the top of every loop body in @MutationTarget classes:

// Before compiler plugin
while (i < n) {
    process(i)
    i++
}

// After compiler plugin (in addition to mutation point injection)
while (i < n) {
    MutationRegistry.checkTimeout()  // injected
    process(i)
    i++
}

How it works:

When a mutation run starts, MutationRegistry.withSession() computes a deadline: System.nanoTime() + timeoutMs * 1_000_000
Each loop iteration calls checkTimeout(), which compares current time against the deadline
If exceeded, throws MutationTimedOutException - the test fails with a message suggesting // mutflow:ignore
The timed-out mutation is recorded as MutationResult.TimedOut and shown in the summary

Performance characteristics of checkTimeout():

No active session (production code): immediate null check return - effectively zero cost
Baseline run (no active mutation): deadlineNanos == 0 check - fast return
Mutation run: one System.nanoTime() call per loop iteration (~20-30ns on modern JVMs)

Why compiler-injected checks instead of thread-based timeout? A Future.get(timeout) approach can detect timeouts but cannot stop tight CPU-bound infinite loops - Thread.interrupt() only works if the loop checks interruption (most don't). The compiler-injected approach cleanly breaks even tight loops like while(true) { counter++ } from within.

Loop coverage: All loop types in Kotlin compile to IrWhileLoop or IrDoWhileLoop in IR:

Kotlin source	IR node	Covered?
`while (...)`	`IrWhileLoop`	Yes
`do { ... } while (...)`	`IrDoWhileLoop`	Yes
`for (i in ...)`	`IrWhileLoop` (desugared)	Yes
`forEach { }`, `map { }`, etc.	`IrCall` (stdlib function)	No - but loop control is in stdlib, not user code

Higher-order function "loops" like forEach can't cause infinite loops from mutations because the loop control (hasNext(), counter) lives in the stdlib, not in the mutated code.

Configuration:

@MutFlowTest(timeoutMs = 60_000)  // default: 60 seconds, 0 to disable
class CalculatorTest { ... }

Design rationale - fail loudly, not silently: When a timeout occurs, the test fails rather than silently marking the mutation as killed. This ensures the developer notices and takes action (adds // mutflow:ignore on the affected line). Silent handling would mask slow mutation runs that accumulate over time.

10. Verification Mode

Controls how surviving mutations are handled. Three modes are available:

enum class VerificationMode {
    STRICT,    // survivors cause test failure (default)
    LENIENT,   // survivors are reported but don't fail
    DISABLED   // mutation runs are skipped entirely
}

Per-annotation configuration:

@MutFlowTest(verificationMode = VerificationMode.LENIENT)
class CalculatorTest { ... }

Environment variable override:

The MUTFLOW_VERIFICATION_MODE environment variable takes precedence over the annotation value. This enables phased CI pipelines without changing code:

MUTFLOW_VERIFICATION_MODE=DISABLED ./gradlew test   # fast: regular tests only
MUTFLOW_VERIFICATION_MODE=LENIENT ./gradlew test     # report mutations, don't fail
./gradlew test                                        # full strict mutation testing

Resolution order:

Check MUTFLOW_VERIFICATION_MODE environment variable
If set and valid (STRICT, LENIENT, DISABLED, case-insensitive): use it
If set but invalid: print warning, fall back to annotation value
If not set: use annotation value (default: STRICT)

How each mode affects the test lifecycle:

Phase	STRICT	LENIENT	DISABLED
Baseline (Run 0)	Runs normally	Runs normally	Runs normally
Mutation runs	All mutations tested	All mutations tested	Skipped entirely
Surviving mutation	`MutantSurvivedException` thrown — test fails	Printed as warning, test passes	N/A
Summary	Full report	Full report	No mutation data

Design rationale:

The default is STRICT because mutflow's value proposition is catching test gaps — silently ignoring survivors would undermine that. However, real-world adoption is incremental: teams adding mutflow to an existing codebase need a way to see mutation results without blocking their build. LENIENT serves this purpose. DISABLED provides a zero-overhead escape hatch for performance-sensitive workflows (equivalent to mutflow.enabled=false at the Gradle level but controllable per-run without rebuilding).

The environment variable override is intentional: it allows the same test code to behave differently in different pipeline stages. A team can run DISABLED in their fast-feedback loop, LENIENT in nightly builds, and STRICT in release pipelines — all without touching the test annotations.

Architecture

Module Responsibilities

┌─────────────────────────────────────────────────────────────────┐
│                         Test Execution                          │
├─────────────────────────────────────────────────────────────────┤
│  mutflow-junit6           │  @MutFlowTest meta-annotation       │
│                           │  @ClassTemplate + @ExtendWith       │
│                           │  MutFlowExtension: thin adapter     │
│                           │    that calls MutFlow session mgmt  │
│                           │  Depends on: mutflow-runtime        │
├─────────────────────────────────────────────────────────────────┤
│  mutflow-runtime          │  MutFlowSession: per-class state    │
│                           │  MutFlow: session management +      │
│                           │    underTest() API (parameterless   │
│                           │    and explicit versions)           │
│                           │  Selection: PureRandom, MostLikely* │
│                           │  Shuffle: PerRun, PerChange         │
│                           │  Depends on: mutflow-core           │
├─────────────────────────────────────────────────────────────────┤
│  mutflow-compiler-plugin  │  Transforms @MutationTarget classes │
│                           │  and Gradle-configured target classes│
│                           │  Injects MutationRegistry.check()   │
│                           │  Four operator interfaces:          │
│                           │    MutationOperator (IrCall nodes)  │
│                           │    ReturnMutationOperator (IrReturn)│
│                           │    FunctionBodyMutationOperator     │
│                           │    WhenMutationOperator (IrWhen)    │
│                           │  RelationalComparisonOperator:      │
│                           │    handles >, <, >=, <= operators   │
│                           │  ConstantBoundaryOperator:          │
│                           │    mutates constants by +1/-1       │
│                           │  ArithmeticOperator:                │
│                           │    handles +, -, *, /, % operators  │
│                           │  EqualitySwapOperator:              │
│                           │    handles == ↔ != swaps            │
│                           │  BooleanInversionOperator:          │
│                           │    adds ! to boolean calls/props    │
│                           │  BooleanLogicOperator:              │
│                           │    handles && ↔ || swaps            │
│                           │  BooleanReturnOperator:             │
│                           │    replaces bool returns with T/F   │
│                           │  NullableReturnOperator:            │
│                           │    replaces nullable returns w/ null│
│                           │  VoidFunctionBodyOperator:          │
│                           │    removes Unit function bodies     │
│                           │  Depends on: mutflow-core           │
├─────────────────────────────────────────────────────────────────┤
│  mutflow-core             │  @MutationTarget annotation         │
│                           │  @SuppressMutations annotation      │
│                           │  MutationRegistry (per-underTest    │
│                           │    session for discovery/activation)│
│                           │  Shared types between all modules   │
│                           │  Depends on: nothing                │
└─────────────────────────────────────────────────────────────────┘

The mutflow-core module contains the bridge between compiler-generated code and test runtime. Both sides depend on it, but not on each other, keeping coupling minimal.

Session-Based Architecture

State is scoped to sessions rather than being globally mutable:

// JUnit extension creates session at class start
val sessionId = MutFlow.createSession(selection, shuffle, maxRuns)

// Each class template invocation:
val mutation = MutFlow.selectMutationForRun(sessionId, run)  // null for baseline
MutFlow.startRun(sessionId, run, mutation)
// ... all tests execute ...
MutFlow.endRun(sessionId)

// JUnit extension closes session when class finishes
MutFlow.closeSession(sessionId)

Benefits:

Clean lifecycle: create → runs → close
State isolation: Each test class has its own session
No leaked state: Explicit cleanup
Thread-safe routing: startRun registers the calling thread to the session; parameterless underTest {} resolves the session by thread ID
Synchronized mutation execution: MutationRegistry.withSession() ensures only one underTest {} block executes at a time

Test Framework Adapters

The JUnit extension (mutflow-junit6) is intentionally a thin adapter:

Uses JUnit 6's @ClassTemplate mechanism to run the class multiple times
MutFlowExtension implements ClassTemplateInvocationContextProvider
All orchestration logic lives in mutflow-runtime (session management, mutation selection)
Extension only handles: session creation/cleanup, run start/end calls, display names

This keeps framework-specific code minimal (~100 lines) and enables easy porting to other frameworks.

Data Flow

1. Compile time:
   ┌──────────────────┐      ┌───────────────────────────────────────────────────┐
   │ x > 0            │ ───► │ when(registry.check(pointId, 2, "Calc.kt:7", ">", │
   └──────────────────┘      │                     ">=,<", occurrenceOnLine=1))  │
                             └───────────────────────────────────────────────────┘

2. Baseline (run=0) - ALL tests run first:

   Test A: underTest(run=0, selection, shuffle) { calculator.isPositive(5) }
        │
        ▼
   registry.check("sample.Calculator_0", 2) → registers point, touchCount++, returns null
   registry.check("sample.Calculator_1", 2) → registers point, touchCount++, returns null
        │
        ▼
   Returns: block result (T)

   Test B: underTest(run=0, selection, shuffle) { calculator.validate(-1) }
        │
        ▼
   registry.check("sample.Calculator_0", 2) → already known, touchCount++, returns null
   registry.check("sample.Validator_0", 2) → registers point, touchCount++, returns null
        │
        ▼
   Returns: block result (T)

   After all run=0 complete:
   GlobalRegistry {
       discoveredPoints: {sample.Calculator_0: 2, sample.Calculator_1: 2, sample.Validator_0: 2}
       touchCounts: {sample.Calculator_0: 2, sample.Calculator_1: 1, sample.Validator_0: 1}
       testedMutations: {}
   }

3. Mutation runs (run=1, 2, ...) - ALL tests run with SAME mutation:

   First underTest(run=1, selection=MostLikelyRandom, shuffle=PerChange):
        │
        ▼
   Select point: sample.Calculator_1 (lowest touch count, weighted random)
   Select variant: 0 (from range 0..1)
   Add (sample.Calculator_1, 0) to testedMutations
   Activate mutation (sample.Calculator_1, 0)
        │
        ▼
   All tests execute with (sample.Calculator_1, 0) active:
   registry.check("sample.Calculator_0", ...) → not active, returns null
   registry.check("sample.Calculator_1", ...) → active! returns 0
   registry.check("sample.Validator_0", ...) → not active, returns null
        │
        ▼
   If ANY test fails → mutation killed
   If ALL tests pass → mutation survived, report it

4. Exhaustion:
   underTest(run=N, ...) where all mutations tested
        │
        ▼
   No untested mutations remain
        │
        ▼
   Throws MutationsExhaustedException → JUnit stops iteration

Technical Decisions

Disabling Mutation Testing

The Gradle plugin exposes a mutflow extension with an enabled property (default: true). When set to false:

No mutatedMain source set is created - no extra compilation step
No compiler plugin is registered (isApplicable returns false)
Only mutflow-annotations and mutflow-junit6 are added as dependencies, so @MutationTarget and @MutFlowTest still compile
Tests run normally but discover 0 mutations (no MutationRegistry.check() calls exist in the code)

The property supports three configuration methods with the following precedence:

DSL (mutflow { enabled = false }) - highest, set explicitly in build script
Gradle property (-Pmutflow.enabled=false or gradle.properties) - used as convention (default) if DSL value is not set

This is useful for:

CI pipelines where mutation testing only runs on specific builds (e.g., nightly, not on every push)
Local development when fast iteration is needed
Temporarily disabling without removing the plugin from the build

Kotlin K2 Only

K1 is deprecated; K2 is the future
Maintaining both is too much overhead for an experimental project
By the time mutflow matures, K2 will be standard

Test Build Only

The compiler plugin is applied ONLY to test compilation, never production:

Gradle plugin applies to testCompile tasks only
Runtime guards detect non-test context and fail fast
Build verification can scan production artifacts for mutation markers

Thread Safety and Parallel Test Execution

MutationRegistry is a singleton with a single currentSession slot - only one mutation session can be active at a time. This is a fundamental constraint of the mutant schemata approach: compiler-injected MutationRegistry.check() calls have no session context, so they read from a global.

Why not ThreadLocal? Using ThreadLocal for the session would break coroutines (suspend functions can resume on different dispatcher threads) and reactive frameworks (operators run on scheduler threads). We intentionally avoid ThreadLocal to keep these doors open for future support.

Current approach: synchronized withSession

MutationRegistry.withSession() wraps the entire session lifecycle in a synchronized block:

synchronized(lock) {
    currentSession = Session(activeMutation)
    try {
        val result = block()  // production code executes, check() calls read currentSession
        return result to buildSessionResult()
    } finally {
        currentSession = null
    }
}

This means underTest {} blocks from different test classes serialize at the MutationRegistry level. Between these blocks (test setup, assertions, Spring context initialization, non-mutation tests), everything runs freely in parallel.

Session routing via thread-to-session map

Each test class has its own MutFlowSession, but the parameterless MutFlow.underTest {} API needs to find the right session without a session ID parameter. Since JUnit runs startRun, test methods, and endRun on the same thread:

MutFlow.startRun(sessionId) registers Thread.currentThread().id → sessionId
MutFlow.underTest {} looks up the session by current thread ID
MutFlow.endRun(sessionId) deregisters the thread

This is not a ThreadLocal - it's an explicit ConcurrentHashMap<Long, String> used only for test-thread routing. The coroutine concern doesn't apply here because underTest() is always called from the test thread, before entering the withSession synchronized block where production code (potentially using coroutines) executes.

Summary of parallel behavior:

Non-mutation test classes: fully parallel, unaffected
Mutation test classes: underTest {} blocks serialize; everything else (setup, assertions) is parallel
Coroutines/reactive inside underTest {}: works correctly (lock is held for the entire block)

Tradeoffs and Limitations

Scope: Coverage vs Assertion Quality

mutflow closes the gap between code coverage and assertion quality - it doesn't replace coverage tools.

Coverage tools answer: "Was this code executed?"
mutflow answers: "Do your assertions catch behavioral changes?"

Code only reached outside MutFlow.underTest { } blocks produces no mutations. This can be good (setup code, logging) or bad (forgot to wrap the action under test). Use coverage tools to ensure code is exercised; use mutflow to ensure your assertions are meaningful.

Advantages

Low overhead: Compile once, not once per mutant
Low friction: No separate tool, runs in normal tests
Reproducible: Seed-based determinism, trapped mutations for debugging
Pragmatic: "Some mutation testing > none" philosophy

Limitations

Not exhaustive per session: Each session tests a small fixed number of mutations (3-8), not all. Coverage grows over many builds.
Bytecode bloat: Injected branches increase class size (64KB method limit is a risk)
Coverage interference: Extra branches affect coverage reports (may need separate non-mutated build)
Debugging complexity: Stack traces through mutated code can be confusing
Equivalent mutants: Some mutations produce identical behavior (noise)
Shared state: Sequential mutation runs may need state invalidation (future: user-provided hooks)

Prior Art

Metamutator (SpoonLabs) - Java implementation of mutant schemata. Same core technique. Project appears inactive. Requires separate CLI tool.
Pitest - Industry standard JVM mutation testing. Traditional approach (compile per mutant). Thorough but slow.
Arcmutate - Pitest plugin for Kotlin bytecode understanding.
Mutant-Kraken - Kotlin mutation testing via AST manipulation. Traditional approach.
Untch et al. (1993) - Original academic paper on mutant schemata technique.

What mutflow adds

Kotlin-native K2 compiler plugin (not Java tooling adapted for Kotlin)
BDD-style MutFlow.underTest API for explicit test scoping
One mutation per run for precise feedback
Trap mechanism to pin surviving mutants while fixing test gaps
IR-hash based mutation point identification (stable across refactoring)

Implementation Status

What's Built

mutflow-core:

MutationRegistry with check(), checkTimeout(), startSession(), endSession(), withSession() API
withSession(): synchronized wrapper that ensures only one mutation session is active at a time
checkTimeout(): compiler-injected loop guard that throws MutationTimedOutException when deadline exceeded
Supporting types (ActiveMutation, DiscoveredPoint, SessionResult)
@MutationTarget annotation for scoping mutations
Occurrence-on-line tracking for disambiguating duplicate operators on the same source line

mutflow-compiler-plugin:

K2 compiler plugin with extensible mutation operator mechanism
MutflowCommandLineProcessor receives target patterns from Gradle plugin via SubpluginOption
Target pattern matching: glob-style patterns (*, **) compiled to regex for FQN matching
Four operator interfaces for different IR node types:
- MutationOperator - for IrCall nodes (comparison operators, etc.)
- ReturnMutationOperator - for IrReturn nodes (return statement mutations)
- FunctionBodyMutationOperator - for function declarations (body-level mutations)
- WhenMutationOperator - for IrWhen nodes (boolean logic operators)
RelationalComparisonOperator handles all comparison operators (>, <, >=, <=)
- Each operator produces 2 variants: boundary mutation + direction flip
ConstantBoundaryOperator mutates numeric constants in comparisons
- Produces 2 variants: +1 and -1 of the original constant
- Detects poorly tested boundaries that operator mutations miss
BooleanReturnOperator mutates boolean return values
- Produces 2 variants: true and false
- Only matches explicit return statements (block-bodied functions)
- Skips synthetic returns from expression-bodied functions (detected by zero-width source span)
NullableReturnOperator mutates nullable return values to null
- Produces 1 variant: null
- Only matches explicit return statements in functions with nullable return types
- Skips returns that are already null (mutating null to null is pointless)
- Catches tests that only verify non-null without checking the actual value
ArithmeticOperator mutates arithmetic operations
- + → - (1 variant)
- - → + (1 variant)
- * → / (1 variant, with safe division to avoid div-by-zero)
- / → * (1 variant)
- % → / (1 variant)
- Safe division for * → /: when b=0, computes b/a; when both are 0, returns 1
EqualitySwapOperator swaps equality operators
- == → != (1 variant: wraps EQEQ intrinsic with Boolean.not())
- != → == (1 variant: unwraps the not() wrapper to expose the inner EQEQ call)
- In K2 IR, == is a single EQEQ intrinsic; != is not(EQEQ(a, b)) - two calls both with EXCLEQ origin
- Matches EQEQ calls with EQEQ origin for ==, and not() calls with EXCLEQ origin for !=
- Avoids double-matching the inner EQEQ of != expressions (which would create spurious mutation points)
BooleanInversionOperator adds negation to boolean expressions
- expr → !expr (1 variant: wraps in Boolean.not())
- Matches boolean-returning IrCall nodes with null or GET_PROPERTY origin (function calls and property accesses)
- Excludes not() calls (would create redundant double-negation points) and EXCLEQ origin (handled by EqualitySwapOperator)
- The "remove negation" case (!expr → expr) is implicitly covered: adding ! to the inner expression of !expr produces !(!expr) = expr
Boolean variable/parameter inversion is handled directly by MutflowIrTransformer.visitGetValue
- varName → !varName (1 variant: wraps boolean IrGetValue in Boolean.not())
- Not an operator interface - IrGetValue is a leaf node, handled inline with a single mutation point
BooleanLogicOperator swaps boolean logic operators
- && → || (1 variant: swaps branch results to short-circuit true)
- || → && (1 variant: swaps branch results to short-circuit false)
- In K2 IR (2.3.0+), && and || are lowered to IrWhen expressions with ANDAND/OROR origins
- &&: when(ANDAND) { a -> b; else -> false } - if first is true, evaluate second
- ||: when(OROR) { a -> true; else -> b } - if first is true, short-circuit true
- Mutation swaps branch results: ANDAND replaces b with true and false with b (and vice versa for OROR)
VoidFunctionBodyOperator removes entire function bodies of Unit/void functions
- Produces 1 variant: empty body (all side effects removed)
- Only matches functions that return Unit, have non-empty bodies, and are not property accessors
- Catches tests that don't verify side effects - "what if this function did nothing?"
- Operates at the function declaration level, not at call sites
Recursive operator application: multiple operators can match the same expression
Type-agnostic: works with Int, Long, Double, Float, etc.
Respects @SuppressMutations annotation on classes and functions
Comment-based line suppression: // mutflow:ignore and // mutflow:falsePositive
- Reads source files for @MutationTarget classes during IR transformation
- Inline comments suppress mutations on the same line, standalone comments suppress the next line
- Cached per file, defensive fallback with warning if source file is unreadable
Timeout detection: injects MutationRegistry.checkTimeout() at the top of every loop body
- Covers IrWhileLoop and IrDoWhileLoop (all Kotlin loop constructs including for)
- Prevents mutations that cause infinite loops from hanging the test run

mutflow-runtime:

MutFlowSession: Per-class state (discovered points, touch counts, tested mutations)
MutFlow: Session management + underTest() API (parameterless and explicit versions)
Thread-to-session routing: startRun/endRun register/deregister the calling thread for parameterless underTest() resolution
Selection strategies: PureRandom, MostLikelyRandom, MostLikelyStable
Shuffle modes: PerRun, PerChange
Touch count tracking during baseline
Target filtering: includeTargets/excludeTargets for scoping mutations by class
MutationsExhaustedException when all mutations tested
VerificationMode enum: STRICT, LENIENT, DISABLED

mutflow-junit6:

@MutFlowTest meta-annotation combining @ClassTemplate + @ExtendWith
MutFlowExtension implementing ClassTemplateInvocationContextProvider
Session lifecycle management (create, startRun, endRun, close)
Mutation selection at context creation for accurate display names
Verification mode resolution: annotation parameter with MUTFLOW_VERIFICATION_MODE env var override

mutflow-test-sample:

Integration tests demonstrating both APIs

Target API

// Simple: use @MutFlowTest annotation
@MutFlowTest  // runs all mutations by default
class CalculatorTest {
    @Test
    fun testIsPositive() {
        val result = MutFlow.underTest {  // parameterless!
            calculator.isPositive(5)
        }
        assertTrue(result)
    }
}

Example output:

CalculatorTest > Run without mutations > isPositive returns true for positive numbers() PASSED
CalculatorTest > Run without mutations > isPositive returns true at boundary() PASSED
CalculatorTest > Run without mutations > isPositive returns false for negative numbers() PASSED
CalculatorTest > Run without mutations > isPositive returns false for zero() PASSED

CalculatorTest > Mutation: (Calculator.kt:7) > → >= > ... PASSED
CalculatorTest > Mutation: (Calculator.kt:7) > → < > ... PASSED
CalculatorTest > Mutation: (Calculator.kt:7) 0 → 1 > ... PASSED
CalculatorTest > Mutation: (Calculator.kt:7) 0 → -1 > ... PASSED

╔════════════════════════════════════════════════════════════════╗
║                    MUTATION TESTING SUMMARY                    ║
╠════════════════════════════════════════════════════════════════╣
║  Total mutations discovered:   4                              ║
║  Tested this run:              4                              ║
║  ├─ Killed:                    4  ✓                           ║
║  └─ Survived:                  0  ✓                           ║
║  Remaining untested:           0                              ║
╠════════════════════════════════════════════════════════════════╣
║  DETAILS:                                                      ║
║  ✓ (Calculator.kt:7) > → >=                                     ║
║      killed by: isPositive returns false for zero()            ║
║  ✓ (Calculator.kt:7) > → <                                      ║
║      killed by: isPositive returns true at boundary()          ║
║  ✓ (Calculator.kt:7) 0 → 1                                      ║
║      killed by: isPositive returns true at boundary()          ║
║  ✓ (Calculator.kt:7) 0 → -1                                     ║
║      killed by: isPositive returns false for zero()            ║
╚════════════════════════════════════════════════════════════════╝

Key behavior:

Killed mutations: When a test assertion fails during a mutation run, the exception is swallowed and the test appears as PASSED. This is intentional - a failing assertion means the test caught the mutation (good!). The mutation is recorded as "killed" internally.
Surviving mutations: After all tests in a mutation run complete, if NO test caught the mutation (all tests passed naturally), MutantSurvivedException is thrown. This fails the build and indicates a gap in test coverage.
Summary: At the end of each test class, a summary shows which mutations were tested and their results.

Why this design? The goal is that all tests appear green when mutations are properly killed. Failed assertions during mutation runs are expected and desirable - they prove your tests can detect code changes. Only when tests fail to catch a mutation does the build fail, alerting you to the coverage gap.

Transformation:

// Before (in @MutationTarget class)
fun isPositive(x: Int) = x > 0

// After compiler plugin (nested mutations for operator AND constant)
fun isPositive(x: Int) = when (MutationRegistry.check("..._0", 2, "Calculator.kt:7", ">", ">=,<", 1)) {
    0 -> x >= 0   // operator: boundary (include equality)
    1 -> x < 0    // operator: direction flip
    else -> when (MutationRegistry.check("..._1", 2, "Calculator.kt:7", "0", "1,-1", 1)) {
        0 -> x > 1    // constant: increment
        1 -> x > -1   // constant: decrement
        else -> x > 0 // original
    }
}

Planned

Gradle plugin for easy setup
Smarter likelihood calculations (see below)
State invalidation hooks

Smarter Likelihood Calculations

The current touch count metric is a simple proxy for "how well tested is this mutation point". A future improvement is enhancing the likelihood calculation by analyzing observed runtime values during baseline.

Simple cases (e.g., x > 5 where one side is a literal):

Track observed values of x during baseline
If tests only use values far from the boundary (e.g., [10, 20, 100] but never [4, 5, 6]), specific variants become higher likelihood:
- x >= 5 → HIGH likelihood (boundary at 5 never tested)
- x == 5 → HIGH likelihood (5 never observed)
- x < 5 → LOWER likelihood (tests with x=10 would fail)

Complex cases (nested expressions, non-literal boundaries):

(x + y) > threshold - harder to analyze, would need to track computed values
These cases may remain suboptimal, falling back to basic touch count

The key insight: instead of a separate "boundary analysis" feature with its own warnings/control flow, we integrate this into the existing likelihood score. Boundary-untested variants naturally bubble to the top of selection priority, get tested first, and produce standard "mutation survived" feedback if they survive.

This keeps the system focused on mutation testing rather than becoming a general boundary testing tool.

Open Questions

Best approach for counting mutations inside loops/recursion? (per-invocation vs per-source-location)
How to present surviving mutants clearly in test output? (IDE integration?)

FilesExpand file tree

DESIGN.md

Latest commit

History

DESIGN.md

File metadata and controls

mutflow - Design Document

Overview

The Problem

The Approach: Mutant Schemata

Example Transformation

Runtime Discovery Model

Key Features

1. Explicit Test Scoping with MutFlow.underTest

2. Global Baseline and Run Model

3. Selection and Shuffle Modes (Internal)

4. Global Mutation Registry

5. Trapping Surviving Mutants

6. Scoped Mutations via Annotations and Gradle Config

Line-Level Suppression via Comments

7. Target Filtering for Integration Tests

8. Partial Run Detection

9. Timeout Detection for Infinite Loops

10. Verification Mode

Architecture

Module Responsibilities

Session-Based Architecture

Test Framework Adapters

Data Flow

Technical Decisions

Disabling Mutation Testing

Kotlin K2 Only

Test Build Only

Thread Safety and Parallel Test Execution

Tradeoffs and Limitations

Scope: Coverage vs Assertion Quality

Advantages

Limitations

Prior Art

What mutflow adds

Implementation Status

What's Built

Target API

Planned

Smarter Likelihood Calculations

Open Questions

1. Explicit Test Scoping with `MutFlow.underTest`