mutflow is a Kotlin compiler plugin for lightweight, low-overhead mutation testing. It targets developers and teams who currently do no mutation testing due to the high cost and complexity of traditional tools.
Traditional mutation testing (e.g., Pitest) works by:
- Generating mutants (modified versions of code)
- Compiling each mutant separately
- Running tests against each mutant
- Reporting which mutants survived
This is thorough but expensive: many compilation cycles, long execution times, complex tooling setup. Most teams skip mutation testing entirely.
mutflow uses the "mutant schemata" (or "meta-mutant") technique:
- Compile once: The compiler plugin injects ALL mutation variants into the code at compile time, guarded by conditional switches
- Runtime selection: At test runtime, a control mechanism activates exactly one mutation per run
- Multiple runs: Tests execute multiple times - once as baseline, then with different single mutations
- Fail on survivors: If a mutant survives (tests pass when they shouldn't), the test fails with actionable feedback
Test code:
@MutFlowTest
class CalculatorTest {
@Test
fun testIsPositive() {
val result = MutFlow.underTest { // parameterless with JUnit extension
isPositive(5)
}
assertTrue(result)
}
}
Production code - before:
@MutationTarget
class Calculator {
fun isPositive(x: Int): Boolean {
return x > 0
}
}
Production code - after compiler plugin:
@MutationTarget
class Calculator {
fun isPositive(x: Int): Boolean {
// Compiler injects nested when expressions for multiple mutation types
return when (MutationRegistry.check(
pointId = "sample.Calculator_0",
variantCount = 2,
sourceLocation = "Calculator.kt:4",
originalOperator = ">",
variantOperators = ">=,<",
occurrenceOnLine = 1
)) {
0 -> x >= 0 // operator mutation: include equality
1 -> x < 0 // operator mutation: direction flip
else -> when (MutationRegistry.check(
pointId = "sample.Calculator_1",
variantCount = 2,
sourceLocation = "Calculator.kt:4",
originalOperator = "0",
variantOperators = "1,-1",
occurrenceOnLine = 1
)) {
0 -> x > 1 // constant mutation: increment
1 -> x > -1 // constant mutation: decrement
else -> x > 0 // original
}
}
}
}
This nested structure is generated recursively by the compiler plugin. Each matching MutationOperator wraps the expression, with the else branch feeding into the next operator. Since at most one mutation is active at runtime, evaluation stays simple: the active mutation's branch executes, and every other check falls through to the original expression.
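To make the fall-through behavior concrete, here is a minimal hand-written sketch of the same nested structure. MiniRegistry is a hypothetical stand-in, not mutflow's MutationRegistry: it takes only a point ID and ignores the metadata arguments shown in the transformed code above.

```kotlin
// Hypothetical stand-in for the real registry: it only knows which
// (pointId, variantIndex) pair is active, if any.
object MiniRegistry {
    var active: Pair<String, Int>? = null

    // Returns the active variant index for this point, or null to use the original.
    fun check(pointId: String): Int? =
        active?.takeIf { it.first == pointId }?.second
}

// Hand-written equivalent of the transformed isPositive.
fun isPositive(x: Int): Boolean =
    when (MiniRegistry.check("sample.Calculator_0")) {
        0 -> x >= 0          // operator mutation: include equality
        1 -> x < 0           // operator mutation: direction flip
        else -> when (MiniRegistry.check("sample.Calculator_1")) {
            0 -> x > 1       // constant mutation: increment
            1 -> x > -1      // constant mutation: decrement
            else -> x > 0    // original
        }
    }

fun main() {
    println(isPositive(0))                            // false (original: 0 > 0)
    MiniRegistry.active = "sample.Calculator_0" to 0
    println(isPositive(0))                            // true (mutant: 0 >= 0)
    MiniRegistry.active = "sample.Calculator_1" to 1
    println(isPositive(0))                            // true (mutant: 0 > -1)
    MiniRegistry.active = null
}
```

With the first point inactive, its check returns null and control reaches the inner when, which is why a mutation on the constant still takes effect.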
Mutation points are discovered dynamically at runtime, not statically at class load:
- Discovery run: Code executes normally (no activeMutation). Each MutationRegistry.check() call registers "I exist with these variants" along with display metadata (source location, operator descriptions), and returns null (use original). After execution, the registry returns: "discovered 5 mutation points with their variant counts".
Note: Point IDs use the format ClassName_N (e.g., sample.Calculator_0), but display names show source location and operator (e.g., (Calculator.kt:7) > → >=). When the same operator appears multiple times on the same line (e.g., if (a > b && c > d)), an occurrence suffix disambiguates: the first stays > → >=, the second becomes > → >= #2. A future improvement is switching to IR-hash based IDs for stability across refactoring.
- Mutation runs: The caller specifies which mutation to activate via ActiveMutation(pointId, variantIndex). When that point calls check(), it returns the active variant index instead of null.
This dynamic discovery matters because:
- Different underTest blocks exercise different code paths
- Only mutations actually reached by the test are counted
- Same class called from different tests may hit different mutation points
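The difference between a discovery run and a mutation run can be sketched as follows. SketchSession and the isEven function are hypothetical and exist only for illustration; the real registry also records display metadata, which is omitted here.

```kotlin
// Hypothetical session type for illustration; mutflow's real registry API may differ.
class SketchSession(private val active: Pair<String, Int>? = null) {
    // pointId -> variantCount, filled in as check() calls are reached
    val discovered = linkedMapOf<String, Int>()

    fun check(pointId: String, variantCount: Int): Int? {
        discovered[pointId] = variantCount                      // "I exist with these variants"
        return active?.takeIf { it.first == pointId }?.second   // null = use original
    }
}

// Two hand-instrumented functions, each with one mutation point.
fun isPositive(s: SketchSession, x: Int): Boolean =
    when (s.check("sample.Calculator_0", 2)) {
        0 -> x >= 0
        1 -> x < 0
        else -> x > 0
    }

fun isEven(s: SketchSession, x: Int): Boolean =
    when (s.check("sample.Calculator_1", 1)) {
        0 -> x % 2 != 0
        else -> x % 2 == 0
    }

fun main() {
    val discovery = SketchSession()        // discovery run: no active mutation
    isPositive(discovery, 5)               // only this code path is exercised
    println(discovery.discovered)          // {sample.Calculator_0=2}

    val mutated = SketchSession(active = "sample.Calculator_0" to 1)
    println(isPositive(mutated, 5))        // false: the "<" variant is active
}
```

Because isEven is never called in the discovery run, its point is not registered, which mirrors the rule that only mutations actually reached by the test are counted.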
Tests explicitly mark the action under test using BDD-style structure:
@MutFlowTest
class CalculatorTest {
@Test
fun testIsPositive() {
// given
val x = 5
// when
val result = MutFlow.underTest { // parameterless when using @MutFlowTest
isPositive(x)
}
// then
assertTrue(result)
}
}
The MutFlow.underTest block:
- Wraps only the action under test (the "when" in given/when/then)
- Returns the result for assertions outside the block
- Assertions stay outside - they should fail when mutations change behavior
- When using @MutFlowTest, the JUnit extension manages session lifecycle internally
Mutation testing operates at the test class level with a global registry:
- Run 0 (baseline): ALL test cases in the class execute first, discovering mutation points
- Run 1+: ALL test cases execute with the same mutation active
// With @MutFlowTest, the JUnit extension orchestrates all runs automatically:
// - Run 0 (baseline): All tests execute, mutation points discovered
// - Run 1+: All tests execute with same mutation active
@MutFlowTest
class CalculatorTest {
@Test fun testIsPositive() {
val result = MutFlow.underTest { calculator.isPositive(5) }
assertTrue(result)
}
@Test fun testIsNegative() {
val result = MutFlow.underTest { calculator.isPositive(-5) }
assertFalse(result)
}
}
// If ANY test fails during a mutation run, the mutation is killed
Key principles:
- Same mutation for all tests: A run activates one mutation across the entire test suite
- Global discovery: Mutation points from all tests are merged into a single registry
- Touch counting: During baseline, we count how many tests touch each mutation point
- Run limit: Tests run up to N times (configured), or until all mutations are exhausted
This means:
- We can determine if a mutation survives the entire test suite
- Mutations touched by fewer tests are identified as higher risk
- Precise feedback: when a mutant survives, you know exactly which one
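The class-level run model can be illustrated with a small, self-contained simulation. Everything here is a sketch under assumptions: the Mutation type, the explicit active parameter, and findSurvivors are hypothetical, and tests are modeled as functions that return true when their assertion holds.

```kotlin
data class Mutation(val pointId: String, val variantIndex: Int)

// Hand-instrumented production code. The active mutation is passed in
// explicitly here, whereas mutflow reads it from its registry.
fun isPositive(x: Int, active: Mutation?): Boolean = when (active) {
    Mutation("sample.Calculator_0", 0) -> x >= 0
    Mutation("sample.Calculator_0", 1) -> x < 0
    Mutation("sample.Calculator_1", 0) -> x > 1
    Mutation("sample.Calculator_1", 1) -> x > -1
    else -> x > 0
}

val allMutations = listOf(
    Mutation("sample.Calculator_0", 0), Mutation("sample.Calculator_0", 1),
    Mutation("sample.Calculator_1", 0), Mutation("sample.Calculator_1", 1),
)

// Run 0 executes every test without a mutation; each later run activates one
// mutation for ALL tests. A mutation survives only if every test still passes.
fun findSurvivors(
    tests: List<(Mutation?) -> Boolean>,
    mutations: List<Mutation>,
    maxRuns: Int,
): List<Mutation> {
    check(tests.all { it(null) }) { "baseline (run 0) must pass" }
    return mutations.take(maxRuns).filter { m -> tests.all { it(m) } }
}

fun main() {
    val tests: List<(Mutation?) -> Boolean> = listOf(
        { m -> isPositive(5, m) },     // assertTrue(isPositive(5))
        { m -> !isPositive(-5, m) },   // assertFalse(isPositive(-5))
    )
    // Only the direction flip is killed; the three boundary mutants survive.
    findSurvivors(tests, allMutations, maxRuns = 10).forEach { println("MUTANT SURVIVED: $it") }
}
```

In this sketch, adding boundary tests for 0 and 1 kills the remaining mutants, which is the kind of test gap a survivor report points to.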
These are internal implementation details used by the mutation selection engine. The @MutFlowTest annotation uses MostLikelyStable + PerChange by default and runs all mutations. These parameters are only exposed through the manual MutFlow.underTest(run, selection, shuffle) API.
Mutation selection is controlled by two orthogonal parameters:
enum class Selection {
PureRandom, // Uniform random selection
MostLikelyRandom, // Weighted random favoring least-touched points
MostLikelyStable // Deterministic: always pick least-touched point
}
enum class Shuffle {
PerRun, // Different seed each CI build/JVM run
PerChange // Same seed until discovered points change
}
Selection strategies (which mutation to pick):
| Selection | Behavior |
|---|---|
| PureRandom | Uniform random selection among untested mutations |
| MostLikelyRandom | Random but weighted toward mutations touched by fewer tests |
| MostLikelyStable | Deterministically pick the mutation touched by fewest tests |
The "touch count" is calculated during baseline (run 0): each time a test executes a mutation point, that point's touch count increments. Mutations touched by fewer tests are considered higher risk and prioritized by MostLikely* strategies.
Shuffle modes (when to change the seed):
| Shuffle | Behavior |
|---|---|
| PerRun | New random seed each JVM/CI run - exploratory |
| PerChange | Seed based on hash(discoveredPoints) - stable until code changes |
Typical workflow:
- During development: use MostLikelyRandom + PerRun to explore high-risk mutations
- For merge requests: use MostLikelyStable + PerChange for reproducible results
- Over time: cover all mutations across many builds
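One possible reading of the selection strategies and the PerChange seed is sketched below. The weighting formula (1 / touchCount), the tie-break by sorted point ID, and the seed derivation are assumptions made for illustration; the source does not specify how mutflow computes them.

```kotlin
import kotlin.random.Random

enum class Selection { PureRandom, MostLikelyRandom, MostLikelyStable }

// Picks a mutation point from the baseline touch counts.
fun selectPoint(touchCounts: Map<String, Int>, selection: Selection, rng: Random): String {
    require(touchCounts.isNotEmpty()) { "no mutation points to select from" }
    val points = touchCounts.keys.sorted()   // fixed order so results are reproducible
    return when (selection) {
        Selection.PureRandom -> points[rng.nextInt(points.size)]
        // Lowest touch count wins; ties go to the first point in sorted order.
        Selection.MostLikelyStable -> points.minByOrNull { touchCounts.getValue(it) }!!
        Selection.MostLikelyRandom -> {
            // Assumed weighting: a point touched by fewer tests gets a larger weight.
            val weights = points.map { 1.0 / maxOf(1, touchCounts.getValue(it)) }
            var r = rng.nextDouble() * weights.sum()
            for ((i, w) in weights.withIndex()) {
                r -= w
                if (r < 0) return points[i]
            }
            points.last()   // guards against floating-point rounding
        }
    }
}

// PerChange (assumed derivation): the seed depends only on the discovered
// points, so it stays the same until the set of points changes.
fun perChangeSeed(discoveredPoints: Map<String, Int>): Int =
    discoveredPoints.entries.sortedBy { it.key }
        .joinToString(",") { "${it.key}:${it.value}" }
        .hashCode()

fun main() {
    val touchCounts = mapOf("sample.Calculator_0" to 2, "sample.Calculator_1" to 1, "sample.Validator_0" to 1)
    val discovered = touchCounts.mapValues { 2 }
    val rng = Random(perChangeSeed(discovered))
    println(selectPoint(touchCounts, Selection.MostLikelyStable, rng))   // sample.Calculator_1
    println(selectPoint(touchCounts, Selection.MostLikelyRandom, rng))
}
```

Seeding the generator from the discovered points gives the same sequence of picks on every build until a point is added, removed, or changes its variant count.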
The mutation registry is a global in-memory state shared across all tests:
GlobalRegistry {
// From baseline (run 0): which points exist and their variant counts
discoveredPoints: Map<PointId, VariantCount>
// From baseline (run 0): how many tests touched each point
touchCounts: Map<PointId, Int>
// Updated each mutation run: which mutations have been tested
testedMutations: Set<Mutation> // Mutation = (pointId, variantIndex)
}
Lifecycle:
- Run 0 (all tests): Baseline discovery - mutation points merged globally, touch counts accumulated
- Run 1+: For each run:
- Select a mutation point (using Selection strategy + touch counts)
- Pick a variant for that point (excluding already-tested variants)
- Add to testedMutations, activate, execute lambda
- Exhaustion: If no untested mutations remain → throw MutationsExhaustedException
This ensures:
- No mutation is tested twice within an execution
- Touch counts guide selection toward under-tested mutation points
- Natural termination when all mutations are covered
- Run count (configured in JUnit) is the normal limit; exception is early-exit for small codebases
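The lifecycle above can be sketched as a small class. SketchRegistry and the locally defined MutationsExhaustedException are illustrative stand-ins, and the selection step is simplified to "lowest touch count first" rather than the configurable strategies.

```kotlin
// Local stand-in for the exception named in the text.
class MutationsExhaustedException : RuntimeException("all discovered mutations have been tested")

class SketchRegistry(
    private val discoveredPoints: Map<String, Int>,   // pointId -> variantCount
    private val touchCounts: Map<String, Int>,        // pointId -> tests touching it
) {
    val testedMutations = linkedSetOf<Pair<String, Int>>()

    // Picks the next untested (pointId, variantIndex), preferring points
    // touched by fewer tests; throws once nothing is left.
    fun next(): Pair<String, Int> {
        val ordered = discoveredPoints.keys.sortedBy { touchCounts[it] ?: 0 }
        for (point in ordered) {
            for (variant in 0 until discoveredPoints.getValue(point)) {
                val mutation = point to variant
                if (testedMutations.add(mutation)) return mutation   // add() is false if already tested
            }
        }
        throw MutationsExhaustedException()
    }
}

fun main() {
    val registry = SketchRegistry(
        discoveredPoints = linkedMapOf("sample.Calculator_0" to 2, "sample.Calculator_1" to 2),
        touchCounts = mapOf("sample.Calculator_0" to 2, "sample.Calculator_1" to 1),
    )
    try {
        while (true) println("run with ${registry.next()}")
    } catch (e: MutationsExhaustedException) {
        println("stopped: ${e.message}")
    }
}
```

Because each selected mutation is added to testedMutations before it is returned, no mutation can be picked twice, and the loop ends on its own once every variant has been used.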
When a mutant survives, the build fails with a display name like:
MUTANT SURVIVED: (Calculator.kt:8) > → >=
This display name can be copied into the @MutFlowTest annotation to trap the mutant - ensuring it runs first every time while you fix the test gap:
@MutFlowTest(
traps = ["(Calculator.kt:8) > → >="]
)
class CalculatorTest {
@Test
fun testIsPositive() {
val result = MutFlow.underTest { calculator.isPositive(5) }
assertTrue(result)
}
}
Traps are a temporary debugging aid:
- Mutation survives → you get its display name
- Copy display name into traps array to pin it
- Fix your test until it catches the mutation
- Remove the trap once fixed
Trap behavior:
- Trapped mutations run first, before normal selection (regardless of selection strategy)
- Multiple traps run in the order provided
- After all traps are exhausted, normal selection continues
- Invalid traps (e.g., code moved) print a warning with available mutations
Why display names instead of internal IDs? The display name format (FileName.kt:line) original → variant (or … variant #N when disambiguating) is:
- Human-readable and self-documenting in test code
- Directly copy-pasteable from survivor output
- Stable enough for temporary debugging (users typically change tests, not impl)
- Easy to update if code moves (the warning shows available mutations)
- Unambiguous even when the same operator appears multiple times on one line
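As a rough illustration of the display-name format and the trap ordering rules, consider the sketch below. MutationPoint, displayName, and orderWithTraps are hypothetical helpers, and the warning text is invented for the example rather than taken from mutflow's output.

```kotlin
data class MutationPoint(
    val file: String,
    val line: Int,
    val original: String,
    val variants: List<String>,
    val occurrenceOnLine: Int,
)

// "(FileName.kt:line) original → variant", with " #N" appended only for the
// second and later occurrences of the same operator on a line.
fun displayName(p: MutationPoint, variantIndex: Int): String {
    val suffix = if (p.occurrenceOnLine > 1) " #${p.occurrenceOnLine}" else ""
    return "(${p.file}:${p.line}) ${p.original} → ${p.variants[variantIndex]}$suffix"
}

// Valid traps come first, in the order given; unknown traps only produce a warning.
fun orderWithTraps(available: List<String>, traps: List<String>): List<String> {
    val (valid, invalid) = traps.partition { it in available }
    invalid.forEach { println("[sketch] WARNING: trap '$it' not found; available: $available") }
    return valid + available.filterNot { it in valid }
}

fun main() {
    val first = MutationPoint("Calculator.kt", 8, ">", listOf(">=", "<"), occurrenceOnLine = 1)
    val second = first.copy(occurrenceOnLine = 2)
    val available = listOf(displayName(first, 0), displayName(first, 1), displayName(second, 0))
    println(available)
    println(orderWithTraps(available, traps = listOf("(Calculator.kt:8) > → >= #2", "(Moved.kt:1) > → <")))
}
```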
The compiler plugin injects mutations into classes that are either annotated with @MutationTarget or matched by target patterns configured in the Gradle plugin:
// Option 1: Annotation on production code
@MutationTarget
class Calculator {
// mutations injected here
}
// Option 2: Gradle config (no annotation needed on production code)
// build.gradle.kts
mutflow {
targets = listOf(
"com.example.Calculator", // exact class
"com.example.service.*", // all classes in a package
"com.example.service.**", // package + all subpackages
"com.example.*Service" // glob pattern
)
}
Both mechanisms can be combined freely - a class is mutated if it matches either @MutationTarget or a Gradle target pattern. If both match the same class, it is simply mutated once (no duplication).
Why two mechanisms?
- @MutationTarget is convenient for small projects and makes mutation scope visible in the code
- Gradle config addresses a common concern: annotating production code with test-related annotations from an external library feels invasive (see GitHub issue #2). The Gradle config keeps all mutation testing configuration in the test/build layer
Pattern matching: Target patterns support glob-style matching:
- . matches literal dots in package/class names
- * matches a single name segment (does not cross dots)
- ** matches any number of segments (crosses dots)
Patterns are compiled to regexes once at plugin initialization and matched against each class's fully qualified name during IR transformation.
This limits bytecode bloat and keeps mutations relevant.
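The glob rules for target patterns can be sketched as a small translation to a regular expression. This is an illustrative implementation under the three rules listed above, not the plugin's actual code; globToRegex is a hypothetical name.

```kotlin
// Translates a target pattern into a Regex:
//   "**" -> any characters, including dots (crosses package segments)
//   "*"  -> any characters except dots (stays within one segment)
//   anything else, including ".", is matched literally
fun globToRegex(pattern: String): Regex {
    val sb = StringBuilder()
    var i = 0
    while (i < pattern.length) {
        when {
            pattern.startsWith("**", i) -> { sb.append(".*"); i += 2 }
            pattern[i] == '*' -> { sb.append("[^.]*"); i += 1 }
            else -> { sb.append(Regex.escape(pattern[i].toString())); i += 1 }
        }
    }
    return Regex(sb.toString())
}

fun main() {
    val patterns = listOf("com.example.service.*", "com.example.service.**", "com.example.*Service")
    val regexes = patterns.map { it to globToRegex(it) }   // compiled once, reused per class
    for (fqn in listOf("com.example.service.Billing", "com.example.service.impl.Billing", "com.example.PaymentService")) {
        val matched = regexes.filter { (_, regex) -> regex.matches(fqn) }.map { it.first }
        println("$fqn -> $matched")
    }
}
```

Regex.matches requires the whole fully qualified name to match, so a pattern for one package does not accidentally match a longer name that merely starts with it.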
Additionally, you can suppress mutations on specific functions using @SuppressMutations:
@MutationTarget
class Calculator {
fun isPositive(x: Int) = x > 0 // mutations injected
@SuppressMutations
fun debugLog(x: Int): Boolean {
// no mutations here - logging code doesn't need mutation testing
return x > 100
}
}
The @SuppressMutations annotation can be applied to:
- Classes: Skip all mutations in the entire class
- Functions: Skip mutations in specific functions only
For finer granularity, individual lines can be suppressed using comments - similar to SonarQube's // NOSONAR. Two keywords are supported with the same technical effect but different semantic intent:
- mutflow:ignore - the code is not worth testing (logging, debug utilities, heuristics)
- mutflow:falsePositive - the mutation is an equivalent mutant or not meaningful to test
Free-form text after the keyword serves as documentation for reviewers.
Inline comment - suppresses mutations on the same line:
val threshold = x > 100 // mutflow:ignore this is just a heuristic threshold
Standalone comment - suppresses mutations on the next line:
// mutflow:falsePositive equivalent mutant, >= and > both valid here
if (retryCount > MAX_RETRIES) { ... }
How it works:
- The IR transformer reads the source file when entering a @MutationTarget class
- Lines containing mutflow:ignore or mutflow:falsePositive after // are parsed
- A set of suppressed line numbers is built (inline = same line, standalone = next line)
- Mutation operators skip IR nodes whose source line falls in the suppressed set
- Source file reads are cached per file path (no re-reading for multiple classes in the same file)
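The line-suppression steps above could look roughly like this. The regex and the suppressedLines function are assumptions for illustration; in particular, this sketch does not try to tell a real comment from a // sequence inside a string literal.

```kotlin
// Matches "// mutflow:ignore" or "// mutflow:falsePositive", with optional free text after it.
val suppressionMarker = Regex("""//\s*mutflow:(ignore|falsePositive)\b""")

// Returns the 1-based line numbers on which mutations should be skipped.
fun suppressedLines(source: String): Set<Int> {
    val result = mutableSetOf<Int>()
    source.lines().forEachIndexed { index, line ->
        val match = suppressionMarker.find(line) ?: return@forEachIndexed
        val lineNumber = index + 1
        // Standalone comment: nothing but whitespace before the "//".
        val standalone = line.substring(0, match.range.first).isBlank()
        result += if (standalone) lineNumber + 1 else lineNumber
    }
    return result
}

fun main() {
    val source = listOf(
        "val threshold = x > 100 // mutflow:ignore this is just a heuristic threshold",
        "// mutflow:falsePositive equivalent mutant, >= and > both valid here",
        "if (retryCount > MAX_RETRIES) { retry() }",
        "val limit = x > 5",
    ).joinToString("\n")
    println(suppressedLines(source))   // [1, 3]
}
```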
Zero production overhead: Comments are stripped by the Kotlin compiler - nothing appears in production bytecode. The suppression logic runs entirely during compilation.
Defensive behavior: If the source file cannot be read (e.g., generated sources, unusual build setups), a warning is printed and compilation continues without comment-based suppression:
[mutflow] WARNING: Could not read source file Calculator.kt - comment-based suppression (mutflow:ignore / mutflow:falsePositive) unavailable for this file
Why not a function call or annotation? A function call like ignoreMutationsForNextLine() would leave a no-op call in production bytecode (the compiler plugin only runs during test compilation in the dual-build setup). Annotations cannot target individual expressions/lines in Kotlin. Comments are zero-cost and familiar from tools like SonarQube and PMD.
In integration tests, MutFlow.underTest {} blocks often exercise multiple @MutationTarget classes, but you may only care about mutations in the class you're actually testing. Target filtering lets you scope which classes produce active mutations:
// Only test mutations from Calculator (ignore Logger, AuditService, etc.)
@MutFlowTest(includeTargets = [Calculator::class])
class CalculatorIntegrationTest { ... }
// Test mutations from everything except infrastructure classes
@MutFlowTest(excludeTargets = [AuditLogger::class, MetricsService::class])
class PaymentServiceTest { ... }
How it works:
- includeTargets: Only mutations from these @MutationTarget classes are selected. Empty (default) = all classes included.
- excludeTargets: Mutations from these classes are skipped. Empty (default) = no classes excluded.
- Both can be combined: include narrows the set first, then exclude removes from it.
Key design decisions:
- Discovery is unfiltered: All mutation points are still discovered during baseline (touch counts remain accurate for selection weighting)
- Filtering applies at selection time: Only when picking which mutation to activate next
- Summary reflects the filter: Total/untested counts only show filtered mutations, so "all mutations tested" means "all mutations you care about"
- Exhaustion respects the filter: The session exhausts when all filtered mutations are tested, not all discovered mutations
Why class-level filtering? Point IDs encode the fully qualified class name (e.g., com.example.Calculator_0), so class-based matching is natural. The annotation uses KClass<*> references, which are type-safe and refactoring-friendly.
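A minimal sketch of selection-time filtering, assuming point IDs of the form ClassName_N as described above. For simplicity it takes fully qualified class names as strings rather than the KClass references used by the annotation; classNameOf and filterPoints are hypothetical helpers.

```kotlin
// "com.example.Calculator_0" -> "com.example.Calculator"
fun classNameOf(pointId: String): String = pointId.substringBeforeLast('_')

// Include narrows the set first (empty = everything), then exclude removes from it.
fun filterPoints(
    pointIds: Collection<String>,
    includeTargets: Set<String> = emptySet(),
    excludeTargets: Set<String> = emptySet(),
): List<String> = pointIds.filter { id ->
    val cls = classNameOf(id)
    (includeTargets.isEmpty() || cls in includeTargets) && cls !in excludeTargets
}

fun main() {
    // Discovery stays unfiltered: all points are known, the filter applies only when selecting.
    val discovered = listOf("com.example.Calculator_0", "com.example.Calculator_1", "com.example.AuditLogger_0")
    println(filterPoints(discovered, includeTargets = setOf("com.example.Calculator")))
    println(filterPoints(discovered, excludeTargets = setOf("com.example.AuditLogger")))
}
```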
When running a single test method from an IDE (e.g., IntelliJ's "Run Test" on one method), mutation testing is automatically skipped. This prevents false positives - mutations that would be killed by other tests in the class would incorrectly appear as survivors.
How it works:
- At session creation, the extension counts @Test methods in the class
- During baseline, each executed test is tracked
- After baseline, if executedTests < expectedTests, mutation runs are skipped
Example output when running a single test:
[mutflow] Starting baseline run (discovery)
[mutflow] Discovered mutation point: (Calculator.kt:7) > with 2 variants
[mutflow] Partial test run detected (1/3 tests) - skipping mutation testing
The baseline still runs normally (tests execute, mutations are discovered), but no mutation runs occur. This ensures you get your test results quickly without misleading mutation feedback.
Rationale: Mutation testing evaluates the entire test suite's ability to catch mutations. Running it with a subset produces meaningless results - better to skip and provide a clear message.
Certain mutations can cause infinite loops - most commonly when flipping a relational operator in a loop condition (e.g., < → > in while (i < n)). Without protection, these mutations would hang the test run indefinitely.
mutflow detects this at the compiler level by injecting MutationRegistry.checkTimeout() at the top of every loop body in @MutationTarget classes:
// Before compiler plugin
while (i < n) {
process(i)
i++
}
// After compiler plugin (in addition to mutation point injection)
while (i < n) {
MutationRegistry.checkTimeout() // injected
process(i)
i++
}
How it works:
- When a mutation run starts, MutationRegistry.withSession() computes a deadline: System.nanoTime() + timeoutMs * 1_000_000
- Each loop iteration calls checkTimeout(), which compares current time against the deadline
- If exceeded, throws MutationTimedOutException - the test fails with a message suggesting // mutflow:ignore
- The timed-out mutation is recorded as MutationResult.TimedOut and shown in the summary
Performance characteristics of checkTimeout():
- No active session (production code): immediate null check return - effectively zero cost
- Baseline run (no active mutation): deadlineNanos == 0 check - fast return
- Mutation run: one System.nanoTime() call per loop iteration (~20-30ns on modern JVMs)
Why compiler-injected checks instead of thread-based timeout?
A Future.get(timeout) approach can detect timeouts but cannot stop tight CPU-bound infinite loops - Thread.interrupt() only works if the loop checks interruption (most don't). The compiler-injected approach cleanly breaks even tight loops like while(true) { counter++ } from within.
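The deadline check can be sketched in a few lines. TimeoutGuard and the locally defined MutationTimedOutException are illustrative stand-ins for the registry's internals, and the arm/disarm names are assumptions; only the deadline arithmetic follows the description above.

```kotlin
// Local stand-in for the exception named in the text.
class MutationTimedOutException(message: String) : RuntimeException(message)

object TimeoutGuard {
    @Volatile private var deadlineNanos = 0L   // 0 means "no deadline" (baseline, or timeout disabled)

    fun arm(timeoutMs: Long) {
        deadlineNanos = if (timeoutMs > 0) System.nanoTime() + timeoutMs * 1_000_000 else 0L
    }

    fun disarm() { deadlineNanos = 0L }

    // Called at the top of every loop body in instrumented code.
    fun checkTimeout() {
        val deadline = deadlineNanos
        if (deadline == 0L) return              // fast path: nothing to compare
        // Subtracting before comparing keeps the check correct if nanoTime() wraps around.
        if (System.nanoTime() - deadline >= 0) {
            throw MutationTimedOutException("mutation timed out; consider // mutflow:ignore on the affected line")
        }
    }
}

// A tight CPU-bound loop that never checks thread interruption on its own.
fun spinForever() {
    var counter = 0L
    while (true) {
        TimeoutGuard.checkTimeout()   // the injected call
        counter++
    }
}

fun main() {
    TimeoutGuard.arm(timeoutMs = 50)
    try {
        spinForever()
    } catch (e: MutationTimedOutException) {
        println("stopped: ${e.message}")
    } finally {
        TimeoutGuard.disarm()
    }
}
```

Because the exception is thrown from inside the loop body, the loop is left on the thread that runs it, with no need for interruption or a second thread.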
Loop coverage:
All loop types in Kotlin compile to IrWhileLoop or IrDoWhileLoop in IR:
| Kotlin source | IR node | Covered? |
|---|---|---|
| while (...) | IrWhileLoop | Yes |
| do { ... } while (...) | IrDoWhileLoop | Yes |
| for (i in ...) | IrWhileLoop (desugared) | Yes |
| forEach { }, map { }, etc. | IrCall (stdlib function) | No - but loop control is in stdlib, not user code |
Higher-order function "loops" like forEach can't cause infinite loops from mutations because the loop control (hasNext(), counter) lives in the stdlib, not in the mutated code.
Configuration:
@MutFlowTest(timeoutMs = 60_000) // default: 60 seconds, 0 to disable
class CalculatorTest { ... }
Design rationale - fail loudly, not silently:
When a timeout occurs, the test fails rather than silently marking the mutation as killed. This ensures the developer notices and takes action (adds // mutflow:ignore on the affected line). Silent handling would mask slow mutation runs that accumulate over time.
Controls how surviving mutations are handled. Three modes are available:
enum class VerificationMode {
STRICT, // survivors cause test failure (default)
LENIENT, // survivors are reported but don't fail
DISABLED // mutation runs are skipped entirely
}
Per-annotation configuration:
@MutFlowTest(verificationMode = VerificationMode.LENIENT)
class CalculatorTest { ... }
Environment variable override:
The MUTFLOW_VERIFICATION_MODE environment variable takes precedence over the annotation value. This enables phased CI pipelines without changing code:
MUTFLOW_VERIFICATION_MODE=DISABLED ./gradlew test # fast: regular tests only
MUTFLOW_VERIFICATION_MODE=LENIENT ./gradlew test # report mutations, don't fail
./gradlew test # full strict mutation testing
Resolution order:
- Check MUTFLOW_VERIFICATION_MODE environment variable
- If set and valid (STRICT, LENIENT, DISABLED, case-insensitive): use it
- If set but invalid: print warning, fall back to annotation value
- If not set: use annotation value (default: STRICT)
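The resolution order can be expressed as a small function. resolveMode is a hypothetical helper and the warning text is invented for the example; the enum is redeclared here only to keep the sketch self-contained.

```kotlin
enum class VerificationMode { STRICT, LENIENT, DISABLED }

// envValue is the raw MUTFLOW_VERIFICATION_MODE value, or null when the variable is not set.
fun resolveMode(
    envValue: String?,
    annotationValue: VerificationMode = VerificationMode.STRICT,
): VerificationMode {
    if (envValue == null) return annotationValue            // not set: annotation wins
    val parsed = VerificationMode.values()
        .firstOrNull { it.name.equals(envValue, ignoreCase = true) }
    if (parsed == null) {
        println("[sketch] WARNING: invalid MUTFLOW_VERIFICATION_MODE '$envValue', using $annotationValue")
    }
    return parsed ?: annotationValue                         // set but invalid: fall back
}

fun main() {
    val mode = resolveMode(System.getenv("MUTFLOW_VERIFICATION_MODE"), VerificationMode.LENIENT)
    println("effective verification mode: $mode")
}
```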
How each mode affects the test lifecycle:
| Phase | STRICT | LENIENT | DISABLED |
|---|---|---|---|
| Baseline (Run 0) | Runs normally | Runs normally | Runs normally |
| Mutation runs | All mutations tested | All mutations tested | Skipped entirely |
| Surviving mutation | MutantSurvivedException thrown, test fails | Printed as warning, test passes | N/A |
| Summary | Full report | Full report | No mutation data |
Design rationale:
The default is STRICT because mutflow's value proposition is catching test gaps — silently ignoring survivors would undermine that. However, real-world adoption is incremental: teams adding mutflow to an existing codebase need a way to see mutation results without blocking their build. LENIENT serves this purpose. DISABLED provides a zero-overhead escape hatch for performance-sensitive workflows (equivalent to mutflow.enabled=false at the Gradle level but controllable per-run without rebuilding).
The environment variable override is intentional: it allows the same test code to behave differently in different pipeline stages. A team can run DISABLED in their fast-feedback loop, LENIENT in nightly builds, and STRICT in release pipelines — all without touching the test annotations.
┌─────────────────────────────────────────────────────────────────┐
│ Test Execution │
├─────────────────────────────────────────────────────────────────┤
│ mutflow-junit6 │ @MutFlowTest meta-annotation │
│ │ @ClassTemplate + @ExtendWith │
│ │ MutFlowExtension: thin adapter │
│ │ that calls MutFlow session mgmt │
│ │ Depends on: mutflow-runtime │
├─────────────────────────────────────────────────────────────────┤
│ mutflow-runtime │ MutFlowSession: per-class state │
│ │ MutFlow: session management + │
│ │ underTest() API (parameterless │
│ │ and explicit versions) │
│ │ Selection: PureRandom, MostLikely* │
│ │ Shuffle: PerRun, PerChange │
│ │ Depends on: mutflow-core │
├─────────────────────────────────────────────────────────────────┤
│ mutflow-compiler-plugin │ Transforms @MutationTarget classes │
│ │ and Gradle-configured target classes│
│ │ Injects MutationRegistry.check() │
│ │ Four operator interfaces: │
│ │ MutationOperator (IrCall nodes) │
│ │ ReturnMutationOperator (IrReturn)│
│ │ FunctionBodyMutationOperator │
│ │ WhenMutationOperator (IrWhen) │
│ │ RelationalComparisonOperator: │
│ │ handles >, <, >=, <= operators │
│ │ ConstantBoundaryOperator: │
│ │ mutates constants by +1/-1 │
│ │ ArithmeticOperator: │
│ │ handles +, -, *, /, % operators │
│ │ EqualitySwapOperator: │
│ │ handles == ↔ != swaps │
│ │ BooleanInversionOperator: │
│ │ adds ! to boolean calls/props │
│ │ BooleanLogicOperator: │
│ │ handles && ↔ || swaps │
│ │ BooleanReturnOperator: │
│ │ replaces bool returns with T/F │
│ │ NullableReturnOperator: │
│ │ replaces nullable returns w/ null│
│ │ VoidFunctionBodyOperator: │
│ │ removes Unit function bodies │
│ │ Depends on: mutflow-core │
├─────────────────────────────────────────────────────────────────┤
│ mutflow-core │ @MutationTarget annotation │
│ │ @SuppressMutations annotation │
│ │ MutationRegistry (per-underTest │
│ │ session for discovery/activation)│
│ │ Shared types between all modules │
│ │ Depends on: nothing │
└─────────────────────────────────────────────────────────────────┘
The mutflow-core module contains the bridge between compiler-generated code and test
runtime. Both sides depend on it, but not on each other, keeping coupling minimal.
State is scoped to sessions rather than being globally mutable:
// JUnit extension creates session at class start
val sessionId = MutFlow.createSession(selection, shuffle, maxRuns)
// Each class template invocation:
val mutation = MutFlow.selectMutationForRun(sessionId, run) // null for baseline
MutFlow.startRun(sessionId, run, mutation)
// ... all tests execute ...
MutFlow.endRun(sessionId)
// JUnit extension closes session when class finishes
MutFlow.closeSession(sessionId)
Benefits:
- Clean lifecycle: create → runs → close
- State isolation: Each test class has its own session
- No leaked state: Explicit cleanup
- Thread-safe routing: startRun registers the calling thread to the session; parameterless underTest {} resolves the session by thread ID
- Synchronized mutation execution: MutationRegistry.withSession() ensures only one underTest {} block executes at a time
The JUnit extension (mutflow-junit6) is intentionally a thin adapter:
- Uses JUnit 6's @ClassTemplate mechanism to run the class multiple times
- MutFlowExtension implements ClassTemplateInvocationContextProvider
- All orchestration logic lives in mutflow-runtime (session management, mutation selection)
- Extension only handles: session creation/cleanup, run start/end calls, display names
This keeps framework-specific code minimal (~100 lines) and enables easy porting to other frameworks.
1. Compile time:
┌──────────────────┐ ┌───────────────────────────────────────────────────┐
│ x > 0 │ ───► │ when(registry.check(pointId, 2, "Calc.kt:7", ">", │
└──────────────────┘ │ ">=,<", occurrenceOnLine=1)) │
└───────────────────────────────────────────────────┘
2. Baseline (run=0) - ALL tests run first:
Test A: underTest(run=0, selection, shuffle) { calculator.isPositive(5) }
│
▼
registry.check("sample.Calculator_0", 2) → registers point, touchCount++, returns null
registry.check("sample.Calculator_1", 2) → registers point, touchCount++, returns null
│
▼
Returns: block result (T)
Test B: underTest(run=0, selection, shuffle) { calculator.validate(-1) }
│
▼
registry.check("sample.Calculator_0", 2) → already known, touchCount++, returns null
registry.check("sample.Validator_0", 2) → registers point, touchCount++, returns null
│
▼
Returns: block result (T)
After all run=0 complete:
GlobalRegistry {
discoveredPoints: {sample.Calculator_0: 2, sample.Calculator_1: 2, sample.Validator_0: 2}
touchCounts: {sample.Calculator_0: 2, sample.Calculator_1: 1, sample.Validator_0: 1}
testedMutations: {}
}
3. Mutation runs (run=1, 2, ...) - ALL tests run with SAME mutation:
First underTest(run=1, selection=MostLikelyRandom, shuffle=PerChange):
│
▼
Select point: sample.Calculator_1 (lowest touch count, weighted random)
Select variant: 0 (from range 0..1)
Add (sample.Calculator_1, 0) to testedMutations
Activate mutation (sample.Calculator_1, 0)
│
▼
All tests execute with (sample.Calculator_1, 0) active:
registry.check("sample.Calculator_0", ...) → not active, returns null
registry.check("sample.Calculator_1", ...) → active! returns 0
registry.check("sample.Validator_0", ...) → not active, returns null
│
▼
If ANY test fails → mutation killed
If ALL tests pass → mutation survived, report it
4. Exhaustion:
underTest(run=N, ...) where all mutations tested
│
▼
No untested mutations remain
│
▼
Throws MutationsExhaustedException → JUnit stops iteration
The Gradle plugin exposes a mutflow extension with an enabled property (default: true). When set to false:
- No mutatedMain source set is created - no extra compilation step
- No compiler plugin is registered (isApplicable returns false)
- Only mutflow-annotations and mutflow-junit6 are added as dependencies, so @MutationTarget and @MutFlowTest still compile
- Tests run normally but discover 0 mutations (no MutationRegistry.check() calls exist in the code)
The property supports three configuration methods with the following precedence:
- DSL (mutflow { enabled = false }) - highest, set explicitly in build script
- Gradle property (-Pmutflow.enabled=false or gradle.properties) - used as convention (default) if DSL value is not set
This is useful for:
- CI pipelines where mutation testing only runs on specific builds (e.g., nightly, not on every push)
- Local development when fast iteration is needed
- Temporarily disabling without removing the plugin from the build
- K1 is deprecated; K2 is the future
- Maintaining both is too much overhead for an experimental project
- By the time mutflow matures, K2 will be standard
The compiler plugin is applied ONLY to test compilation, never production:
- Gradle plugin applies to testCompile tasks only
- Runtime guards detect non-test context and fail fast
- Build verification can scan production artifacts for mutation markers
MutationRegistry is a singleton with a single currentSession slot - only one mutation session can be active at a time. This is a fundamental constraint of the mutant schemata approach: compiler-injected MutationRegistry.check() calls have no session context, so they read from a global.
Why not ThreadLocal? Using ThreadLocal for the session would break coroutines (suspend functions can resume on different dispatcher threads) and reactive frameworks (operators run on scheduler threads). We intentionally avoid ThreadLocal to keep these doors open for future support.
Current approach: synchronized withSession
MutationRegistry.withSession() wraps the entire session lifecycle in a synchronized block:
synchronized(lock) {
currentSession = Session(activeMutation)
try {
val result = block() // production code executes, check() calls read currentSession
return result to buildSessionResult()
} finally {
currentSession = null
}
}
This means underTest {} blocks from different test classes serialize at the MutationRegistry level. Between these blocks (test setup, assertions, Spring context initialization, non-mutation tests), everything runs freely in parallel.
Session routing via thread-to-session map
Each test class has its own MutFlowSession, but the parameterless MutFlow.underTest {} API needs to find the right session without a session ID parameter. Since JUnit runs startRun, test methods, and endRun on the same thread:
- MutFlow.startRun(sessionId) registers Thread.currentThread().id → sessionId
- MutFlow.underTest {} looks up the session by current thread ID
- MutFlow.endRun(sessionId) deregisters the thread
This is not a ThreadLocal - it's an explicit ConcurrentHashMap<Long, String> used only for test-thread routing. The coroutine concern doesn't apply here because underTest() is always called from the test thread, before entering the withSession synchronized block where production code (potentially using coroutines) executes.
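The thread-to-session routing could be sketched like this. SessionRouter and its method names are hypothetical; only the use of a ConcurrentHashMap<Long, String> keyed by thread ID comes from the description above.

```kotlin
import java.util.concurrent.ConcurrentHashMap

object SessionRouter {
    // Explicit map from test-thread ID to session ID (not a ThreadLocal).
    private val threadToSession = ConcurrentHashMap<Long, String>()

    fun startRun(sessionId: String) {
        threadToSession[Thread.currentThread().id] = sessionId
    }

    // What a parameterless underTest {} would call to find its session.
    fun currentSessionOrNull(): String? = threadToSession[Thread.currentThread().id]

    fun endRun(sessionId: String) {
        // Two-argument remove only deletes the entry if it still maps to this session.
        threadToSession.remove(Thread.currentThread().id, sessionId)
    }
}

fun main() {
    SessionRouter.startRun("session-A")
    println(SessionRouter.currentSessionOrNull())            // session-A

    // A different thread has no registration, so it resolves no session.
    val other = Thread { println(SessionRouter.currentSessionOrNull()) }   // null
    other.start()
    other.join()

    SessionRouter.endRun("session-A")
    println(SessionRouter.currentSessionOrNull())            // null
}
```

The lookup happens on the test thread before production code runs, so code inside the block can hop threads (coroutines, schedulers) without affecting which session was resolved.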
Summary of parallel behavior:
- Non-mutation test classes: fully parallel, unaffected
- Mutation test classes: underTest {} blocks serialize; everything else (setup, assertions) is parallel
- Coroutines/reactive inside underTest {}: works correctly (lock is held for the entire block)
mutflow closes the gap between code coverage and assertion quality - it doesn't replace coverage tools.
- Coverage tools answer: "Was this code executed?"
- mutflow answers: "Do your assertions catch behavioral changes?"
Code only reached outside MutFlow.underTest { } blocks produces no mutations. This can be good (setup code, logging) or bad (forgot to wrap the action under test). Use coverage tools to ensure code is exercised; use mutflow to ensure your assertions are meaningful.
- Low overhead: Compile once, not once per mutant
- Low friction: No separate tool, runs in normal tests
- Reproducible: Seed-based determinism, trapped mutations for debugging
- Pragmatic: "Some mutation testing > none" philosophy
- Not exhaustive per session: Each session tests a small fixed number of mutations (3-8), not all. Coverage grows over many builds.
- Bytecode bloat: Injected branches increase class size (64KB method limit is a risk)
- Coverage interference: Extra branches affect coverage reports (may need separate non-mutated build)
- Debugging complexity: Stack traces through mutated code can be confusing
- Equivalent mutants: Some mutations produce identical behavior (noise)
- Shared state: Sequential mutation runs may need state invalidation (future: user-provided hooks)
- Metamutator (SpoonLabs) - Java implementation of mutant schemata. Same core technique. Project appears inactive. Requires separate CLI tool.
- Pitest - Industry standard JVM mutation testing. Traditional approach (compile per mutant). Thorough but slow.
- Arcmutate - Pitest plugin for Kotlin bytecode understanding.
- Mutant-Kraken - Kotlin mutation testing via AST manipulation. Traditional approach.
- Untch et al. (1993) - Original academic paper on mutant schemata technique.
- Kotlin-native K2 compiler plugin (not Java tooling adapted for Kotlin)
- BDD-style MutFlow.underTest API for explicit test scoping
- One mutation per run for precise feedback
- Trap mechanism to pin surviving mutants while fixing test gaps
- IR-hash based mutation point identification (stable across refactoring)
mutflow-core:
- MutationRegistry with check(), checkTimeout(), startSession(), endSession(), withSession() API
  - withSession(): synchronized wrapper that ensures only one mutation session is active at a time
  - checkTimeout(): compiler-injected loop guard that throws MutationTimedOutException when deadline exceeded
- Supporting types (ActiveMutation, DiscoveredPoint, SessionResult)
- @MutationTarget annotation for scoping mutations
- Occurrence-on-line tracking for disambiguating duplicate operators on the same source line
mutflow-compiler-plugin:
- K2 compiler plugin with extensible mutation operator mechanism
- MutflowCommandLineProcessor receives target patterns from Gradle plugin via SubpluginOption
- Target pattern matching: glob-style patterns (*, **) compiled to regex for FQN matching
- Four operator interfaces for different IR node types:
  - MutationOperator - for IrCall nodes (comparison operators, etc.)
  - ReturnMutationOperator - for IrReturn nodes (return statement mutations)
  - FunctionBodyMutationOperator - for function declarations (body-level mutations)
  - WhenMutationOperator - for IrWhen nodes (boolean logic operators)
- RelationalComparisonOperator handles all comparison operators (>, <, >=, <=)
  - Each operator produces 2 variants: boundary mutation + direction flip
- ConstantBoundaryOperator mutates numeric constants in comparisons
  - Produces 2 variants: +1 and -1 of the original constant
  - Detects poorly tested boundaries that operator mutations miss
- BooleanReturnOperator mutates boolean return values
  - Produces 2 variants: true and false
  - Only matches explicit return statements (block-bodied functions)
  - Skips synthetic returns from expression-bodied functions (detected by zero-width source span)
- NullableReturnOperator mutates nullable return values to null
  - Produces 1 variant: null
  - Only matches explicit return statements in functions with nullable return types
  - Skips returns that are already null (mutating null to null is pointless)
  - Catches tests that only verify non-null without checking the actual value
- ArithmeticOperator mutates arithmetic operations
  - + → - (1 variant)
  - - → + (1 variant)
  - * → / (1 variant, with safe division to avoid div-by-zero)
  - / → * (1 variant)
  - % → / (1 variant)
  - Safe division for * → /: when b=0, computes b/a; when both are 0, returns 1
- EqualitySwapOperator swaps equality operators
  - == → != (1 variant: wraps EQEQ intrinsic with Boolean.not())
  - != → == (1 variant: unwraps the not() wrapper to expose the inner EQEQ call)
  - In K2 IR, == is a single EQEQ intrinsic; != is not(EQEQ(a, b)) - two calls both with EXCLEQ origin
  - Matches EQEQ calls with EQEQ origin for ==, and not() calls with EXCLEQ origin for !=
  - Avoids double-matching the inner EQEQ of != expressions (which would create spurious mutation points)
- BooleanInversionOperator adds negation to boolean expressions
  - expr → !expr (1 variant: wraps in Boolean.not())
  - Matches boolean-returning IrCall nodes with null or GET_PROPERTY origin (function calls and property accesses)
  - Excludes not() calls (would create redundant double-negation points) and EXCLEQ origin (handled by EqualitySwapOperator)
  - The "remove negation" case (!expr → expr) is implicitly covered: adding ! to the inner expression of !expr produces !(!expr) = expr
- Boolean variable/parameter inversion is handled directly by MutflowIrTransformer.visitGetValue
  - varName → !varName (1 variant: wraps boolean IrGetValue in Boolean.not())
  - Not an operator interface - IrGetValue is a leaf node, handled inline with a single mutation point
- BooleanLogicOperator swaps boolean logic operators
  - && → || (1 variant: swaps branch results to short-circuit true)
  - || → && (1 variant: swaps branch results to short-circuit false)
  - In K2 IR (2.3.0+), && and || are lowered to IrWhen expressions with ANDAND/OROR origins
  - &&: when(ANDAND) { a -> b; else -> false } - if first is true, evaluate second
  - ||: when(OROR) { a -> true; else -> b } - if first is true, short-circuit true
  - Mutation swaps branch results: ANDAND replaces b with true and false with b (and vice versa for OROR)
- VoidFunctionBodyOperator removes entire function bodies of Unit/void functions
  - Produces 1 variant: empty body (all side effects removed)
  - Only matches functions that return Unit, have non-empty bodies, and are not property accessors
  - Catches tests that don't verify side effects - "what if this function did nothing?"
  - Operates at the function declaration level, not at call sites
- Recursive operator application: multiple operators can match the same expression
- Type-agnostic: works with Int, Long, Double, Float, etc.
- Respects @SuppressMutations annotation on classes and functions
- Comment-based line suppression: // mutflow:ignore and // mutflow:falsePositive
  - Reads source files for @MutationTarget classes during IR transformation
  - Inline comments suppress mutations on the same line, standalone comments suppress the next line
  - Cached per file, defensive fallback with warning if source file is unreadable
- Timeout detection: injects MutationRegistry.checkTimeout() at the top of every loop body
  - Covers IrWhileLoop and IrDoWhileLoop (all Kotlin loop constructs including for)
  - Prevents mutations that cause infinite loops from hanging the test run
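The timeout detection item can be illustrated with a small sketch of a deadline-based loop guard. TimeoutGuard, arm() and disarm() are hypothetical names introduced here, and the exception class is a local stand-in; the real MutationRegistry.checkTimeout() and MutationTimedOutException may differ in how the deadline is set and stored.

```kotlin
// Local stand-in for the exception described above.
class MutationTimedOutException(message: String) : RuntimeException(message)

object TimeoutGuard {
    // Deadline in System.nanoTime() units; null means no mutation is active.
    @Volatile
    private var deadlineNanos: Long? = null

    fun arm(timeoutMillis: Long) {
        deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000
    }

    fun disarm() {
        deadlineNanos = null
    }

    // Models the call the plugin is described as injecting at the top of each loop body.
    fun checkTimeout() {
        val deadline = deadlineNanos ?: return
        if (System.nanoTime() - deadline > 0) {
            throw MutationTimedOutException("mutation run exceeded its deadline")
        }
    }
}

// Stands in for a loop whose mutated condition never turns false.
fun mutatedLoop(): Int {
    var i = 0
    while (i >= 0) {
        TimeoutGuard.checkTimeout() // injected guard
        i = (i + 1) % 1_000         // keeps i non-negative forever
    }
    return i
}

fun main() {
    TimeoutGuard.arm(timeoutMillis = 50)
    try {
        mutatedLoop()
    } catch (e: MutationTimedOutException) {
        println("timed out: ${e.message}")
    } finally {
        TimeoutGuard.disarm()
    }
}
```

When no deadline is armed the guard returns immediately, so the injected call costs little outside mutation runs.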
mutflow-runtime:
- MutFlowSession: Per-class state (discovered points, touch counts, tested mutations)
- MutFlow: Session management + underTest() API (parameterless and explicit versions)
- Thread-to-session routing: startRun/endRun register/deregister the calling thread for parameterless underTest() resolution
- Selection strategies: PureRandom, MostLikelyRandom, MostLikelyStable
- Shuffle modes: PerRun, PerChange
- Touch count tracking during baseline
- Target filtering: includeTargets/excludeTargets for scoping mutations by class
- MutationsExhaustedException when all mutations tested
- VerificationMode enum: STRICT, LENIENT, DISABLED
mutflow-junit6:
- @MutFlowTest meta-annotation combining @ClassTemplate + @ExtendWith
- MutFlowExtension implementing ClassTemplateInvocationContextProvider
- Session lifecycle management (create, startRun, endRun, close)
- Mutation selection at context creation for accurate display names
- Verification mode resolution: annotation parameter with MUTFLOW_VERIFICATION_MODE env var override
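One plausible reading of the verification mode resolution is sketched below. The enum is redeclared locally so the snippet stands alone, resolveVerificationMode is a hypothetical helper name, and case-insensitive matching plus silently ignoring unrecognised values are assumptions of this sketch rather than documented behavior.

```kotlin
enum class VerificationMode { STRICT, LENIENT, DISABLED }

// The environment variable wins when it names a known mode;
// otherwise the value from the annotation parameter is used.
fun resolveVerificationMode(
    annotationMode: VerificationMode,
    envValue: String? = System.getenv("MUTFLOW_VERIFICATION_MODE"),
): VerificationMode =
    envValue?.trim()?.uppercase()
        ?.let { name -> VerificationMode.entries.firstOrNull { it.name == name } }
        ?: annotationMode

fun main() {
    println(resolveVerificationMode(VerificationMode.STRICT, "lenient")) // LENIENT
    println(resolveVerificationMode(VerificationMode.STRICT, null))      // STRICT
}
```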
mutflow-test-sample:
- Integration tests demonstrating both APIs
// Simple: use @MutFlowTest annotation
@MutFlowTest // runs all mutations by default
class CalculatorTest {
    private val calculator = Calculator() // instance under test

    @Test
    fun testIsPositive() {
        val result = MutFlow.underTest { // parameterless!
            calculator.isPositive(5)
        }
        assertTrue(result)
    }
}
Example output:
CalculatorTest > Run without mutations > isPositive returns true for positive numbers() PASSED
CalculatorTest > Run without mutations > isPositive returns true at boundary() PASSED
CalculatorTest > Run without mutations > isPositive returns false for negative numbers() PASSED
CalculatorTest > Run without mutations > isPositive returns false for zero() PASSED
CalculatorTest > Mutation: (Calculator.kt:7) > → >= > ... PASSED
CalculatorTest > Mutation: (Calculator.kt:7) > → < > ... PASSED
CalculatorTest > Mutation: (Calculator.kt:7) 0 → 1 > ... PASSED
CalculatorTest > Mutation: (Calculator.kt:7) 0 → -1 > ... PASSED
╔════════════════════════════════════════════════════════════════╗
║ MUTATION TESTING SUMMARY ║
╠════════════════════════════════════════════════════════════════╣
║ Total mutations discovered: 4 ║
║ Tested this run: 4 ║
║ ├─ Killed: 4 ✓ ║
║ └─ Survived: 0 ✓ ║
║ Remaining untested: 0 ║
╠════════════════════════════════════════════════════════════════╣
║ DETAILS: ║
║ ✓ (Calculator.kt:7) > → >= ║
║ killed by: isPositive returns false for zero() ║
║ ✓ (Calculator.kt:7) > → < ║
║ killed by: isPositive returns true at boundary() ║
║ ✓ (Calculator.kt:7) 0 → 1 ║
║ killed by: isPositive returns true at boundary() ║
║ ✓ (Calculator.kt:7) 0 → -1 ║
║ killed by: isPositive returns false for zero() ║
╚════════════════════════════════════════════════════════════════╝
Key behavior:
- Killed mutations: When a test assertion fails during a mutation run, the exception is swallowed and the test appears as PASSED. This is intentional - a failing assertion means the test caught the mutation (good!). The mutation is recorded as "killed" internally.
- Surviving mutations: After all tests in a mutation run complete, if NO test caught the mutation (all tests passed naturally), MutantSurvivedException is thrown. This fails the build and indicates a gap in test coverage.
- Summary: At the end of each test class, a summary shows which mutations were tested and their results.
Why this design? The goal is that all tests appear green when mutations are properly killed. Failed assertions during mutation runs are expected and desirable - they prove your tests can detect code changes. Only when tests fail to catch a mutation does the build fail, alerting you to the coverage gap.
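The kill/survive rule can be sketched without JUnit as follows. runMutation is a hypothetical helper, the exception class is a local stand-in, and tests are modelled as plain lambdas; the real extension tracks this state across JUnit class template invocations instead.

```kotlin
// Local stand-in for the exception described above.
class MutantSurvivedException(message: String) : RuntimeException(message)

// Runs every test with one mutation active. Assertion failures are swallowed
// and recorded as kills; if no test fails, the mutant survived.
fun runMutation(description: String, tests: Map<String, () -> Unit>): List<String> {
    val killedBy = tests.filter { (_, body) ->
        try {
            body()
            false
        } catch (e: AssertionError) {
            true
        }
    }.keys.toList()
    if (killedBy.isEmpty()) {
        throw MutantSurvivedException("Mutation survived: $description")
    }
    return killedBy
}

fun main() {
    val mutant: (Int) -> Boolean = { x -> x >= 0 } // > mutated to >=
    val tests = mapOf<String, () -> Unit>(
        "positive" to { if (!mutant(5)) throw AssertionError("expected true for 5") },
        "zero" to { if (mutant(0)) throw AssertionError("expected false for 0") },
    )
    println(runMutation("> → >=", tests)) // [zero]
}
```

Dropping the "zero" test from the map makes the same call throw MutantSurvivedException, which is the only case that should turn the build red.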
Transformation:
// Before (in @MutationTarget class)
fun isPositive(x: Int) = x > 0

// After compiler plugin (nested mutations for operator AND constant)
fun isPositive(x: Int) = when (MutationRegistry.check("..._0", 2, "Calculator.kt:7", ">", ">=,<", 1)) {
    0 -> x >= 0 // operator: boundary (include equality)
    1 -> x < 0 // operator: direction flip
    else -> when (MutationRegistry.check("..._1", 2, "Calculator.kt:7", "0", "1,-1", 1)) {
        0 -> x > 1 // constant: increment
        1 -> x > -1 // constant: decrement
        else -> x > 0 // original
    }
}
- Gradle plugin for easy setup
- Smarter likelihood calculations (see below)
- State invalidation hooks
The current touch count metric is a simple proxy for "how well tested is this mutation point". A future improvement is enhancing the likelihood calculation by analyzing observed runtime values during baseline.
Simple cases (e.g., x > 5 where one side is a literal):
- Track observed values of x during baseline
- If tests only use values far from the boundary (e.g., [10, 20, 100] but never [4, 5, 6]), specific variants become higher likelihood:
  - x >= 5 → HIGH likelihood (boundary at 5 never tested)
  - x == 5 → HIGH likelihood (5 never observed)
  - x < 5 → LOWER likelihood (tests with x=10 would fail)
Complex cases (nested expressions, non-literal boundaries):
- (x + y) > threshold - harder to analyze, would need to track computed values
- These cases may remain suboptimal, falling back to basic touch count
The key insight: instead of a separate "boundary analysis" feature with its own warnings/control flow, we integrate this into the existing likelihood score. Boundary-untested variants naturally bubble to the top of selection priority, get tested first, and produce standard "mutation survived" feedback if they survive.
This keeps the system focused on mutation testing rather than becoming a general boundary testing tool.
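As a rough illustration of how observed values could feed the likelihood score for the simple literal case, consider the sketch below. The Likelihood enum, the likelihood function and the two-level scale are all hypothetical, and the rule shown (a variant is likely to survive when no observed value makes it disagree with the original) is a deliberately crude one chosen for this example; the actual scoring is still undecided.

```kotlin
enum class Likelihood { HIGH, LOWER }

// A variant is HIGH likelihood to survive when no value observed during
// baseline makes it evaluate differently from the original expression.
fun likelihood(
    observed: List<Int>,
    original: (Int) -> Boolean,
    variant: (Int) -> Boolean,
): Likelihood =
    if (observed.none { original(it) != variant(it) }) Likelihood.HIGH else Likelihood.LOWER

fun main() {
    val observed = listOf(10, 20, 100)           // baseline never touched 4, 5 or 6
    val original: (Int) -> Boolean = { it > 5 }

    println(likelihood(observed, original) { it >= 5 }) // HIGH
    println(likelihood(observed, original) { it < 5 })  // LOWER
}
```

A score like this would only reorder selection; a surviving variant would still be reported through the normal "mutation survived" path.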
- Best approach for counting mutations inside loops/recursion? (per-invocation vs per-source-location)
- How to present surviving mutants clearly in test output? (IDE integration?)