flake-bisect

Find the pytest test(s) that poison a flaky target.

You have a test that passes when you run it alone but fails as part of the full suite. Some other test mutates global state — os.environ, a singleton, a module-level cache, a database row, the current working directory, a registered signal handler — and your target is the one that notices. flake-bisect narrows the polluter down to a minimal set using delta-debugging over the test ordering, so you stop guessing and start reading the right diff.

$ python -m flake_bisect --workdir examples/polluting_demo \
                        --target test_target.py::test_assumes_clean_env
flake-bisect 0.1.0
workdir : .../examples/polluting_demo
target  : test_target.py::test_assumes_clean_env
Collecting tests...
Collected 8 tests (7 candidates).
Sanity check: target alone...
  OK (passes alone)
Sanity check: target after full suite...
  OK (target outcome: FAILED)
Bisecting 7 candidate predecessors...

Minimal poisoning set (1 test):
  test_pollute.py::test_sets_env_flag

Reproduce locally:
  pytest test_pollute.py::test_sets_env_flag test_target.py::test_assumes_clean_env

pytest invocations during bisect: 3 (cap: 200)

A naive linear search across N candidate predecessors takes up to N pytest runs. flake-bisect typically converges in O(log N) pytest invocations when there is a single polluter, and stays sub-linear with a small number of polluters. In the worst case ddmin can need O(N²) runs; the --max-runs cap bounds that case.

Why this exists

The Python testing & debugging community has converged on a clear playbook for non-flaky suites: ban global state in tests, use fixtures with monkeypatch, isolate the DB per test, run with pytest-randomly in CI to surface ordering bugs early. The hard part is what to do when CI catches one. The failing line tells you what broke; it never tells you who set the landmine 200 tests earlier.

flake-bisect does that last mile: given a known-flaky target, it points at the test that poisons it.

Running it

flake-bisect is a self-contained Python package with no third-party dependencies. It needs pytest available in the same Python environment as the project you're bisecting (it shells out to python -m pytest).

Clone the repo and run from source:

git clone https://github.com/python-testing-debugging/flake-bisect.git
cd flake-bisect
python -m flake_bisect --help

To use it against your own project, point --workdir at your project root and add the flake-bisect checkout to PYTHONPATH so the flake_bisect module is importable:

PYTHONPATH=/path/to/flake-bisect python -m flake_bisect \
    --workdir /path/to/your/project \
    --target tests/test_widgets.py::test_render_safely

Or run it from inside the flake-bisect checkout with an absolute --workdir.

Common flags

| Flag | Purpose |
| --- | --- |
| `--target` | Required. The flaky test's nodeid as pytest reports it. |
| `--testpaths` | Limit candidate predecessors to specific paths (otherwise full suite). |
| `--workdir` | Run pytest from this directory (default: cwd). |
| `--max-runs` | Hard cap on pytest invocations during bisect (default: 200). |
| `--pytest-arg` | Forward an arg to every pytest invocation. Repeat to pass multiple. |
| `-v` | Show per-iteration progress. |

Forwarding pytest options

If your project needs particular pytest options to even collect (a -p plugin, -o override, marker filter, etc.), forward them with repeated --pytest-arg:

python -m flake_bisect \
    --target tests/test_x.py::test_y \
    --pytest-arg -o --pytest-arg "addopts=" \
    --pytest-arg -m --pytest-arg "not slow"

How it works

1. Collect all nodeids in the suite via pytest --collect-only.
2. Sanity check 1: run the target alone; bail out if it fails (then the issue isn't ordering, it's the test itself).
3. Sanity check 2: run […all other tests…, target] in order; bail out if the target passes (no reproducible pollution to bisect).
4. Delta-debug the predecessor list with Zeller's ddmin. Each candidate subset is run as pytest <subset…> <target> in a fresh subprocess, with collection order pinned by a bundled internal plugin so the result doesn't depend on pytest-randomly or alphabetical ordering surprises.
5. Report the minimal subset that still reproduces the failure plus a copy-pasteable pytest command to reproduce locally.
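
The delta-debugging step can be illustrated with a minimal sketch of ddmin. This is not flake-bisect's actual code: the names `ddmin` and `split`, and the `fails` oracle, are hypothetical. Here `fails(subset)` stands in for "run the subset followed by the target in a fresh pytest subprocess and report whether the target failed"; the sketch also does no result caching, which a real implementation would likely add.

```python
from typing import Callable, List, Sequence


def split(seq: List[str], n: int) -> List[List[str]]:
    """Split seq into n contiguous, non-empty chunks (requires n <= len(seq))."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(items: Sequence[str], fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink items to a 1-minimal sublist for which fails() is still True.

    Assumes fails(list(items)) is True and fails([]) is False, which is
    what the two sanity checks establish. Relative order is preserved.
    """
    current = list(items)
    n = 2  # granularity: number of chunks to split into
    while len(current) >= 2:
        chunks = split(current, n)
        reduced = False

        # Try each chunk on its own.
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break

        # Try removing one chunk at a time (redundant when n == 2).
        if not reduced and n > 2:
            for i in range(len(chunks)):
                rest = [x for j, c in enumerate(chunks) if j != i for x in c]
                if fails(rest):
                    current, n, reduced = rest, max(n - 1, 2), True
                    break

        if not reduced:
            if n >= len(current):
                break  # already at single-test granularity: done
            n = min(len(current), n * 2)
    return current


# Toy oracle: the target fails whenever "t4" ran before it.
candidates = [f"t{i}" for i in range(7)]
print(ddmin(candidates, lambda subset: "t4" in subset))
```

With a single polluter the search halves the candidate list on each successful step, which is where the logarithmic behaviour comes from; when several tests must run together to trigger the failure, the complement phase does the extra work.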

Determinism note: flake-bisect cannot help with flakes caused by time, threads, networking, or RNG without a fixed seed. Those are not ordering bugs. If sanity check 2 doesn't reproduce the failure deterministically, the bug is somewhere else and the CLI will say so.

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Bisect completed; poisoning set printed. |
| 2 | Collection problem (no tests, target nodeid not found, ...). |
| 3 | Target fails when run alone; not an ordering issue. |
| 4 | Target passes in the full ordered run; no pollution reproduced. |
| 5 | `--max-runs` budget exhausted. |

These are stable; CI can branch on them.
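
As one way a CI step might branch on these codes, here is a small sketch of a Python wrapper. The helper names (`describe_exit`, `build_command`, `run_bisect`) are hypothetical and not part of flake-bisect; only the `--target` and `--workdir` flags and the exit codes themselves come from this README.

```python
import subprocess
import sys
from typing import List, Optional

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "bisect completed; poisoning set printed",
    2: "collection problem",
    3: "target fails when run alone (not an ordering issue)",
    4: "target passes in the full ordered run (no pollution reproduced)",
    5: "--max-runs budget exhausted",
}


def describe_exit(code: int) -> str:
    """Map a flake-bisect exit code to a short human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def build_command(target: str, workdir: Optional[str] = None) -> List[str]:
    """Build the argv for one flake-bisect run (hypothetical helper)."""
    cmd = [sys.executable, "-m", "flake_bisect", "--target", target]
    if workdir is not None:
        cmd += ["--workdir", workdir]
    return cmd


def run_bisect(target: str, workdir: Optional[str] = None) -> int:
    """Run flake-bisect and return its exit code for the caller to branch on."""
    return subprocess.run(build_command(target, workdir)).returncode
```

A CI script could call `run_bisect(...)`, log `describe_exit(code)`, and, for example, treat code 4 as "could not reproduce" rather than as a hard failure.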

Demo

The examples/polluting_demo/ directory contains a small suite (the sample run at the top collects 8 tests) with one polluter and one target. Use it to verify the tool runs in your environment:

python -m flake_bisect \
    --workdir examples/polluting_demo \
    --target test_target.py::test_assumes_clean_env

You should see test_pollute.py::test_sets_env_flag named as the culprit.

Background reading

Deeper material on flaky tests, pytest internals, isolation patterns, and the delta-debugging algorithm lives at python-testing-debugging.com.

License

MIT
