flake-bisect

Find the pytest test(s) that poison a flaky target.

You have a test that passes when you run it alone but fails as part of the full suite. Some other test mutates global state — os.environ, a singleton, a module-level cache, a database row, the current working directory, a registered signal handler — and your target is the one that notices. flake-bisect narrows the polluter down to a minimal set using delta-debugging over the test ordering, so you stop guessing and start reading the right diff.

$ python -m flake_bisect --workdir examples/polluting_demo \
                        --target test_target.py::test_assumes_clean_env
flake-bisect 0.1.0
workdir : .../examples/polluting_demo
target  : test_target.py::test_assumes_clean_env
Collecting tests...
Collected 8 tests (7 candidates).
Sanity check: target alone...
  OK (passes alone)
Sanity check: target after full suite...
  OK (target outcome: FAILED)
Bisecting 7 candidate predecessors...

Minimal poisoning set (1 test):
  test_pollute.py::test_sets_env_flag

Reproduce locally:
  pytest test_pollute.py::test_sets_env_flag test_target.py::test_assumes_clean_env

pytest invocations during bisect: 3 (cap: 200)

A naive linear search across N candidate predecessors would take up to N runs of the suite. flake-bisect typically converges in O(log N) pytest invocations when there is a single polluter, and stays sub-linear with a small number of polluters.

Why this exists

The Python testing & debugging community has converged on a clear playbook for non-flaky suites: ban global state in tests, use fixtures with monkeypatch, isolate the DB per test, run with pytest-randomly in CI to surface ordering bugs early. The hard part is what to do when CI catches one. The failing line tells you what broke; it never tells you who set the landmine 200 tests earlier.

flake-bisect does that last mile: given a known-flaky target, it points at the test that poisons it.

Running it

flake-bisect is a self-contained Python package with no third-party dependencies. It needs pytest available in the same Python environment as the project you're bisecting (it shells out to python -m pytest).

Clone the repo and run from source:

git clone https://github.com/python-testing-debugging/flake-bisect.git
cd flake-bisect
python -m flake_bisect --help

To use it against your own project, point --workdir at your project root and add flake-bisect to PYTHONPATH so the module is importable:

PYTHONPATH=/path/to/flake-bisect python -m flake_bisect \
    --workdir /path/to/your/project \
    --target tests/test_widgets.py::test_render_safely

Or run it from inside the flake-bisect checkout with an absolute --workdir.

Common flags

Flag           Purpose
--target       Required. The flaky test's nodeid as pytest reports it.
--testpaths    Limit candidate predecessors to specific paths (otherwise full suite).
--workdir      Run pytest from this directory (default: cwd).
--max-runs     Hard cap on pytest invocations during bisect (default: 200).
--pytest-arg   Forward an arg to every pytest invocation. Repeat to pass multiple.
-v             Show per-iteration progress.

Forwarding pytest options

If your project needs particular pytest options to even collect (a -p plugin, -o override, marker filter, etc.), forward them with repeated --pytest-arg:

python -m flake_bisect \
    --target tests/test_x.py::test_y \
    --pytest-arg -o --pytest-arg "addopts=" \
    --pytest-arg -m --pytest-arg "not slow"

How it works

  1. Collect all nodeids in the suite via pytest --collect-only.
  2. Sanity check 1: run the target alone; bail out if it fails (then the issue isn't ordering, it's the test itself).
  3. Sanity check 2: run […all other tests…, target] in order; bail out if the target passes (no reproducible pollution to bisect).
  4. Delta-debug the predecessor list with Zeller's ddmin. Each candidate subset is run as pytest <subset…> <target> in a fresh subprocess, with collection order pinned by a bundled internal plugin so the result doesn't depend on pytest-randomly or alphabetical ordering surprises.
  5. Report the minimal subset that still reproduces the failure plus a copy-pasteable pytest command to reproduce locally.
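The delta-debugging step in item 4 can be sketched roughly as follows. This is a minimal, generic ddmin under stated assumptions, not the code flake-bisect ships: the names `split`, `ddmin`, and `reproduces` are hypothetical, and `reproduces(subset)` stands in for "run pytest <subset…> <target> in a fresh subprocess and report whether the target failed". It assumes the full candidate list already reproduces the failure (sanity check 2).

```python
from typing import Callable, List, Sequence


def split(seq: List[str], n: int) -> List[List[str]]:
    """Split seq into n contiguous, near-equal chunks (requires n <= len(seq))."""
    k, m = divmod(len(seq), n)
    return [seq[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def ddmin(candidates: Sequence[str],
          reproduces: Callable[[List[str]], bool]) -> List[str]:
    """Shrink candidates to a 1-minimal list for which reproduces() is True.

    Hypothetical sketch: `reproduces` is an oracle supplied by the caller.
    Relative test order is preserved throughout.
    """
    current = list(candidates)
    n = 2  # granularity: number of chunks to split into
    while len(current) >= 2:
        chunks = split(current, n)
        reduced = False

        # 1. Does a single chunk reproduce the failure on its own?
        for chunk in chunks:
            if reproduces(chunk):
                current, n, reduced = chunk, 2, True
                break

        # 2. Otherwise, does dropping one chunk still reproduce it?
        #    (With n == 2 the complements are the chunks just tried.)
        if not reduced and n > 2:
            for i in range(n):
                rest = [t for j, c in enumerate(chunks) if j != i for t in c]
                if reproduces(rest):
                    current, n, reduced = rest, n - 1, True
                    break

        # 3. Otherwise refine the granularity, or stop at single-test chunks.
        if not reduced:
            if n >= len(current):
                break
            n = min(len(current), n * 2)
    return current


# Usage with a fake oracle: the "target" fails whenever t3 ran before it.
tests = [f"t{i}" for i in range(7)]
print(ddmin(tests, lambda subset: "t3" in subset))
```

With a single polluter, each successful round halves the list, which is where the logarithmic behaviour comes from. With several interacting polluters the complement step does most of the work and more runs are needed.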

Determinism note: flake-bisect cannot help with flakes caused by time, threads, networking, or RNG without a fixed seed. Those are not ordering bugs. If sanity check 2 doesn't reproduce the failure deterministically, the bug is somewhere else and the CLI will say so.

Exit codes

Code   Meaning
0      Bisect completed; poisoning set printed.
2      Collection problem (no tests, target nodeid not found, ...).
3      Target fails when run alone; not an ordering issue.
4      Target passes in the full ordered run; no pollution reproduced.
5      --max-runs budget exhausted.

These are stable; CI can branch on them.
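As an illustration of branching on these codes, a CI helper might look something like the sketch below. The helper names (`EXIT_MEANINGS`, `describe_exit`, `bisect_command`, `run_bisect`) are hypothetical and not part of flake-bisect; only the exit codes and the `--workdir` / `--target` flags come from this README.

```python
import subprocess
import sys
from typing import List

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "bisect completed; poisoning set printed",
    2: "collection problem",
    3: "target fails when run alone; not an ordering issue",
    4: "target passes in the full ordered run; no pollution reproduced",
    5: "--max-runs budget exhausted",
}


def describe_exit(code: int) -> str:
    """Map a flake-bisect exit code to a short human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def bisect_command(target: str, workdir: str) -> List[str]:
    """Build the command line, using the interpreter that runs this script."""
    return [sys.executable, "-m", "flake_bisect",
            "--workdir", workdir, "--target", target]


def run_bisect(target: str, workdir: str) -> str:
    """Run flake-bisect (assumed importable) and describe how it ended."""
    proc = subprocess.run(bisect_command(target, workdir))
    return describe_exit(proc.returncode)
```

A CI job could, for example, treat code 4 as "could not reproduce, retry later" and code 3 as "open a bug against the target test itself".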

Demo

The examples/polluting_demo/ directory contains an eight-test suite (seven candidate predecessors plus the target) with one polluter among the candidates. Use it to verify the tool runs in your environment:

python -m flake_bisect \
    --workdir examples/polluting_demo \
    --target test_target.py::test_assumes_clean_env

You should see test_pollute.py::test_sets_env_flag named as the culprit.

Background reading

Deeper material on flaky tests, pytest internals, isolation patterns, and the delta-debugging algorithm lives at python-testing-debugging.com.

License

MIT
