
Conversation

@bryancall (Contributor) commented on Feb 8, 2026

Description

Adds a parallel test runner (autest-parallel.py) that distributes autests across multiple worker processes for significantly faster execution. On a 16-core machine, test suite completion drops from ~80 minutes (sequential) to ~6 minutes.

The existing cmake --build -t autest workflow is completely untouched. This is an additive tool for developers who want faster local test runs.

Related Issues

Changes

New files

  • tests/autest-parallel.py - Parallel test runner with:
    • Port offset isolation per worker (AUTEST_PORT_OFFSET) to prevent port conflicts
    • Timing-based load balancing (LPT algorithm) using historical test durations (see the sketch after this list)
    • Serial test support for tests that cannot run concurrently
    • Live progress line with ETA, failure counts, and worker status
    • Verbose mode (-v) with real-time test output streaming
    • Per-test timing collection (--collect-timings) for load balancing optimization
  • tests/serial_tests.txt - List of tests requiring serial execution
  • tests/README.md - Added parallel testing documentation
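
For illustration, the LPT balancing mentioned above amounts to handing the next-longest test to the currently least-loaded worker. The sketch below is simplified and not the script's actual code; the 30-second default for tests without recorded timings is an assumption.

    import heapq

    def partition_lpt(tests, durations, num_workers, default_secs=30.0):
        """Longest Processing Time: give the next-longest test to the least-loaded worker."""
        # Min-heap of (total_assigned_seconds, worker_index).
        heap = [(0.0, w) for w in range(num_workers)]
        heapq.heapify(heap)
        buckets = [[] for _ in range(num_workers)]
        # Longest tests first, using historical timings where available.
        for test in sorted(tests, key=lambda t: durations.get(t, default_secs), reverse=True):
            load, w = heapq.heappop(heap)
            buckets[w].append(test)
            heapq.heappush(heap, (load + durations.get(test, default_secs), w))
        return buckets

For example, partition_lpt(["tls", "remap", "cache"], {"tls": 120.0}, 2) puts the 120-second test alone on one worker and the two short tests on the other.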

Modified files

  • tests/gold_tests/autest-site/ports.py - Added AUTEST_PORT_OFFSET environment variable support for dynamic port range isolation (backward compatible, defaults to 0); see the sketch after this list
  • tests/gold_tests/basic/config.test.py - Switched from hardcoded ports to dynamic port selection
  • tests/gold_tests/basic/copy_config.test.py - Same, removed select_ports=False
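
The ports.py change can be pictured roughly like this. It is an illustrative sketch, not the actual diff; the clamp bound and the idea of adding the offset to a base port are stated assumptions.

    import os

    def _env_port_offset():
        # Workers export AUTEST_PORT_OFFSET; anything unset or malformed falls
        # back to 0 so existing sequential runs behave exactly as before.
        try:
            offset = int(os.environ.get("AUTEST_PORT_OFFSET", "0"))
        except ValueError:
            print("AUTEST_PORT_OFFSET is not an integer; ignoring it")
            offset = 0
        return max(0, min(offset, 60000))  # clamp to a sane range

    # A worker's port search would then start at base + _env_port_offset().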

Usage

cd tests
python3 autest-parallel.py -j 16 \
    --ats-bin <install>/bin \
    --build-root <build-dir> \
    --sandbox /tmp/autest-parallel

Testing

  • Tested with 16 parallel workers on a Ryzen 9 16-core machine
  • 401 tests: 370 passed, 39 skipped (missing optional deps), 6 pre-existing failures
  • All pre-existing failures also fail when run sequentially on master
  • The ports.py change is backward compatible (offset defaults to 0, no effect on existing test runs)
  • config.test.py and copy_config.test.py changes use dynamic ports which are more robust than hardcoded values

Add autest-parallel.py which runs autest tests in parallel by spawning
multiple autest processes with isolated sandboxes and port ranges.

Key changes:
- ports.py: Add AUTEST_PORT_OFFSET environment variable support to offset
  the starting port range for each parallel worker, avoiding port conflicts
- autest-parallel.py: New script that discovers tests, partitions them
  across workers, runs them in parallel, and aggregates results

Usage:
  ./autest-parallel.py -j 8 --ats-bin /opt/ats/bin --sandbox /tmp/sb

Note: The built-in autest -j flag does not work with ATS tests (causes
"No Test run defined" failures), hence the need for this wrapper.

Tests with hardcoded ports (select_ports=False) cannot safely run in
parallel and may still fail.
…duling

- Add serial_tests.txt for tests that cannot run in parallel (hardcoded ports)
- Implement LPT (Longest Processing Time) load balancing using timing data
- Add --collect-timings flag to record per-test durations for future runs
- Fix port offset for low-range ports in ports.py (was missing offset)
- Convert config.test.py and copy_config.test.py to use dynamic ports
  (removed select_ports=False)
- Add run_single_test() for serial execution with timing
…itives

- Add --build-root CLI argument to properly locate test plugins in the
  build directory (fixes ~57 test failures from missing test plugins)
- Fix skip detection: tests skipped due to missing dependencies (lua,
  QUIC, go-httpbin, uri_signing) are now reported as SKIP, not FAIL
- Fix false-positive detection: tests that error at setup (e.g., missing
  proxy verifier) with 0 pass/0 fail are now correctly reported as FAIL
- Return PASS/FAIL/SKIP status from run_single_test() instead of bool
- Track skipped count in per-worker results and summary output
- In batch mode with -v, stream autest output in real-time using Popen
  instead of subprocess.run, showing "Running Test..." lines as they happen
- Print test list preview per worker during partitioning (-v)
- Show timestamps and skipped counts in worker completion messages
- Print first 5 test names per worker at batch start (-v)
Default output now shows a single in-place progress line (using \r)
that updates as workers complete:

  [Parallel] 145/389 tests (37%) | 8/16 workers done | 52s elapsed | ETA: 1m 28s

- Shows tests done/total with percentage
- Workers completed out of total
- Failed and skipped counts (when non-zero)
- Elapsed time and estimated time remaining
- Updates in-place for both parallel and serial phases
- Verbose mode (-v) still prints per-worker detail lines above the progress
- Progress line and summary now count top-level tests, not autest
  sub-test results. Previously thread_config (12 sub-tests) inflated
  the count, causing progress to exceed 100%.
- Deduplicate failed test names in summary output
- Remove tls_sni_with_port, redirect_to_same_origin_on_cache, and
  parent-retry from serial list -- all use dynamic ports via get_port()
  and run fine in parallel
- Add thread_config to serial list -- spins up 12 ATS instances and
  fails under parallel load due to resource contention
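
For reference, the in-place progress/ETA line described above boils down to rewriting a single terminal line with a carriage return. A minimal sketch, not the script's exact formatting:

    import sys
    import time

    def show_progress(done, total, workers_done, workers_total, started):
        elapsed = time.time() - started
        eta = (elapsed / done) * (total - done) if done else 0.0
        sys.stdout.write(
            f"\r[Parallel] {done}/{total} tests ({100 * done // total}%) | "
            f"{workers_done}/{workers_total} workers done | "
            f"{int(elapsed)}s elapsed | ETA: {int(eta)}s")
        sys.stdout.flush()  # \r returns to column 0 so the next call overwrites this line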
Document usage of autest-parallel.py including key options,
timing-based load balancing, and how to add serial tests.
@bryancall bryancall self-assigned this Feb 8, 2026
@bryancall bryancall added this to the 10.2.0 milestone Feb 8, 2026
Apply project yapf formatting rules to pass CI format check.
Copilot AI (Contributor) left a comment


Pull request overview

Adds an optional parallel AuTest runner intended to speed up local test execution by sharding gold tests across multiple processes and isolating port ranges per worker.

Changes:

  • Added tests/autest-parallel.py to discover tests, partition them across workers (optionally using historical timing data), and run serial-only tests afterward.
  • Introduced AUTEST_PORT_OFFSET support in tests/gold_tests/autest-site/ports.py to reduce port collisions across concurrent test runs.
  • Updated two basic gold tests to rely on dynamically selected ports instead of hardcoded ports, and documented the parallel workflow in tests/README.md.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.

Summary per file:

  • tests/autest-parallel.py - New parallel test runner (workers, port offsets, timing-based sharding, serial test handling, progress/ETA).
  • tests/gold_tests/autest-site/ports.py - Adds an environment-driven port range offset to reduce collisions under parallelism.
  • tests/gold_tests/basic/config.test.py - Switches to dynamic port selection (removes the hardcoded port).
  • tests/gold_tests/basic/copy_config.test.py - Switches to dynamic port selection and removes select_ports=False.
  • tests/serial_tests.txt - New list of tests that must run serially.
  • tests/README.md - Documents how to run the new parallel runner and timing-based balancing.


- ports.py: Add try/except for AUTEST_PORT_OFFSET parsing with range clamping
- Make --ats-bin conditionally required (not needed for --list mode)
- Add timeout handling for verbose mode subprocess.Popen.wait()
- Update load_serial_tests docstring to match actual basename behavior
- Remove unused current_test and test_start_line variables
- Add explanatory comments to exception handlers
@bryancall (Contributor, Author) commented:

Review Feedback Response

Addressed the automated review comments in commit 3e28d16. Here's what was fixed and what was intentionally left as-is:

Fixed (6 items)

  1. AUTEST_PORT_OFFSET in ports.py should validate input - added a try/except ValueError with a warning and clamped the offset to the [0, 60000] range.
  2. --ats-bin is required even for --list mode, which doesn't need it - changed to default=None with manual validation, so it is only required when actually running tests.
  3. Verbose-mode Popen uses a blocking for line in proc.stdout with no timeout - added proc.wait(timeout=60) with TimeoutExpired handling after stdout is exhausted (the line-by-line read itself completes when the process closes stdout); a sketch follows this list.
  4. The load_serial_tests docstring says it "matches any test containing this" but the code uses an exact basename match - updated the docstring to describe the .stem extraction behavior accurately.
  5. current_test and test_start_line are assigned but never used in parse_autest_output - removed the unused variables.
  6. Exception handlers (except IOError: pass and except (json.JSONDecodeError, IOError): pass) lack context - added inline comments explaining why the exceptions are intentionally silenced.
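
The pattern behind item 3 looks roughly like this sketch (not the committed code; the command is abbreviated):

    import subprocess

    cmd = ["uv", "run", "autest"]  # plus the worker's filters, sandbox, etc.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:       # ends when the child closes stdout, i.e. when it exits
        print(line, end="", flush=True)
    try:
        proc.wait(timeout=60)      # bound the post-read cleanup
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()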

Not Fixed (2 items)

  1. Suggestion to use the autest.sh wrapper instead of uv run autest - autest.sh is the CMake-generated wrapper that calls uv run autest internally. autest-parallel.py runs independently of CMake and invokes uv run autest directly, which is the correct approach.
  2. Suggestion that for line in proc.stdout blocks indefinitely - this is expected behavior: the iterator completes when the subprocess closes its stdout pipe (i.e., when it exits). The 1-hour timeout on the non-verbose subprocess.run path protects against hangs there, and the new proc.wait(timeout=60) guards the post-read cleanup in verbose mode.

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 12 comments.



@bryancall (Contributor, Author) commented:

Re: Copilot's Second Round of Comments

Most of these are duplicates of issues already addressed in commit 3e28d16 (Copilot appears to have re-reviewed the full diff before that commit landed). The net-new suggestions are addressed below.

Already fixed in 3e28d16 (duplicate comments)

  • AUTEST_PORT_OFFSET validation -- done (try/except + clamp)
  • --ats-bin required for --list -- done (conditional validation)
  • load_serial_tests docstring -- done (updated to match basename behavior)
  • current_test / test_start_line unused -- done (removed)
  • except clauses without comments -- done (added comments)
  • Verbose mode timeout -- done (proc.wait(timeout=60) + kill on timeout)

6x duplicate comments on line 312 (parse_autest_output)

Copilot left six copies of the same suggestion, with slightly different wording, about the except ValueError: pass guards in the summary parser. These are trivial int() parse guards on autest output lines -- if a line doesn't end with a number, we skip it. Adding a comment to each of the six identical except blocks would be pure noise in a parsing function where the intent is obvious from context.
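
For context, the guards in question have this shape (the summary line format here is hypothetical, not autest's literal output):

    worker_output = "Passed: 12\nFailed: 0\nUnrelated status line"  # example text only
    counts = {}
    for line in worker_output.splitlines():
        for key in ("Passed", "Failed", "Skipped"):
            if line.startswith(key):
                try:
                    counts[key] = int(line.rsplit(None, 1)[-1])
                except ValueError:
                    pass  # line did not end with a number; skip it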

"Use autest.sh instead of uv run autest" (not fixing)

autest.sh is generated by CMake at build time and lives in the build directory. autest-parallel.py is designed to work independently of the CMake build system, invoking uv run autest directly -- which is what autest.sh itself does internally. The --filters flag (plural) is the correct autest CLI option; --filter (singular) is a different flag. This works correctly as-is.
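
Under those assumptions, a per-worker launch looks roughly like the sketch below. The flag set and 1000-port step come from this thread's description, but treat the exact paths and option forwarding as illustrative rather than the script's literal code.

    import os
    import subprocess

    def run_worker(worker_id, tests, ats_bin, sandbox_root, port_step=1000):
        # Each worker gets its own sandbox and port window via AUTEST_PORT_OFFSET.
        env = dict(os.environ, AUTEST_PORT_OFFSET=str(worker_id * port_step))
        cmd = [
            "uv", "run", "autest",
            "--directory", "gold_tests",
            "--ats-bin", ats_bin,
            "--sandbox", os.path.join(sandbox_root, f"worker-{worker_id}"),
            "--filters", *tests,
        ]
        # The 1-hour cap mirrors the non-verbose timeout mentioned earlier.
        return subprocess.run(cmd, env=env, timeout=3600).returncode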

"Replicate autest.sh environment setup" (not fixing)

The suggestion to duplicate PYTHONPATH and proxy env var setup from autest.sh is unnecessary. uv run handles the Python environment via pyproject.toml, and the tests pass without the extra env setup. If a future test requires it, we can add it then.

"Validate port offset against dmin/dmax at runtime" (not fixing)

The suggestion to validate the offset against the computed OS port range is over-engineering. In practice, offsets are multiples of 1000 (default --port-offset-step) with worker counts typically 4-16, so the maximum offset is ~16000. The 60000 clamp is a safe upper bound. Adding runtime validation against sysctl values would couple the parallel runner to OS-specific port range detection logic that already lives in ports.py.

"Basename matching could collide" (acknowledged, not fixing now)

Valid theoretical concern -- if two tests in different subdirectories had the same basename, the serial list would match both. In practice there are zero colliding test basenames in the current suite. If this becomes an issue, we can switch to relative-path matching. Not worth the added complexity today.

@bryancall (Contributor, Author) commented:

[approve ci autest 0]
