
Use Base.Semaphore to control test execution parallelism #119

Open

giordano wants to merge 4 commits into main from mg/semaphore

Conversation

@giordano
Collaborator

This is a refactoring of the code, to then enable #77 in a follow up PR (CC @christiangnrd).

The idea of using a semaphore is mine, initial implementation is from Claude, but I made a few manual refinements afterwards (some of them folded in the first commit). Overall summary of the changes:

Replace the fixed worker-task-per-slot model with a semaphore-based
approach: one task per test, with a Base.Semaphore(jobs) limiting
concurrency and a Channel-based worker pool for reuse. This decouples
the number of tasks from the parallelism level and simplifies the
control flow (no inner while loop, tests array is immutable).

I'm mostly happy with the result: it's very close to what I had in mind, and the net diff is relatively small (+48/-22), so this shouldn't be too hard to review.

#118 was useful because it detected that the case of no tests to run wasn't handled correctly, so yay for the extra tests.
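The semaphore-plus-worker-pool pattern summarized above can be sketched as follows. This is a minimal sketch, not the PR's actual code: `jobs`, the worker representation, and the body of each task are hypothetical stand-ins.

```julia
# Sketch of the approach described above: one task per test, with a
# Base.Semaphore(jobs) bounding concurrency and a Channel acting as a
# reusable worker pool. All names below are illustrative.

jobs = 4
tests = ["a", "b", "c", "d", "e", "f"]

sem = Base.Semaphore(jobs)

# Pool of reusable workers, one per parallel slot.
pool = Channel{Int}(jobs)
foreach(w -> put!(pool, w), 1:jobs)

results = Vector{String}(undef, length(tests))

@sync for (i, test) in enumerate(tests)    # one task per test
    Threads.@spawn begin
        Base.acquire(sem) do               # at most `jobs` tasks run at once
            worker = take!(pool)           # borrow a worker from the pool
            try
                results[i] = "$(test) ran on worker $(worker)"  # stand-in for running the test
            finally
                put!(pool, worker)         # return the worker for reuse
            end
        end
    end
end
```

Note how the number of spawned tasks (one per test) is decoupled from the parallelism level (`jobs`): the semaphore gates execution, while the channel ensures each running task holds exactly one worker.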

giordano and others added 4 commits March 25, 2026 18:49
Replace the fixed worker-task-per-slot model with a semaphore-based
approach: one task per test, with a Base.Semaphore(jobs) limiting
concurrency and a Channel-based worker pool for reuse. This decouples
the number of tasks from the parallelism level and simplifies the
control flow (no inner while loop, tests array is immutable).

Co-authored-by: Claude <noreply@anthropic.com>
Made-with: Cursor
@giordano giordano requested review from maleadt and vchuravy March 26, 2026 01:19
Comment on lines -1020 to +1027
for p in workers
tests_to_start = Threads.Atomic{Int}(length(tests))
for test in tests
Collaborator Author

The core change I was interested in was changing this for loop from an iteration over the workers, to an iteration over the tests: the idea for #77 is to have different semaphores for different subsets of tests, and here we iterate over the different subsets (and associated semaphores). The whole execution cycle would then require only this change on this single line, the rest would remain as is.

I still need to fully work out the user-facing changes for designating the serial tests for #77. I have some bad ideas in mind, but I want to decouple them from this refactoring, which I believe is still worthwhile in its own right for making the code easier to follow.
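The direction sketched above for #77 — iterating over subsets of tests, each paired with its own semaphore — could look roughly like this. Everything here is a hypothetical sketch of the idea, not the eventual implementation: the group layout, test names, and result collection are assumptions.

```julia
# Hypothetical sketch: each subset of tests carries its own semaphore,
# so serial tests get Semaphore(1) while the rest share Semaphore(jobs).
jobs = 4
parallel_tests = ["core", "linalg", "io"]
serial_tests   = ["gpu"]

groups = [
    (Base.Semaphore(jobs), parallel_tests),  # normal tests: up to `jobs` at once
    (Base.Semaphore(1),    serial_tests),    # serial tests: one at a time
]

done = Channel{String}(Inf)   # thread-safe collection of completed tests
@sync for (sem, tests) in groups, test in tests
    Threads.@spawn Base.acquire(sem) do
        put!(done, test)      # stand-in for actually running the test
    end
end
close(done)
ran = collect(done)
```

The rest of the execution cycle is unchanged: only the outer iteration gains the (semaphore, subset) pairing.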

@giordano
Collaborator Author

giordano commented Mar 26, 2026

Uhm, the test failure

default workers stopped at end: Test Failed at /Users/runner/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:472
  Expression: after == before
   Evaluated: 1 == 2
  Stacktrace:
   [1] top-level scope
     @ ~/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:10
   [2] macro expansion
     @ ~/hostedtoolcache/julia/nightly/aarch64/share/julia/stdlib/v1.14/Test/src/Test.jl:2243 [inlined]
   [3] macro expansion
     @ ~/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:424 [inlined]
   [4] macro expansion
     @ ~/hostedtoolcache/julia/nightly/aarch64/share/julia/stdlib/v1.14/Test/src/Test.jl:2243 [inlined]
   [5] macro expansion
     @ ~/work/ParallelTestRunner.jl/ParallelTestRunner.jl/test/runtests.jl:472 [inlined]
   [6] macro expansion
     @ ~/hostedtoolcache/julia/nightly/aarch64/share/julia/stdlib/v1.14/Test/src/Test.jl:781 [inlined]

is interesting:

  1. it happened only on macOS with Julia nightly, but that's precisely what I use locally, and tests passed for me several times 😅
  2. the fact that there were fewer subprocesses still alive after the tests finished than before is quite surprising: if something went wrong with the termination of the subprocesses I'd expect more processes to be still around, not less.

The error disappeared after a rerun, I'd say it was a glitch? 😬 Edit: tests on all platforms always passed in the following 3 full reruns.

