Kernel thread stop reap blocked #39
Draft
andreykarpenko-qc wants to merge 5 commits into
Returning from main (or calling exit()) needs to tear down the entire VM,
not just the calling thread. Previously sys_exit() funneled into
pthread_exit, which left peer threads, especially ones blocked in a futex
(H2K_STATUS_BLOCKED) or intpool wait (H2K_STATUS_INTBLOCKED), stuck
forever, leaking the vmblock.
Introduce a new H2_TRAP_VM_STOP (#7) syscall vector dispatched to
H2K_vm_stop, and route sys_exit through h2_vm_stop_trap unconditionally
(any status, not just zero).
H2K_vm_stop scans the vmblock's context array and reaps every non-DEAD
peer regardless of state:
* BLOCKED: remove from futex hash, cancel timer
* INTBLOCKED: remove from intpool ring
* READY / VMWAIT: remove from runlist
* RUNNING peers on other HW threads: flagged via vmblock->exiting and
IPI'd; resched.ref.c picks up the flag and self-reaps the context
before scheduling
Each reap path performs the common cleanup (ASID refcount, context clear,
return to free list, num_cpus--), then the new helper
H2K_vmblock_finalize_if_done_locked() signals the parent VM and frees the
vmblock once num_cpus reaches zero.
Track which context is "main" so future policy can hinge on it:
vmblock->main_context is set by setup.ref.c for the boot VM and by
create.ref.c for the first thread of every other VM.
Trap-table test (kernel/event/trap/test/test.c) and the intpool h2 test
(kernel/event/intpool/test_h2/test.c) are updated for the new vector and
the new teardown semantics. The latter no longer needs to manually join
workers since main's return now reaps them.
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
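The reap scan described in this commit message can be sketched as a small user-space model. This is only an illustration under assumptions: the struct layouts, the field names (on_futex_hash, timer_armed, on_intpool_ring, on_runlist) and the function names are hypothetical stand-ins rather than the actual H2 kernel code, and locking, the ASID refcount and the IPI itself are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum ctx_status {
    ST_DEAD, ST_READY, ST_VMWAIT, ST_BLOCKED, ST_INTBLOCKED, ST_RUNNING
};

#define MAX_CTX 8

struct context {
    enum ctx_status status;
    int on_futex_hash;   /* queued on a futex wait list */
    int timer_armed;     /* a timeout is pending */
    int on_intpool_ring; /* queued on the intpool ring */
    int on_runlist;      /* queued on the scheduler runlist */
};

struct vmblock {
    struct context ctx[MAX_CTX];
    int num_cpus;  /* number of live contexts */
    int exiting;   /* tells RUNNING peers to self-reap at reschedule */
    int finalized; /* models "parent signalled, vmblock freed" */
};

/* Common cleanup shared by every reap path. */
void reap_common(struct vmblock *vm, struct context *c)
{
    assert(vm->num_cpus > 0 && c->status != ST_DEAD);
    c->status = ST_DEAD;
    vm->num_cpus--;
}

/* Finalize only once the last live context is gone. */
void finalize_if_done(struct vmblock *vm)
{
    if (vm->num_cpus == 0)
        vm->finalized = 1;
}

/* Reap the caller and every peer that is not RUNNING elsewhere.
 * Returns the number of RUNNING peers left to self-reap. */
int vm_stop(struct vmblock *vm, struct context *me)
{
    int deferred = 0;

    vm->exiting = 1;
    for (size_t i = 0; i < MAX_CTX; i++) {
        struct context *c = &vm->ctx[i];
        if (c == me)
            continue;
        switch (c->status) {
        case ST_DEAD:
            continue;
        case ST_BLOCKED:
            c->on_futex_hash = 0;
            c->timer_armed = 0;
            break;
        case ST_INTBLOCKED:
            c->on_intpool_ring = 0;
            break;
        case ST_READY:
        case ST_VMWAIT:
            c->on_runlist = 0;
            break;
        case ST_RUNNING:
            deferred++; /* the real kernel would send an IPI here */
            continue;
        }
        reap_common(vm, c);
    }
    reap_common(vm, me);
    finalize_if_done(vm);
    return deferred;
}

/* What a RUNNING peer does when it next passes through reschedule. */
void resched_self_reap(struct vmblock *vm, struct context *c)
{
    if (vm->exiting && c->status == ST_RUNNING) {
        reap_common(vm, c);
        finalize_if_done(vm);
    }
}
```

In this model the vmblock is finalized by whichever context brings num_cpus to zero, which may be a peer that self-reaps after the caller of vm_stop has already gone.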
Add 36 self-contained tests under libs/posix/pthread/test_h2/ exercising
the pthread/sem/rwlock/barrier/TLS surface and the H2K_vm_stop teardown
path introduced in the previous commit. Each test is a directory with
test.c + Makefile + Makefile.inc following the existing test_h2
convention.
Register the new tests in scripts/testlist.v61 and scripts/testlist.v81
so they run as part of the unified test suite. pthread_exit_main is
checked in but commented out in both testlists pending follow-up.
Coverage groups:
* Basic API: attr_roundtrip, barrier_basic, cond_signal_broadcast,
detach_states, join_basic, join_invalid, mutex_recursive,
mutex_trylock, rwlock_readers_writers, sem_corner,
tls_keys
* exit() teardown (exercises H2K_vm_stop):
exit1_main_mutex, exit1_main_cond_wait,
exit1_worker_cond_wait, exit1_detached_worker_with_stuck
* pthread_exit-from-main (POSIX: terminate caller, keep process alive):
pthread_exit_main
* Blocked-thread reap on VM exit (workers parked in sync primitives
when main returns):
stuck_in_join, stuck_in_mutex, stuck_in_cond_wait,
stuck_in_cond_timedwait, stuck_in_sem_wait,
stuck_in_sem_timedwait, stuck_in_rwlock_rd,
stuck_in_rwlock_wr, stuck_in_barrier,
stuck_in_pthread_exit_joined
* Negative / misuse:
neg_attr_setstacksize_zero, neg_barrier_init_zero,
neg_cond_wait_no_mutex, neg_create_invalid_routine,
neg_join_self, neg_mutex_destroy_held,
neg_mutex_unlock_unowned, neg_rwlock_unlock_unheld,
neg_sem_overflow_post, neg_tls_use_after_delete
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
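As an illustration of the kind of check a "Basic API" test in this list could make, here is a minimal sketch in the spirit of mutex_trylock. The actual test sources are not shown in this conversation, so the function name check_trylock and its structure are assumptions; the only behavior relied on is the POSIX rule that pthread_mutex_trylock returns EBUSY when the mutex is already locked.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Returns 0 when every check passes, otherwise the number of the
 * step that failed. Hypothetical helper, not the checked-in test. */
int check_trylock(void)
{
    pthread_mutex_t m;
    int rc = pthread_mutex_init(&m, NULL); /* default (non-recursive) mutex */
    assert(rc == 0);

    if (pthread_mutex_trylock(&m) != 0)
        return 1; /* uncontended trylock must succeed */
    if (pthread_mutex_trylock(&m) != EBUSY)
        return 2; /* second trylock on a held mutex must report EBUSY */
    if (pthread_mutex_unlock(&m) != 0)
        return 3;

    rc = pthread_mutex_destroy(&m);
    assert(rc == 0);
    return 0;
}
```

A test's main would typically print TEST PASSED only when such a helper returns 0.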
Three independent fixes in the test build/run plumbing surfaced once
multiple ARCHV×TARGET variants started running in parallel and writing
into a shared in-source test directory.
makefile:
* Per-variant test_results.json rule prefixes the inner $(MAKE) with
'-' so unified-report aggregation runs even when one variant's
tests fail. The JSON file is still written by h2_test before
check-fail runs.
* h2_test reorders steps so test_report.html and test_results.json
are generated *before* the warning-grep gate; previously a stray
warning blocked report generation entirely.
scripts/Makefile.coverage:
* Per-test results.txt rule's inner $(MAKE) gets the '-' prefix so
.DELETE_ON_ERROR doesn't wipe FAIL details and abort tst — PASS/FAIL
is captured in results.txt content for check-fail and
gen_test_results.py.
* Symlink whitelist replaces the old "everything except Makefile.inc"
blacklist when populating the per-variant build dir. Without the
whitelist, every variant followed symlinks back to the source tree
and clobbered each other's *.elf / *.o / results.txt / gmon-*.out,
leaving only the last writer's outputs in the report. Whitelist
covers source extensions (Makefile, *.c/.h/.S/.s/.cpp/.cc/.py/.dat/
.cfg, tested_functions); explicit excludes drop generator outputs
whose extension matches a source extension (scenarios.h,
generated_tests.dat, threadmap.py).
scripts/Makefile.inc.test:
* Add 'test' as an alias for 'tst' and mark test/tst/all .PHONY so
they don't collide with stray files of the same name.
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
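The whitelist rule for populating the per-variant build dir can be modeled as a small predicate. The real logic lives in scripts/Makefile.coverage and is not shown here, so this C version is only a sketch: the function names are hypothetical, and the lists are copied from the commit message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when name ends with suffix. */
bool has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Decide whether a file in the source test directory should be
 * symlinked into the per-variant build directory. */
bool should_symlink(const char *name)
{
    /* Generator outputs whose extension looks like a source file. */
    static const char *const excluded[] = {
        "scenarios.h", "generated_tests.dat", "threadmap.py"
    };
    /* Whole file names on the whitelist. */
    static const char *const exact[] = { "Makefile", "tested_functions" };
    /* Source extensions on the whitelist. */
    static const char *const exts[] = {
        ".c", ".h", ".S", ".s", ".cpp", ".cc", ".py", ".dat", ".cfg"
    };

    assert(name != NULL);
    for (size_t i = 0; i < sizeof excluded / sizeof excluded[0]; i++)
        if (strcmp(name, excluded[i]) == 0)
            return false;
    for (size_t i = 0; i < sizeof exact / sizeof exact[0]; i++)
        if (strcmp(name, exact[i]) == 0)
            return true;
    for (size_t i = 0; i < sizeof exts / sizeof exts[0]; i++)
        if (has_suffix(name, exts[i]))
            return true;
    return false; /* build outputs such as *.elf, *.o, results.txt */
}
```

Checking the explicit excludes first is what lets scenarios.h be dropped even though .h is a whitelisted extension.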
stzahi1
reviewed
Jun 1, 2026
    puts("Joining worker 0 thread");
    pthread_join(intpool_child_1, &ret);
    puts("Joining worker 1 thread");
    // stop_threads = 1;
Contributor
Please remove the commented-out code lines.
stzahi1
requested changes
Jun 1, 2026
Per POSIX, pthread_exit from the main thread terminates only the main
thread; the process must remain alive so other threads run to
completion. Two distinct bugs together broke that; both are fixed here,
and pthread_exit_main is enabled in v61 and v81 testlists to lock the
behavior in.
kernel/thread/stop: drop is_main shortcut
H2K_thread_stop (H2_TRAP_THREAD_STOP, used by pthread_safe_death)
treated main_context exiting as a VM teardown -- calling vm_stop_locked
to reap every sibling. That collapsed pthread_exit and exit() into the
same fatal path. Only exit() routes through H2_TRAP_VM_STOP, so the
kernel can distinguish: thread_stop now does the same per-me cleanup
for main as for any worker, then nulls main_context so the existing
all-blocked reaper at the bottom of the function can finalize the VM
cleanly when remaining threads settle. exit()/sys_exit() still gets
full VM teardown via H2K_vm_stop, untouched.
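The difference between the two exit paths can be captured in a toy model. Everything here is an assumption made for illustration: the struct, its counters and the "settled" condition (no live threads left, or all remaining ones blocked) are a simplified reading of the description above, not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_MAIN (-1)

struct vm_model {
    int live;       /* contexts that are not dead */
    int blocked;    /* live contexts parked with nothing left to wake them */
    int main_id;    /* id of the "main" context, or NO_MAIN */
    bool finalized; /* VM torn down */
};

/* Stand-in for the all-blocked reaper at the bottom of thread_stop. */
void finalize_if_settled(struct vm_model *vm)
{
    if (vm->live == vm->blocked) { /* nobody left, or nobody can run */
        vm->live = 0;
        vm->blocked = 0;
        vm->finalized = true;
    }
}

/* pthread_exit path: same per-thread cleanup for main and workers. */
void thread_stop(struct vm_model *vm, int id)
{
    assert(vm->live > 0 && !vm->finalized);
    vm->live--;
    if (id == vm->main_id)
        vm->main_id = NO_MAIN; /* main is gone, the VM keeps running */
    finalize_if_settled(vm);
}

/* exit() path: unconditional teardown of every context. */
void vm_stop(struct vm_model *vm)
{
    assert(!vm->finalized);
    vm->live = 0;
    vm->blocked = 0;
    vm->main_id = NO_MAIN;
    vm->finalized = true;
}
```

With three runnable threads, thread_stop on main leaves two live threads and an unfinalized VM, whereas vm_stop finalizes immediately regardless of what the peers are doing.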
libs/posix/pthread: don't queue main's TLS for deferred free
pthread_exit defers freeing the exiting thread's TCB+TLS via static
old_freeptr / old_stack_freeptr -- set so the next exiting thread frees
the previous one. The math (char *)self - elftls_size only works for
worker threads (calloc'd by pthread_create with TCB at
tmpptr + elftls_size). For main:
* Small ELF TLS path: TCB lives in mainthread_static_storage (BSS).
Freeing into BSS is undefined.
* Large ELF TLS path: TCB sits at malloc'd + alignment correction --
the math gives main_thread_tls, not the malloc return.
Use aligned_alloc(TLS_ALIGN, round_up(size, TLS_ALIGN)) in the
large-TLS path so the math is correct (no manual alignment fixup), and
track main's TCB via a file-scope main_tcb pointer. pthread_exit on
main consumes the previously queued frees but skips queueing itself,
so neither static storage nor a non-malloc'd interior pointer ever
reaches free().
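The pointer arithmetic at issue can be shown with a short sketch. It assumes a layout of [ELF TLS image | TCB] in one allocation, a TLS_ALIGN of 64, and a one-field TCB; these values and all function names except aligned_alloc and free are hypothetical and chosen only to make the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TLS_ALIGN 64u /* assumed value for illustration */

struct tcb {
    void *self; /* hypothetical minimal thread control block */
};

/* Round n up to the next multiple of align. */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Allocate [ TLS image | TCB ] and return the TCB pointer.
 * Because aligned_alloc returns an already aligned base, no manual
 * fixup is applied and base == (char *)tcb - elftls_size holds. */
struct tcb *alloc_tls_block(size_t elftls_size)
{
    assert(elftls_size % TLS_ALIGN == 0); /* keeps the TCB aligned */
    size_t total = round_up(elftls_size + sizeof(struct tcb), TLS_ALIGN);
    char *base = aligned_alloc(TLS_ALIGN, total);
    if (base == NULL)
        return NULL;
    struct tcb *t = (struct tcb *)(base + elftls_size);
    t->self = t;
    return t;
}

/* Recover the pointer that must be handed to free(). */
void *tls_block_base(struct tcb *t, size_t elftls_size)
{
    return (char *)t - elftls_size;
}

/* Free a worker's block; main's TCB is never passed to free(). */
void free_tls_block(struct tcb *t, size_t elftls_size,
                    const struct tcb *main_tcb)
{
    if (t == main_tcb)
        return;
    free(tls_block_base(t, elftls_size));
}
```

Rounding the size up to a multiple of the alignment satisfies the C11 requirement on aligned_alloc, and the main_tcb comparison mirrors the rule that main never queues itself for a deferred free.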
scripts/testlist.v{61,81}: enable pthread_exit_main
The test (added in 649bddf) exercises main calling pthread_exit with
a mix of joinable and detached worker threads. With both fixes above,
it passes under archsim and hexagon-sim.
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
Mirrors pthread_exit_main except no worker calls exit(): every worker just
`return NULL`s from its start routine, and the last one to bump the shared
counter prints TEST PASSED before returning. Main still leaves via
pthread_exit(NULL).
This forces VM teardown through the all-blocked / all-dead reaper path
(nobody triggers the H2_TRAP_VM_STOP shortcut) and locks in a clean exit
status from that path.
Registered in scripts/testlist.v61 and scripts/testlist.v81.
Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
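The shape of such a test might look like the sketch below. The test source is not shown in this conversation, so the worker count, the use of C11 atomics, the choice of detached workers and the function names are all assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_WORKERS 4 /* assumed worker count */

atomic_int finished; /* shared counter bumped by every worker */

/* Worker start routine: returns normally and never calls exit().
 * The last worker to bump the counter reports the result. */
void *worker(void *arg)
{
    (void)arg;
    int prev = atomic_fetch_add(&finished, 1);
    assert(prev < NUM_WORKERS);
    if (prev + 1 == NUM_WORKERS)
        puts("TEST PASSED");
    return NULL;
}

/* Body a test's main would run: start the workers, then terminate
 * only the calling thread so the process outlives it. */
void start_workers_and_leave(void)
{
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_t t;
        int rc = pthread_create(&t, NULL, worker, NULL);
        assert(rc == 0);
        rc = pthread_detach(t);
        assert(rc == 0);
    }
    pthread_exit(NULL); /* does not return */
}
```

Because no thread calls exit(), the process ends only when the last worker returns, which is the point at which the all-dead reaper path would run.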