Kernel thread stop reap blocked #39

Draft
andreykarpenko-qc wants to merge 5 commits into master from kernel_thread_stop_reap_blocked
@andreykarpenko-qc
Contributor

No description provided.

Returning from main (or calling exit()) needs to tear down the entire VM,
not just the calling thread.  Previously sys_exit() funneled into
pthread_exit, which left peer threads — especially ones blocked in a
futex (H2K_STATUS_BLOCKED) or intpool wait (H2K_STATUS_INTBLOCKED) —
stuck forever, leaking the vmblock.

Introduce a new H2_TRAP_VM_STOP (#7) syscall vector dispatched to
H2K_vm_stop, and route sys_exit through h2_vm_stop_trap unconditionally
(any status, not just zero).  H2K_vm_stop scans the vmblock's context
array and reaps every non-DEAD peer regardless of state:

  * BLOCKED  — remove from futex hash, cancel timer
  * INTBLOCKED — remove from intpool ring
  * READY / VMWAIT — remove from runlist
  * RUNNING peers on other HW threads — flagged via vmblock->exiting
    and IPI'd; resched.ref.c picks up the flag and self-reaps the
    context before scheduling

Each reap path performs the common cleanup (ASID refcount,
context clear, return to free list, num_cpus--), then the new helper
H2K_vmblock_finalize_if_done_locked() signals the parent VM and frees
the vmblock once num_cpus reaches zero.
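
The reap-by-state flow above can be modeled in a few lines of C.  This is
a toy sketch, not the kernel code: the struct layouts, the bookkeeping
flags (in_futex_hash, timer_armed, in_intpool_ring, on_runlist,
finalized), the helper names, and the assumption that the caller reaps
itself last are all hypothetical.  Only the status names, vmblock->exiting
and num_cpus come from the description.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the H2K_STATUS_* values. */
enum status { DEAD, READY, VMWAIT, BLOCKED, INTBLOCKED, RUNNING };

#define MAX_CONTEXTS 8

struct context {
    enum status status;
    int in_futex_hash;    /* models futex hash membership */
    int timer_armed;      /* models a pending futex timeout */
    int in_intpool_ring;  /* models intpool ring membership */
    int on_runlist;       /* models runlist membership */
};

struct vmblock {
    struct context contexts[MAX_CONTEXTS];
    int num_cpus;   /* number of live contexts */
    int exiting;    /* flag observed by RUNNING peers */
    int finalized;  /* models "parent signaled, vmblock freed" */
};

/* Common cleanup shared by every reap path (ASID refcount etc. omitted). */
static void reap_common(struct vmblock *vm, struct context *c)
{
    c->status = DEAD;
    vm->num_cpus--;
}

/* Models H2K_vmblock_finalize_if_done_locked(). */
static void finalize_if_done(struct vmblock *vm)
{
    if (vm->num_cpus == 0)
        vm->finalized = 1;
}

/* Models the self-reap a RUNNING peer performs on its resched path. */
void resched_self_reap(struct vmblock *vm, struct context *me)
{
    if (vm->exiting && me->status == RUNNING) {
        reap_common(vm, me);
        finalize_if_done(vm);
    }
}

/* Models H2K_vm_stop invoked by context `me`. */
void vm_stop(struct vmblock *vm, struct context *me)
{
    vm->exiting = 1;
    for (size_t i = 0; i < MAX_CONTEXTS; i++) {
        struct context *c = &vm->contexts[i];
        if (c == me || c->status == DEAD)
            continue;
        switch (c->status) {
        case BLOCKED:    c->in_futex_hash = 0; c->timer_armed = 0; break;
        case INTBLOCKED: c->in_intpool_ring = 0; break;
        case READY:
        case VMWAIT:     c->on_runlist = 0; break;
        case RUNNING:    continue; /* an IPI would go here; peer self-reaps */
        default:         break;
        }
        reap_common(vm, c);
    }
    reap_common(vm, me); /* assumption: the caller is reaped last */
    finalize_if_done(vm);
}
```

In this model the vmblock is finalized by whichever path drops num_cpus
to zero, which may be a late self-reaping peer rather than the caller.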

Track which context is "main" so future policy can hinge on it:
vmblock->main_context is set by setup.ref.c for the boot VM and by
create.ref.c for the first thread of every other VM.

Trap-table test (kernel/event/trap/test/test.c) and the intpool h2 test
(kernel/event/intpool/test_h2/test.c) are updated for the new vector
and the new teardown semantics — the latter no longer needs to manually
join workers since main's return now reaps them.
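
For illustration, a trap-table check for the new vector might look
roughly like the sketch below.  Only H2_TRAP_VM_STOP and its number (7)
come from this commit; the table size, the handler signature and every
function name here are hypothetical and do not reflect the real
kernel/event/trap code.

```c
#include <stddef.h>

#define H2_TRAP_VM_STOP 7 /* vector number from the commit message */
#define NUM_TRAPS 8       /* hypothetical table size */

typedef int (*trap_handler_t)(int arg);

static int last_exit_status = -1;

/* Stand-in for H2K_vm_stop: just records the status it was given. */
static int vm_stop_handler(int status)
{
    last_exit_status = status;
    return 0;
}

/* Fallback for vectors with no handler installed. */
static int bad_trap(int arg)
{
    (void)arg;
    return -1;
}

/* Entries not listed here are NULL and fall through to bad_trap. */
static trap_handler_t trap_table[NUM_TRAPS] = {
    [H2_TRAP_VM_STOP] = vm_stop_handler,
};

int dispatch_trap(unsigned vector, int arg)
{
    if (vector >= NUM_TRAPS || trap_table[vector] == NULL)
        return bad_trap(arg);
    return trap_table[vector](arg);
}

/* sys_exit takes the VM-stop trap for any status, not only zero. */
int sys_exit_model(int status)
{
    return dispatch_trap(H2_TRAP_VM_STOP, status);
}

int recorded_exit_status(void)
{
    return last_exit_status;
}
```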

Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Add 36 self-contained tests under libs/posix/pthread/test_h2/ exercising
the pthread/sem/rwlock/barrier/TLS surface and the H2K_vm_stop teardown
path introduced in the previous commit.  Each test is a directory with
test.c + Makefile + Makefile.inc following the existing test_h2
convention.

Register the new tests in scripts/testlist.v61 and scripts/testlist.v81
so they run as part of the unified test suite.  pthread_exit_main is
checked in but commented out in both testlists pending follow-up.

Coverage groups:

  * Basic API:  attr_roundtrip, barrier_basic, cond_signal_broadcast,
                detach_states, join_basic, join_invalid, mutex_recursive,
                mutex_trylock, rwlock_readers_writers, sem_corner,
                tls_keys

  * exit() teardown (exercises H2K_vm_stop):
                exit1_main_mutex, exit1_main_cond_wait,
                exit1_worker_cond_wait, exit1_detached_worker_with_stuck

  * pthread_exit-from-main (POSIX: terminate caller, keep process alive):
                pthread_exit_main

  * Blocked-thread reap on VM exit (workers parked in sync primitives
    when main returns):
                stuck_in_join, stuck_in_mutex, stuck_in_cond_wait,
                stuck_in_cond_timedwait, stuck_in_sem_wait,
                stuck_in_sem_timedwait, stuck_in_rwlock_rd,
                stuck_in_rwlock_wr, stuck_in_barrier,
                stuck_in_pthread_exit_joined

  * Negative / misuse:
                neg_attr_setstacksize_zero, neg_barrier_init_zero,
                neg_cond_wait_no_mutex, neg_create_invalid_routine,
                neg_join_self, neg_mutex_destroy_held,
                neg_mutex_unlock_unowned, neg_rwlock_unlock_unheld,
                neg_sem_overflow_post, neg_tls_use_after_delete
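
As a rough idea of what a test in the blocked-thread reap group could
look like, here is a sketch in the spirit of stuck_in_mutex.  It is not
the checked-in test.c: the function and variable names and the
pass/fail strings are assumptions.  The start flag only shows that the
worker has begun running, so it approximates "parked in the mutex"
rather than proving it.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int worker_started;

static void *worker(void *arg)
{
    (void)arg;
    atomic_store(&worker_started, 1);
    pthread_mutex_lock(&lock); /* parks here: the lock is never released */
    puts("TEST FAILED");       /* reaching this means the lock was granted */
    return NULL;
}

/* Body a test's main() would run and then return from (hypothetical name). */
int run_stuck_in_mutex(void)
{
    pthread_t t;

    pthread_mutex_lock(&lock); /* held for the rest of the process */
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return 1;
    while (!atomic_load(&worker_started))
        sched_yield();         /* wait until the worker is running */
    puts("TEST PASSED");
    return 0;                  /* main returns with the worker still blocked */
}
```

A test's main() would simply return run_stuck_in_mutex(); the point of
the group is that this return now tears down the VM even though the
worker never leaves pthread_mutex_lock.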

Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Three independent fixes in the test build/run plumbing surfaced once
multiple ARCHV×TARGET variants started running in parallel and writing
into a shared in-source test directory.

makefile:
  * Per-variant test_results.json rule prefixes the inner $(MAKE) with
    '-' so unified-report aggregation runs even when one variant's
    tests fail.  The JSON file is still written by h2_test before
    check-fail runs.
  * h2_test reorders steps so test_report.html and test_results.json
    are generated *before* the warning-grep gate; previously a stray
    warning blocked report generation entirely.

scripts/Makefile.coverage:
  * Per-test results.txt rule's inner $(MAKE) gets the '-' prefix so
    .DELETE_ON_ERROR doesn't delete results.txt (losing the FAIL
    details) and abort tst.  PASS/FAIL is read from the results.txt
    content by check-fail and gen_test_results.py.
  * Symlink whitelist replaces the old "everything except Makefile.inc"
    blacklist when populating the per-variant build dir.  Without the
    whitelist, every variant followed symlinks back to the source tree
    and clobbered each other's *.elf / *.o / results.txt / gmon-*.out,
    leaving only the last writer's outputs in the report.  Whitelist
    covers source extensions (Makefile, *.c/.h/.S/.s/.cpp/.cc/.py/.dat/
    .cfg, tested_functions); explicit excludes drop generator outputs
    whose extension matches a source extension (scenarios.h,
    generated_tests.dat, threadmap.py).
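
The whitelist-plus-excludes rule is implemented in make, but its logic
can be restated as a small C predicate.  The file names and extensions
are the ones listed above; the function name and the matching strategy
(exact name, then last extension) are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Names that are whitelisted as-is. */
static const char *const exact_names[] = { "Makefile", "tested_functions" };

/* Source extensions that are whitelisted. */
static const char *const source_exts[] = {
    ".c", ".h", ".S", ".s", ".cpp", ".cc", ".py", ".dat", ".cfg",
};

/* Generator outputs whose extension collides with a source extension. */
static const char *const excluded[] = {
    "scenarios.h", "generated_tests.dat", "threadmap.py",
};

/* Returns true if `name` should be symlinked into the per-variant build dir. */
bool should_symlink(const char *name)
{
    for (size_t i = 0; i < COUNT(excluded); i++)
        if (strcmp(name, excluded[i]) == 0)
            return false;

    for (size_t i = 0; i < COUNT(exact_names); i++)
        if (strcmp(name, exact_names[i]) == 0)
            return true;

    const char *dot = strrchr(name, '.');
    if (dot == NULL)
        return false;

    for (size_t i = 0; i < COUNT(source_exts); i++)
        if (strcmp(dot, source_exts[i]) == 0)
            return true;

    return false; /* build outputs such as *.elf, *.o, results.txt */
}
```

Anything not matched is built fresh inside the variant's own directory,
which is what keeps variants from overwriting each other's outputs.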

scripts/Makefile.inc.test:
  * Add 'test' as an alias for 'tst' and mark test/tst/all .PHONY so
    they don't collide with stray files of the same name.

Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

puts("Joining worker 0 thread");
pthread_join(intpool_child_1, &ret);
puts("Joining worker 1 thread");
// stop_threads = 1;
Contributor

Please remove the commented-out code lines.

Per POSIX, pthread_exit from the main thread terminates only the main
thread; the process must remain alive so other threads run to
completion.  Two distinct bugs together broke that; both are fixed here,
and pthread_exit_main is enabled in v61 and v81 testlists to lock the
behavior in.

kernel/thread/stop: drop is_main shortcut

H2K_thread_stop (H2_TRAP_THREAD_STOP, used by pthread_safe_death)
treated main_context exiting as a VM teardown -- calling vm_stop_locked
to reap every sibling.  That collapsed pthread_exit and exit() into the
same fatal path.  Only exit() routes through H2_TRAP_VM_STOP, so the
kernel can distinguish: thread_stop now does the same per-me cleanup
for main as for any worker, then nulls main_context so the existing
all-blocked reaper at the bottom of the function can finalize the VM
cleanly when remaining threads settle.  exit()/sys_exit() still gets
full VM teardown via H2K_vm_stop, untouched.
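
The difference between the two exit paths can be shown with a toy
model.  Everything here is hypothetical except the idea itself: the
struct, the integer context ids, and in particular the reaper condition
(every remaining context is blocked) are assumptions, not the kernel's
actual logic.

```c
#define NO_MAIN (-1)

struct vm {
    int num_cpus;     /* live contexts */
    int num_blocked;  /* live contexts that are currently blocked */
    int main_context; /* id of main, or NO_MAIN once main has exited */
    int finalized;    /* models "VM torn down" */
};

/* Models H2K_thread_stop after the fix: main gets no special treatment. */
void thread_stop(struct vm *vm, int me)
{
    vm->num_cpus--; /* same per-thread cleanup for main and workers */
    if (vm->main_context == me)
        vm->main_context = NO_MAIN; /* forget main, but keep the VM alive */

    /* All-blocked reaper: nothing runnable is left to wake the rest. */
    if (vm->num_blocked == vm->num_cpus) {
        vm->num_cpus = 0;
        vm->num_blocked = 0;
        vm->finalized = 1;
    }
}

/* Models H2K_vm_stop: exit() tears everything down immediately. */
void vm_stop(struct vm *vm)
{
    vm->num_cpus = 0;
    vm->num_blocked = 0;
    vm->finalized = 1;
}
```

With three runnable threads, main calling thread_stop leaves two live
contexts and an unfinalized VM; only the last thread to stop, or a call
to vm_stop, finalizes it.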

libs/posix/pthread: don't queue main's TLS for deferred free

pthread_exit defers freeing the exiting thread's TCB+TLS via static
old_freeptr / old_stack_freeptr -- set so the next exiting thread frees
the previous one.  The math (char *)self - elftls_size only works for
worker threads (calloc'd by pthread_create with TCB at
tmpptr + elftls_size).  For main:

  * Small ELF TLS path: TCB lives in mainthread_static_storage (BSS).
    Freeing into BSS is undefined.
  * Large ELF TLS path: TCB sits at malloc'd + alignment correction --
    the math gives main_thread_tls, not the malloc return.
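
A small sketch of the pointer math may make the failure concrete.  The
struct tcb layout and all function names are hypothetical, and the
worker example assumes elftls_size is a multiple of the TCB's alignment.

```c
#include <stdint.h>
#include <stdlib.h>

struct tcb { void *self; }; /* hypothetical minimal TCB */

/* Worker layout: one calloc block, TLS first, TCB at base + elftls_size. */
struct tcb *worker_tcb_alloc(size_t elftls_size, void **base_out)
{
    char *base = calloc(1, elftls_size + sizeof(struct tcb));
    if (base == NULL)
        return NULL;
    *base_out = base;
    struct tcb *t = (struct tcb *)(base + elftls_size);
    t->self = t;
    return t;
}

/* The deferred-free math: valid only for the worker layout above. */
void *free_base_from_tcb(struct tcb *self, size_t elftls_size)
{
    return (char *)self - elftls_size;
}

/* Manual alignment fixup of the kind the large-TLS main path relied on. */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

For a worker, free_base_from_tcb returns exactly the calloc pointer.
For main on the large-TLS path, if malloc returned 0x1008 and the
alignment is 64, the TLS area starts at align_up(0x1008, 64) = 0x1040,
so subtracting elftls_size from the TCB yields 0x1040 rather than the
0x1008 that free() needs.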

Use aligned_alloc(TLS_ALIGN, round_up(size, TLS_ALIGN)) in the
large-TLS path so the math is correct (no manual alignment fixup), and
track main's TCB via a file-scope main_tcb pointer.  pthread_exit on
main consumes the previously queued frees but skips queueing itself,
so neither static storage nor a non-malloc'd interior pointer ever
reaches free().
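
A sketch of the fixed behavior, under stated assumptions: TLS_ALIGN's
value, the helper names, and the simplification that a thread's TCB
pointer doubles as its identity are all hypothetical.  The names
old_freeptr and main_tcb follow the text above; the stack counterpart
old_stack_freeptr is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TLS_ALIGN 64 /* hypothetical value */

/* aligned_alloc requires the size to be a multiple of the alignment. */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/* Large-TLS path: the returned pointer is both aligned and free()-able. */
void *alloc_main_tls(size_t size)
{
    return aligned_alloc(TLS_ALIGN, round_up(size, TLS_ALIGN));
}

static void *old_freeptr; /* block queued by the previously exiting thread */
static void *main_tcb;    /* file-scope marker for main's TCB */
static int frees_done;    /* bookkeeping for this sketch only */

void set_main_tcb(void *tcb) { main_tcb = tcb; }

/* Models the deferred-free step of pthread_exit for thread `self`. */
void exit_deferred_free(void *self, void *block)
{
    if (old_freeptr != NULL) {
        free(old_freeptr); /* consume the previously queued free */
        frees_done++;
    }
    old_freeptr = NULL;
    if (self != main_tcb)
        old_freeptr = block; /* workers queue themselves; main never does */
}

void *queued_block(void) { return old_freeptr; }
int frees_performed(void) { return frees_done; }
```

When a worker exits and then main exits, the worker's block is freed
during main's exit and nothing is left queued, so main's storage is
never handed to free() by a later thread.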

scripts/testlist.v{61,81}: enable pthread_exit_main

The test (added in 649bddf) exercises main calling pthread_exit with
a mix of joinable and detached worker threads.  With both fixes above,
it passes under archsim and hexagon-sim.

Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>

Mirrors pthread_exit_main except no worker calls exit(): every worker
just `return NULL`s from its start routine, and the last one to bump
the shared counter prints TEST PASSED before returning.  Main still
leaves via pthread_exit(NULL).  This forces VM teardown through the
all-blocked / all-dead reaper path -- nobody triggers the
H2_TRAP_VM_STOP shortcut -- and locks in a clean exit status from that
path.  Registered in scripts/testlist.v61 and scripts/testlist.v81.
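
The shape of such a test might be roughly as follows.  This is a sketch,
not the checked-in test: NUM_WORKERS, the function names and the use of
C11 atomics for the shared counter are assumptions.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NUM_WORKERS 4 /* hypothetical worker count */

static atomic_int finished;

static void *worker(void *arg)
{
    (void)arg;
    /* fetch_add returns the old value, so the last worker sees N - 1. */
    if (atomic_fetch_add(&finished, 1) == NUM_WORKERS - 1)
        puts("TEST PASSED");
    return NULL; /* plain return: no worker calls exit() */
}

/* Starts the workers; returns 0 on success, -1 if a create fails. */
int launch_workers(void)
{
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            return -1;
    }
    return 0;
}

int workers_finished(void)
{
    return atomic_load(&finished);
}

/*
 * In the test itself, main would end with:
 *
 *     launch_workers();
 *     pthread_exit(NULL);
 *
 * so main never joins the workers and nobody calls exit().
 */
```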


Signed-off-by: Andrey Karpenko <andreyk@qti.qualcomm.com>
2 participants