Fix #897: gate AICPU scheduler timeout latch on owned RUNNING task + wall-clock budget by hw-native-sys-bot · Pull Request #930 · hw-native-sys/simpler

hw-native-sys-bot · 2026-05-31T04:08:49Z

Fixes #897.

Summary

Replace per-thread iteration-count fatal timeout (>= MAX_IDLE_ITERATIONS) with a wall-clock budget (SCHEDULER_TIMEOUT_MS = 5000). Iteration count is per-thread spin-speed-sensitive — a pure-idle thread hits the cap ~2× faster than a sibling that polls a RUNNING task — so the fastest spinner racing ahead is what drove [Bug] Idle scheduler thread independently latches PTO2_ERROR_SCHEDULER_TIMEOUT, causing fatal cascade in distributed runs #897's false-positive cascade.
Gate the call to handle_timeout_exit on self_owns_running_task(thread_idx). A thread without an owned RUNNING task has no first-hand evidence its dispatch is stuck. If the wall-clock budget elapses on such a thread, refresh last_progress_ts and keep spinning; the existing STALL diagnostic still fires periodically, so observability is preserved.
Symmetric across src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/runtime/scheduler/.

Verification (on shared a2a3 runner via task-submit, locked NPU pair, identical env)

Repro = pypto's tests/st/distributed/test_l3_allreduce.py, the case named in #897, with the env vars #897 lists as triggers:

PTO2_RING_HEAP=4294967296
PTO2_RING_TASK_WINDOW=131072
PTO2_RING_DEP_POOL=131072

	pass	fail	`sched_error_code=100` count
baseline @ upstream/main (`39c0445`)	15	15	15
this PR	29	1	0

Fail rate 50% → 3.3%
The remaining 1/30 fail surfaces as 507018 via the host-side aclrtSynchronizeStreamWithTimeout(2 s) — that's the next outer hang detector down the stack (PLATFORM_STREAM_SYNC_TIMEOUT_MS=2000), independent of the scheduler-side latch this PR addresses. A follow-up may want to extend the host timeout for the same skew-tolerance reason; out of scope here.
0 occurrences of sched_error_code=100 across 30 rounds with the fix; this is the exact signature [Bug] Idle scheduler thread independently latches PTO2_ERROR_SCHEDULER_TIMEOUT, causing fatal cascade in distributed runs #897 documented.

Why these two changes specifically (not the alternatives in #897 body)

#897's author proposed three options (skip-latch-if-idle / barrier counter / wall-clock budget). The minimal correct fix is both the idle gate AND the wall-clock budget — independently, neither is sufficient:

Idle gate alone: the thread that owns the RUNNING task would still latch on its own iteration count, which is slower than the idle spinners by ~2× but still tripping on legitimate HCCL waits > 92 ms (× per-thread spin rate). False positives just move from idle threads to polling threads.
Wall-clock alone: would correctly delay the fatal but still allow any thread to latch on no first-hand evidence. The 5 s budget still expires while a peer's HCCL wait is in flight.

Together: only a thread that both observed the global wall-clock budget elapse and owns a RUNNING task it has been polling will latch. The barrier-counter option (proposed #2 in the issue body) was not adopted — adding shared atomics on the fatal path is heavier than the gate-and-poll structure here, and the semantics around "all threads at idle limit" is harder to reason about during normal completion vs. true hang.

Out of scope

The remaining host-side 2 s aclrtSyncStreamWithTimeout timeout (the 1/30 in the verify column above). That's a separate skew-tolerance bug at the outer layer.
PR #925 (drop 0xDEADBEEF Ctrl placeholder + retry halMemCtl EACCES race) handled a different 507046 path on dev=11 specifically; unrelated to [Bug] Idle scheduler thread independently latches PTO2_ERROR_SCHEDULER_TIMEOUT, causing fatal cascade in distributed runs #897.

Test plan

CI green
Manual: 30× test_l3_allreduce with [Bug] Idle scheduler thread independently latches PTO2_ERROR_SCHEDULER_TIMEOUT, causing fatal cascade in distributed runs #897 trigger env → 29 pass / 1 fail with zero sched_error_code=100 (vs. baseline 15/15 fail with 15 latch events)
Watch CI flake rate over the next 1-2 weeks for sched_error_code=100 occurrences — expect to drop to zero

🤖 Generated with Claude Code

Update: residual 1/30 also fixed by companion PR #935

After this PR, the same repro shows 29/30 pass; the 1 remaining fail is a host-side aclrtSyncStream(2 s) timeout — the outer timer in the same #897 cascade, exposed once the inner scheduler latch is suppressed. Root cause: ChipWorker.init skew across chip_processes bleeds peer-rank init time into per-rank op sync budget.

#935 adds a cross-chip init barrier (MailboxState::INIT_DONE + parent spin-wait) that eliminates the skew at source. With #930 + #935, the same 30-round repro shows 30/30 pass, zero sched_error_code=100, zero host stream sync timeouts.

Two PRs because the layers are genuinely independent:

This PR is correctness in the AICPU scheduler hot loop (idle thread should not declare global fatal on its own observation, independent of init skew).
Fix #897: cross-chip init barrier so peer-rank init skew never charges against per-rank stream sync budget #935 is an architectural fix on the host startup path (cross-rank init must complete before any per-op timer starts).

Both should land. Order doesn't matter; no file overlap.

Update: CodeRabbit feedback addressed (commit `008b787`)

Concern A (deadlock detection regression) — addressed. Added a second latch branch in scheduler_dispatch.cpp: if the wall-clock budget elapses and no_thread_owns_running_task() (new helper) and completed_tasks_ < total_tasks_, latch fatal. Without this, a dependency-cycle deadlock with zero RUNNING work would spin forever — every ownerless thread would just refresh its timestamp and never latch. Re-verified: same [Bug] Idle scheduler thread independently latches PTO2_ERROR_SCHEDULER_TIMEOUT, causing fatal cascade in distributed runs #897 30-round repro now passes 30/30 (was 29/30 before, the residual was independent host-side 2 s sync; cleared by composing with Fix #897: cross-chip init barrier so peer-rank init skew never charges against per-rank stream sync budget #935).
Concern B (PTO2_SCHED_TIMEOUT_MS env knob) — skipped. The 5 s default validates 30/30 on the [Bug] Idle scheduler thread independently latches PTO2_ERROR_SCHEDULER_TIMEOUT, causing fatal cascade in distributed runs #897 repro (combined with Fix #897: cross-chip init barrier so peer-rank init skew never charges against per-rank stream sync budget #935) — no evidence the budget needs to differ per-fabric. Per .claude/rules/discipline.md and CLAUDE.md, adding a config surface without a workload that requires it is premature. Easy follow-up if a future fabric trips the default.

Update: #935 merged (2026-05-31)

The companion PR #935 (chip-init barrier) has been merged. This PR is independent (no file overlap) and lands the AICPU-side correctness fix on top. The Verification table above showed 30/30 with both fixes composed; with #935 already in main, #930 alone is sufficient to keep the #897 cascade closed in steady state and to handle any new init-skew sources that may arise later.

coderabbitai · 2026-05-31T04:09:00Z

📝 Walkthrough

Walkthrough

This PR addresses false-positive scheduler timeout crashes in distributed runs by replacing iteration-count timeouts with wall-clock timeouts and gating fatal-latch to threads that own cores with active RUNNING tasks. Both a2a3 and a5 scheduler variants receive identical changes.

Changes

Scheduler Timeout with Ownership Gating

Layer / File(s)	Summary
Timeout constants and configuration `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h`	`SCHEDULER_TIMEOUT_MS` (5000ms) and `SCHEDULER_TIMEOUT_CYCLES` constants replace per-iteration caps, enabling wall-clock measurement to accommodate distributed startup skew.
Ownership check helper `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h`, `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`	`self_owns_running_task(thread_idx)` method declared and implemented in both variants to check if a thread owns any core with non-null `running_slot_state`, gating the fatal condition.
Wall-clock timeout dispatch logic `src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`, `src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`	Per-thread `last_progress_ts` tracking replaces iteration count; cycle-budget timeout only calls `handle_timeout_exit` when `self_owns_running_task(thread_idx)` is true, otherwise refreshes timestamp and continues spinning to prevent idle peer threads from triggering false fatal crashes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

hw-native-sys/simpler#872: Extends handle_timeout_exit to accept last-progress for shutdown stall snapshots; complements the wall-clock timeout trigger changes.

Poem

🐰 A scheduler once counted its spins,
Till idle threads crashed with timeout chins.
Now wall-clock and care—
Ownership fair—
No false latch, and everyone wins! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title directly reflects the two main changes: gating AICPU scheduler timeout latch on owned RUNNING task and implementing a wall-clock budget, which align with the changeset's core modifications.
Linked Issues check	✅ Passed	The code changes fully implement the linked issue `#897` objectives: adding self_owns_running_task() gate, replacing iteration-count timeout with wall-clock budget (SCHEDULER_TIMEOUT_MS/SCHEDULER_TIMEOUT_CYCLES), and gating handle_timeout_exit on ownership checks.
Out of Scope Changes check	✅ Passed	All modifications are directly scoped to the linked issue `#897`: new helper method, timeout constant definitions, and dispatch logic changes for the gate-and-wall-clock mechanism. No unrelated refactoring or feature additions detected.
Description check	✅ Passed	The pull request description is highly detailed and directly related to the changeset, explaining the rationale for replacing iteration-count timeouts with wall-clock budgets and gating timeout latches on owned RUNNING tasks.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request replaces the per-thread iteration-count cap for declaring a scheduler timeout with a wall-clock budget of 5 seconds across the a2a3 and a5 runtimes. It introduces the self_owns_running_task helper to ensure that only threads owning a core with an active RUNNING task can trigger a global fatal timeout, preventing false-positive latches caused by startup skew or idle spinning in distributed environments. There are no review comments, and we have no additional feedback to provide.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`:
- Around line 766-787: The timeout suppression currently resets last_progress_ts
whenever the local thread does not own RUNNING work, which hides global
deadlocks when completed_tasks_ < total_tasks_ and no thread owns any RUNNING
slot; change the branch so that before simply refreshing last_progress_ts you
check for the global no-RUNNING condition and, if true and tasks remain, invoke
the same fatal path (handle_timeout_exit(...)) for this thread. Concretely:
inside the if (get_sys_cnt_aicpu() - last_progress_ts >
SCHEDULER_TIMEOUT_CYCLES) block, after the self_owns_running_task(thread_idx)
check fails, add a check like "if (completed_tasks_ < total_tasks_ &&
no_thread_owns_running()) return handle_timeout_exit(...)" (implement
no_thread_owns_running by scanning scheduler thread slots or using the existing
global running-count helper), otherwise keep updating last_progress_ts as
before.

In
`@src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h`:
- Around line 63-65: Replace the hardcoded constexpr SCHEDULER_TIMEOUT_MS (5000)
with a configurable knob: read a compile-time macro PTO2_SCHED_TIMEOUT_MS (fall
back to 5000 if undefined) or a runtime-config value and compute
SCHEDULER_TIMEOUT_CYCLES from that value (using PLATFORM_PROF_SYS_CNT_FREQ as
before); update uses of SCHEDULER_TIMEOUT_MS and SCHEDULER_TIMEOUT_CYCLES so
they reference the new configurable symbol (e.g., PTO2_SCHED_TIMEOUT_MS or a
getter like get_scheduler_timeout_ms()) and ensure the multiplication still
casts to uint64_t to avoid overflow.

In
`@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp`:
- Around line 762-783: The timeout reset currently suppresses fatal timeouts
when no thread owns a RUNNING slot, which hides true deadlocks; update the
timeout branch so that after detecting get_sys_cnt_aicpu() - last_progress_ts >
SCHEDULER_TIMEOUT_CYCLES you still call handle_timeout_exit(...) when there are
unfinished tasks and no scheduler thread owns any RUNNING slot. Concretely: keep
the existing self_owns_running_task(thread_idx) -> handle_timeout_exit(...)
path, add an else-if that checks (completed_tasks_ < total_tasks_) and that no
thread owns a RUNNING slot (implement by scanning the scheduler state or adding
a helper like any_thread_owns_running() if none exists), and call
handle_timeout_exit(...) in that case; only when neither condition holds should
you refresh last_progress_ts = get_sys_cnt_aicpu(). Ensure you use the same
handle_timeout_exit signature and existing symbols (last_progress_ts,
SCHEDULER_TIMEOUT_CYCLES, get_sys_cnt_aicpu(), completed_tasks_, total_tasks_,
self_owns_running_task(thread_idx)).

In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h`:
- Around line 64-66: The timeout is hardcoded via constexpr SCHEDULER_TIMEOUT_MS
and SCHEDULER_TIMEOUT_CYCLES; change these to be computed at runtime and
configurable (e.g., via environment variable or a runtime config API) with a
sensible default of 5000 ms. Replace constexpr SCHEDULER_TIMEOUT_MS with a
function or non-constexpr getter (e.g., getSchedulerTimeoutMs()) that reads a
config/env var and falls back to 5000, and recompute SCHEDULER_TIMEOUT_CYCLES
from that runtime value using PLATFORM_PROF_SYS_CNT_FREQ (e.g.,
getSchedulerTimeoutCycles() = uint64_t(getSchedulerTimeoutMs()) *
(PLATFORM_PROF_SYS_CNT_FREQ / 1000)). Ensure callers use the new getter(s)
instead of the removed constexpr symbols.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 89487ddc-5ded-4b95-8ece-70ff7393e82e

📥 Commits

Reviewing files that changed from the base of the PR and between 39c0445 and 2bf0d35.

📒 Files selected for processing (8)

src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_context.h
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp
src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_types.h

…s against the per-rank stream sync budget (#935) ChipWorker.init has a long right tail under PTO2_RING_HEAP=4 GiB (per-rank device_malloc(16 GiB) pulls host long-tail into the critical path). Without coordination, a fast-init chip starts its first launch_aicore_kernel + aclrtSyncStream(PLATFORM_STREAM_SYNC_TIMEOUT_MS) window N seconds before a slower peer reaches the same point. Any cross-rank wait inside the dispatched op (HCCL notify, distributed allreduce, etc.) charges the slow peer's remaining init time against the fast peer's 2 s host stream sync budget — the cascade observed as the residual 1/30 fail after PR #930. Add a new MailboxState::INIT_DONE (value 6). Chip child writes it once ChipWorker.init returns; parent's _start_hierarchical spin-waits for every chip child to reach INIT_DONE before dispatching anything (CTRL_PREPARE or TASK_READY), then resets each child to IDLE so the standard dispatch state machine resumes from the canonical "ready for work" state. Effect: all chip childrens' first aclrtSyncStream windows open at the same wall-clock instant, so the 2 s budget covers only actual op execution rather than absorbing peer init skew. ## Verification Same task-submit + locked-NPU + 30-round protocol as PR #930, on pypto's tests/st/distributed/test_l3_allreduce.py with the issue #897 trigger env (PTO2_RING_HEAP=4 GiB, PTO2_RING_TASK_WINDOW=131072, PTO2_RING_DEP_POOL=131072). | variant | pass | fail | sched_error_code=100 | host stream sync timeout | | -------------------- | ---- | ---- | -------------------- | ------------------------ | | upstream/main | 15 | 15 | 15 | (subsumed) | | PR #930 alone | 29 | 1 | 0 | 1 | | PR #930 + this PR | 30 | 0 | 0 | 0 | Composes with PR #930 — they address different layers of the same distributed-init-skew failure mode: - PR #930 eliminates the false-positive AICPU scheduler latch (`sched_error_code=100`) that fires from any idle thread at ~92 ms. - This PR eliminates the underlying skew so the outer host-side timer (`PLATFORM_STREAM_SYNC_TIMEOUT_MS = 2000`) also has no input event to fire on. Either fix alone leaves a residual; both together drop the failure rate to 0 under the issue's reproduction conditions. ## Scope The same `INIT_DONE` signaling could be wired into `_sub_worker_loop` and `_child_worker_loop` (L4+) for symmetry. Skipped here because sub workers do no device init (Python-only loops) and the L4+ inner Worker race is unobserved; addressing it would be a follow-up keyed to its own repro. Co-authored-by: Chao Wang <26245345+ChaoWao@users.noreply.github.com>

…NNING task + wall-clock budget + deadlock fallback Issue hw-native-sys#897 documents an idle-thread false-latch in the AICPU scheduler: any thread reaching MAX_IDLE_ITERATIONS (~92 ms wall at ~115 ns/iter) unconditionally writes sched_error_code=PTO2_ERROR_SCHEDULER_TIMEOUT (100) to the shared header, which sibling threads see in handle_orchestrator_exit and abort. In distributed runs this routinely kills the runtime even when the only thread with a RUNNING task is making correct progress, because a peer rank's host-side device_malloc tail-latency easily exceeds 92 ms. Three changes, applied symmetrically to a2a3 and a5 tensormap_and_ringbuffer: 1. Replace the per-thread iteration-count timeout with a wall-clock budget (SCHEDULER_TIMEOUT_MS=5000 → SCHEDULER_TIMEOUT_CYCLES via the existing 50 MHz sys-cnt frequency). Iteration counts run at per-thread spin speed: a pure-idle thread hits MAX_IDLE_ITERATIONS in ~92 ms while a sibling that polls a COND register on a RUNNING task takes ~200 ms for the same iteration count. Letting the fast spinner race ahead and declare global fatal kills the slower-but-correct poller mid-poll. Wall-clock unifies the deadline. STALL_LOG_INTERVAL still drives per-thread observability on iteration count (diagnostic cadence, not load-bearing). 2. Gate the call to handle_timeout_exit on self_owns_running_task(thread_idx). A thread without an owned RUNNING task has no first-hand evidence of a stuck dispatch and must not declare global fatal on its own idle clock. If the wall-clock budget elapses on a thread with no evidence, refresh last_progress_ts so the next budget cycle starts fresh; the STALL diagnostic above still fires periodically so observability is preserved while the false-positive latch is suppressed. The thread that does own a stuck task will reach the budget on its own polls and latch with valid evidence (or recover when the COND register flips). 3. Second latch branch for "pre-dispatch / WAIT-only deadlock" (CodeRabbit feedback): if the wall-clock budget elapses AND no thread anywhere owns a RUNNING task AND tasks remain unfinished (completed_tasks_ < total_tasks_), latch. Without this, a genuine dependency-cycle deadlock with zero RUNNING work would spin forever — every ownerless thread would just refresh its timestamp and never latch, and the host-side aclrtSyncStream timeout is the only outer detector. The new no_thread_owns_running_task() helper makes the global-stuck condition cheap to test on the cold timeout path. The 5 s budget is sized to cover worst-observed distributed-init skew (reported as ~196 ms in hw-native-sys#897 round 12; PTO2_RING_HEAP=4 GiB pushes host-side device_malloc tail latency into the same range) plus typical HCCL wait, with margin. PLATFORM_STREAM_SYNC_TIMEOUT_MS=2000 on the host side remains as the outer hang detector for cases the AICPU never reaches. ## Verification Repro on shared a2a3 runner via task-submit, locked to a clean pair of NPUs, 30 rounds each, identical env from hw-native-sys#897: PTO2_RING_HEAP=4294967296 PTO2_RING_TASK_WINDOW=131072 PTO2_RING_DEP_POOL=131072 Test: pypto's tests/st/distributed/test_l3_allreduce.py (the exact case named in hw-native-sys#897). Baseline = upstream/main (pre-fix); fix = this PR's branch. baseline 30 rounds: pass=15 fail=15 sched_error_code=100=15 this PR alone (initial): pass=29 fail=1 sched_error_code=100=0 this PR (with CodeRabbit changes): pass=30 fail=0 sched_error_code=100=0 (combined with hw-native-sys#935 chip-init barrier) The 1/30 residual from the initial version was a host-side aclrtSyncStream(2 s) timeout — the next outer timer down the stack, independent of the scheduler latch this PR fixes. Combining with hw-native-sys#935 (cross-chip init barrier) eliminates that residual at source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist Bot reviewed May 31, 2026

View reviewed changes

coderabbitai Bot reviewed May 31, 2026

View reviewed changes

This was referenced May 31, 2026

Fix #897: cross-chip init barrier so peer-rank init skew never charges against per-rank stream sync budget #935

Merged

Fix: drop 0xDEADBEEF Ctrl regs placeholder + retry halMemCtl EACCES race #925

Merged

ChaoWao force-pushed the fix/scheduler-idle-thread-false-latch branch 2 times, most recently from 1e50009 to 008b787 Compare May 31, 2026 06:35

ChaoWao force-pushed the fix/scheduler-idle-thread-false-latch branch from 008b787 to cbcc3a1 Compare May 31, 2026 06:47

ChaoWao approved these changes May 31, 2026

View reviewed changes

ChaoWao merged commit 0309e78 into hw-native-sys:main May 31, 2026
16 checks passed

ChaoWao deleted the fix/scheduler-idle-thread-false-latch branch May 31, 2026 07:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #897: gate AICPU scheduler timeout latch on owned RUNNING task + wall-clock budget#930

Fix #897: gate AICPU scheduler timeout latch on owned RUNNING task + wall-clock budget#930
ChaoWao merged 1 commit into
hw-native-sys:mainfrom
hw-native-sys-bot:fix/scheduler-idle-thread-false-latch

hw-native-sys-bot commented May 31, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 31, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hw-native-sys-bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification (on shared a2a3 runner via task-submit, locked NPU pair, identical env)

Why these two changes specifically (not the alternatives in #897 body)

Out of scope

Test plan

Update: residual 1/30 also fixed by companion PR #935

Related

Update: CodeRabbit feedback addressed (commit 008b787)

Update: #935 merged (2026-05-31)

Uh oh!

coderabbitai Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hw-native-sys-bot commented May 31, 2026 •

edited

Loading

Update: CodeRabbit feedback addressed (commit `008b787`)

coderabbitai Bot commented May 31, 2026 •

edited

Loading