
Bluetooth: hci_qca: Fix missing wakeup during SSR memdump handling #577

Open
shuaz-shuai wants to merge 1 commit into qualcomm-linux:qcom-6.18.y from shuaz-shuai:wake_ssr_timer


@shuaz-shuai

When a Bluetooth controller encounters a coredump, it triggers the Subsystem Restart (SSR) mechanism. The controller first reports the coredump data and, once the upload is complete, sends a hw_error event. The host relies on this event to proceed with subsequent recovery actions.

If the host has not finished processing the coredump data when the hw_error event is received, it waits until either the processing is complete or the 8-second timeout expires before handling the event.

The current implementation clears QCA_MEMDUMP_COLLECTION using clear_bit(), which does not wake up waiters sleeping in wait_on_bit_timeout(). As a result, the waiting thread may remain blocked until the timeout expires even if the coredump collection has already completed.

Fix this by clearing QCA_MEMDUMP_COLLECTION with clear_and_wake_up_bit(), which also wakes up the waiting thread and allows the hw_error handling to proceed immediately.
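The difference between the two behaviours can be illustrated with a small userspace analogy. This is only a sketch, not the driver code: it uses a pthread condition variable in place of the kernel's bit wait queue, and `memdump_state`, `collector()`, `wait_for_memdump()` and the 100 ms "collection time" are all hypothetical names and values chosen for illustration. Clearing the flag without signalling stands in for `clear_bit()`; clearing it and broadcasting stands in for `clear_and_wake_up_bit()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for the QCA_MEMDUMP_COLLECTION flag. */
struct memdump_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool collecting;     /* true while the "memdump" is being collected */
    bool wake_on_clear;  /* false: clear only; true: clear and wake waiters */
};

/* Simulates memdump collection that finishes after about 100 ms. */
static void *collector(void *arg)
{
    struct memdump_state *st = arg;
    struct timespec work = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };

    nanosleep(&work, NULL);

    pthread_mutex_lock(&st->lock);
    st->collecting = false;                 /* analogous to clearing the bit */
    if (st->wake_on_clear)
        pthread_cond_broadcast(&st->cond);  /* analogous to waking waiters */
    pthread_mutex_unlock(&st->lock);
    return NULL;
}

/*
 * Simulates the hw_error path waiting for collection to finish.
 * Returns how long the caller was blocked, in milliseconds, or -1 on error.
 */
static long wait_for_memdump(bool wake_on_clear, long timeout_ms)
{
    struct memdump_state st = { .collecting = true,
                                .wake_on_clear = wake_on_clear };
    struct timespec start, deadline, end;
    pthread_t tid;

    pthread_mutex_init(&st.lock, NULL);
    pthread_cond_init(&st.cond, NULL);

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &start);
    deadline = start;
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    if (pthread_create(&tid, NULL, collector, &st) != 0)
        return -1;

    pthread_mutex_lock(&st.lock);
    while (st.collecting) {
        /* The flag is only re-checked when the waiter is woken or times out. */
        if (pthread_cond_timedwait(&st.cond, &st.lock, &deadline) == ETIMEDOUT)
            break;
    }
    pthread_mutex_unlock(&st.lock);

    pthread_join(tid, NULL);
    clock_gettime(CLOCK_REALTIME, &end);

    pthread_cond_destroy(&st.cond);
    pthread_mutex_destroy(&st.lock);

    return (end.tv_sec - start.tv_sec) * 1000L +
           (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```

Built with `cc -pthread`, a call such as `wait_for_memdump(true, 8000)` should return shortly after the simulated collection ends, while `wait_for_memdump(false, 8000)` should stay blocked for roughly the full timeout even though the flag was cleared long before. That mirrors the delay this change removes.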

Test case:

  • Trigger a controller coredump using: hcitool cmd 0x3f 0c 26
  • Tested on QCA6390.
  • Capture HCI logs using btmon.
  • Verify that the delay between receiving the hw_error event and initiating the power-off sequence is reduced compared to the timeout-based behavior.

Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/all/20260410095443.4167332-1-shuai.zhang@oss.qualcomm.com/

CRs-Fixed: 4498534

Link: https://lore.kernel.org/stable/20251107033924.3707495-2-quic_shuaz%40quicinc.com
Signed-off-by: Shuai Zhang <shuai.zhang@oss.qualcomm.com>
shuaz-shuai requested review from a team, jingyiwang42, ndechesne and yijiyang on May 13, 2026, 02:56
