GSP heartbeat stuck at 0 since boot with S0ix power management on RTX PRO 1000 Blackwell laptop #1064

@mnencia

NVIDIA Open GPU Kernel Modules Version

595.45.04

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

N/A — Blackwell (GB207GLM) only supports open kernel modules.

Operating System and Version

Debian 13 (trixie)

Kernel Release

6.18.15+deb13-amd64

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

NVIDIA RTX PRO 1000 Blackwell Generation Laptop GPU (GB207GLM, part 2DB8-975-A1)

Describe the bug

On a hybrid graphics laptop (Intel Arc Pro 140T iGPU + NVIDIA RTX PRO 1000 dGPU) with S0ix power management enabled, the GSP firmware heartbeat counter is stuck at 0 for the entire boot session. Every time the GPU wakes from runtime power management (D3cold→D0), the driver logs assertion failures and heartbeat timeouts.

The GPU is otherwise functional — nvidia-smi reports normal temperature/power/utilization, GNOME Shell renders on external displays via DP-MST through the NVIDIA GPU, and CUDA works. However, every power transition produces a burst of errors in dmesg.

The pattern repeats indefinitely throughout the entire uptime (observed over 30+ hours since boot).

Error pattern on each GPU wake:

NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239432340 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12170737 timeout 5200
NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out

Key observations:

  • heartbeat is always 0 — it never increments from boot
  • The PFM_REQ_HNDLR_STATE_SYNC_CALLBACK RPC event is rejected because the driver thinks it is still in "bootup" state
  • Each burst is correlated with nvidia 0000:01:00.0: Enabling HDA controller (GPU waking from power save)
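  • The heartbeatWithOffsetMs value is identical across consecutive bursts (for example 4227261603 at both 06:49:18 and 06:51:45 on 2026-03-18), although it differs between the 2026-03-17 and 2026-03-18 excerpts; diff keeps growing because currentTimeMs advances while heartbeatWithOffsetMs does not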
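
The arithmetic in the timeout message can be checked directly from a log line. This is a minimal sketch, assuming the field layout seen in the log lines quoted in this report; the helper name `parse_heartbeat` is hypothetical and not part of the driver. It checks that diff equals currentTimeMs minus heartbeatWithOffsetMs and that diff exceeds the timeout.

```python
import re

# Field layout assumed from the log lines quoted in this report.
HEARTBEAT_RE = re.compile(
    r"currentTimeMs (?P<current>\d+) heartbeat (?P<heartbeat>\d+) "
    r"heartbeatWithOffsetMs (?P<offset>\d+) diff (?P<diff>\d+) "
    r"timeout (?P<timeout>\d+)"
)

def parse_heartbeat(line):
    """Return the numeric fields of a heartbeat-timeout line, or None."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

line = (
    "NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, "
    "currentTimeMs 4239432340 heartbeat 0 heartbeatWithOffsetMs 4227261603 "
    "diff 12170737 timeout 5200"
)
fields = parse_heartbeat(line)

# diff should be the gap between the current time and heartbeatWithOffsetMs.
computed_diff = fields["current"] - fields["offset"]
diff_matches = computed_diff == fields["diff"]
timed_out = fields["diff"] > fields["timeout"]

print("diff matches:", diff_matches)
print("timeout exceeded:", timed_out)
print("heartbeat counter:", fields["heartbeat"])
```

For the quoted line both checks hold and the heartbeat counter is 0, which matches the observations above.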

Possible impact: While the GPU works for display and compute, the degraded GSP state may affect the driver's ability to recover from DP-MST hotplug events. I also have a separate issue (#1055) where DP-MST monitors fail to recover after brief signal drops from a Dell WD19DCS dock, and the broken GSP heartbeat may be a contributing factor.

To Reproduce

  1. Boot a Dell Pro Max 16 Premium (MA16250) with hybrid graphics (Intel iGPU primary, NVIDIA dGPU for external displays via Direct Output Mode)
  2. Enable S0ix power management via module parameters (NVreg_EnableS0ixPowerManagement=1)
  3. Let the GPU enter and exit power save naturally (it happens within minutes of idle)
  4. Check dmesg — heartbeat timeouts appear on every power transition
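
Step 4 can be partly automated by counting the relevant messages in a saved dmesg capture. This is a rough sketch, assuming the message substrings quoted in this report; the function name `count_events` and the category names are hypothetical.

```python
from collections import Counter

# Substrings taken from the log lines quoted in this report.
MARKERS = {
    "gpu_wake": "Enabling HDA controller",
    "rpc_rejected": "during bootup without API lock",
    "assert_failed": "Assertion failed: 0 @ kernel_gsp.c:1446",
    "heartbeat_timeout": "GSP RM heartbeat timed out",
}

def count_events(lines):
    """Count how often each marker substring appears in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for name, marker in MARKERS.items():
            if marker in line:
                counts[name] += 1
    return counts

# One burst, copied from the excerpt in this report (timestamps omitted).
sample = [
    "nvidia 0000:01:00.0: Enabling HDA controller",
    "NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: "
    "0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock",
    "NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446",
    "NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: "
    "0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock",
    "NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446",
    "NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out",
]

counts = count_events(sample)
for name in MARKERS:
    print(name, counts[name])
```

`count_events` accepts any iterable of strings, so an open file holding a dmesg capture can be passed in place of `sample`.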

System Configuration

  • Laptop: Dell Pro Max 16 Premium (MA16250)
  • BIOS: 1.7.1 (Hybrid Graphics ON, Discrete Graphics Direct Output Mode ON)
  • VBIOS: 98.07.1C.40.1C
  • GSP Firmware: 595.45.04
  • iGPU: Intel Arc Pro 140T (i915 driver)
  • Desktop: GNOME 48.7 on Wayland
  • Dock: Dell WD19DCS (USB-C) — two 1920x1080 monitors via DP-MST through NVIDIA GPU
  • Module parameters:
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1
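
To confirm which parameters a modprobe configuration file sets, the `options` lines can be parsed into a dictionary. This is a minimal sketch that only reads the configuration text; it does not query the loaded driver. The function name `parse_modprobe_options` is hypothetical.

```python
def parse_modprobe_options(text):
    """Map module name -> {parameter: value} from modprobe.d 'options' lines."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] != "options" or len(parts) < 3:
            continue  # not an 'options <module> <assignments...>' line
        module = parts[1]
        for assignment in parts[2:]:
            key, _, value = assignment.partition("=")
            result.setdefault(module, {})[key] = value
    return result

# Configuration text copied from this report.
config = """\
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1
"""

options = parse_modprobe_options(config)
s0ix_enabled = options["nvidia"].get("NVreg_EnableS0ixPowerManagement") == "1"
print("S0ix enabled in config:", s0ix_enabled)
```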

dmesg excerpts

First occurrence after boot (system booted 2026-03-16 09:10:49):

2026-03-17T01:08:26 NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4132580456 heartbeat 0 heartbeatWithOffsetMs 4121941346 diff 10639110 timeout 5200
2026-03-17T01:08:26 NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
2026-03-17T01:09:42 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-17T01:09:42 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-17T01:09:42 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-17T01:09:42 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-17T01:09:42 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4132656633 heartbeat 0 heartbeatWithOffsetMs 4121941346 diff 10715287 timeout 5200
2026-03-17T01:09:42 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out

Still occurring 30 hours later:

2026-03-18T06:49:18 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-18T06:49:18 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239432340 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12170737 timeout 5200
2026-03-18T06:49:18 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out
2026-03-18T06:51:45 nvidia 0000:01:00.0: Enabling HDA controller
2026-03-18T06:51:45 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-18T06:51:45 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-18T06:51:45 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-18T06:51:45 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-18T06:51:45 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239579024 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12317421 timeout 5200
2026-03-18T06:51:45 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out
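
As a cross-check of the excerpts above, the wall-clock gap between log entries can be compared with the change in currentTimeMs. This is a minimal sketch, assuming the dmesg timestamps are accurate to about one second; the sample pairs are copied from the excerpts and the function name `deltas` is hypothetical.

```python
from datetime import datetime

# (dmesg timestamp, currentTimeMs) pairs copied from the excerpts above.
samples = [
    ("2026-03-17T01:08:26", 4132580456),
    ("2026-03-17T01:09:42", 4132656633),
    ("2026-03-18T06:49:18", 4239432340),
    ("2026-03-18T06:51:45", 4239579024),
]

def deltas(pairs):
    """Return (wall-clock seconds, currentTimeMs seconds) for consecutive pairs."""
    out = []
    for (t0, ms0), (t1, ms1) in zip(pairs, pairs[1:]):
        wall = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
        out.append((wall, (ms1 - ms0) / 1000.0))
    return out

results = deltas(samples)
for wall, counter in results:
    print(f"wall={wall:.0f}s counter={counter:.3f}s drift={abs(wall - counter):.3f}s")
```

For these samples the two deltas agree to within one second, the resolution of the timestamps, which suggests currentTimeMs advances at wall-clock rate while the heartbeat value stays at 0.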

Bug Incidence

Always (every boot, every power transition)

nvidia-bug-report.log.gz

Will upload in a follow-up comment.

More Info

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.45.04              Driver Version: 595.45.04      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 1000 Blac...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   42C    P8              6W /   35W |     101MiB /   8151MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
