Description
NVIDIA Open GPU Kernel Modules Version
595.45.04
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
N/A — Blackwell (GB207GLM) only supports open kernel modules.
Operating System and Version
Debian 13 (trixie)
Kernel Release
6.18.15+deb13-amd64
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
NVIDIA RTX PRO 1000 Blackwell Generation Laptop GPU (GB207GLM, part 2DB8-975-A1)
Describe the bug
On a hybrid graphics laptop (Intel Arc Pro 140T iGPU + NVIDIA RTX PRO 1000 dGPU) with S0ix power management enabled, the GSP firmware heartbeat counter is stuck at 0 for the entire boot session. Every time the GPU wakes from runtime power management (D3cold→D0), the driver logs assertion failures and heartbeat timeouts.
The GPU is otherwise functional — nvidia-smi reports normal temperature/power/utilization, GNOME Shell renders on external displays via DP-MST through the NVIDIA GPU, and CUDA works. However, every power transition produces a burst of errors in dmesg.
The pattern repeats indefinitely throughout the entire uptime (observed over 30+ hours since boot).
Error pattern on each GPU wake:
NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239432340 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12170737 timeout 5200
NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out
Key observations:
- heartbeat is always 0; it never increments after boot
- The PFM_REQ_HNDLR_STATE_SYNC_CALLBACK RPC event is rejected because the driver thinks it is still in "bootup" state
- Each burst is correlated with nvidia 0000:01:00.0: Enabling HDA controller (GPU waking from power save)
- The heartbeatWithOffsetMs value is stable within a session but the diff grows over time
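The observations above can be checked mechanically against the quoted log lines. The following is a minimal sketch, not part of the driver: the regular expression is an assumption derived only from the field layout of the _kgspIsHeartbeatTimedOut lines in this report, and parse_heartbeat is a hypothetical helper name. It confirms that, in the quoted line, diff equals currentTimeMs minus heartbeatWithOffsetMs and that heartbeat is 0.

```python
import re

# Field layout assumed from the _kgspIsHeartbeatTimedOut lines quoted in this report.
HEARTBEAT_RE = re.compile(
    r"currentTimeMs (?P<current>\d+) heartbeat (?P<heartbeat>\d+) "
    r"heartbeatWithOffsetMs (?P<offset>\d+) diff (?P<diff>\d+) timeout (?P<timeout>\d+)"
)


def parse_heartbeat(line):
    """Return the numeric fields of a heartbeat-timeout line, or None if absent."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = (
    "NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, "
    "currentTimeMs 4239432340 heartbeat 0 heartbeatWithOffsetMs 4227261603 "
    "diff 12170737 timeout 5200"
)
fields = parse_heartbeat(line)

# In the quoted lines, diff matches currentTimeMs - heartbeatWithOffsetMs.
consistent = fields["diff"] == fields["current"] - fields["offset"]
print(fields["heartbeat"], consistent, fields["diff"] > fields["timeout"])
# prints: 0 True True
```

Running the same check over a full dmesg capture would show whether heartbeat ever leaves 0 during the session.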
Possible impact: While the GPU works for display and compute, the degraded GSP state may affect the driver's ability to recover from DP-MST hotplug events. I also have a separate issue (#1055) where DP-MST monitors fail to recover after brief signal drops from a Dell WD19DCS dock, and the broken GSP heartbeat may be a contributing factor.
To Reproduce
- Boot a Dell Pro Max 16 Premium (MA16250) with hybrid graphics (Intel iGPU primary, NVIDIA dGPU for external displays via Direct Output Mode)
- S0ix power management is enabled via module parameters
- Let the GPU enter and exit power save naturally (it happens within minutes of idle)
- Check dmesg — heartbeat timeouts appear on every power transition
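To tell when the GPU has actually entered or left power save while reproducing, the kernel's standard PCI runtime PM attributes in sysfs can be polled. This is a small sketch under assumptions: the device address 0000:01:00.0 is taken from the dmesg lines in this report, and runtime_pm_state is a hypothetical helper name. It only reads the generic power/control and power/runtime_status files and does not query the NVIDIA driver.

```python
from pathlib import Path


def runtime_pm_state(device="0000:01:00.0", sysfs_root="/sys/bus/pci/devices"):
    """Read the generic runtime PM attributes for one PCI device.

    Missing files are reported as None rather than raising.
    """
    power_dir = Path(sysfs_root) / device / "power"
    state = {}
    for name in ("control", "runtime_status"):
        attr = power_dir / name
        state[name] = attr.read_text().strip() if attr.exists() else None
    return state


if __name__ == "__main__":
    # On an affected system, runtime_status would be expected to alternate
    # between "suspended" and "active" around each burst of errors.
    print(runtime_pm_state())
```

Logging this state alongside dmesg timestamps would make it easier to confirm that each burst coincides with a suspended-to-active transition.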
System Configuration
- Laptop: Dell Pro Max 16 Premium (MA16250)
- BIOS: 1.7.1 (Hybrid Graphics ON, Discrete Graphics Direct Output Mode ON)
- VBIOS: 98.07.1C.40.1C
- GSP Firmware: 595.45.04
- iGPU: Intel Arc Pro 140T (i915 driver)
- Desktop: GNOME 48.7 on Wayland
- Dock: Dell WD19DCS (USB-C) — two 1920x1080 monitors via DP-MST through NVIDIA GPU
- Module parameters:
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1
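For anyone comparing configurations across affected machines, the options lines above can be turned into a per-module dictionary. This is an illustrative sketch only: parse_modprobe_options is a hypothetical helper, and it handles just the simple "options <module> key=value" form shown here, not the full modprobe.d syntax.

```python
def parse_modprobe_options(text):
    """Collect key=value pairs from 'options <module> ...' lines, per module."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split()
        if len(parts) < 3 or parts[0] != "options":
            continue
        module = parts[1]
        for item in parts[2:]:
            key, _, value = item.partition("=")
            params.setdefault(module, {})[key] = value
    return params


config = """\
options nvidia NVreg_TemporaryFilePath=/var/tmp
options nvidia NVreg_EnableS0ixPowerManagement=1
options nvidia NVreg_PreserveVideoMemoryAllocations=1
options nvidia-drm modeset=1
"""
parsed = parse_modprobe_options(config)
print(parsed["nvidia"]["NVreg_EnableS0ixPowerManagement"])  # prints: 1
print(parsed["nvidia-drm"])  # prints: {'modeset': '1'}
```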
dmesg excerpts
First occurrence after boot (system booted 2026-03-16 09:10:49):
2026-03-17T01:08:26 NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4132580456 heartbeat 0 heartbeatWithOffsetMs 4121941346 diff 10639110 timeout 5200
2026-03-17T01:08:26 NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
2026-03-17T01:09:42 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-17T01:09:42 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-17T01:09:42 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-17T01:09:42 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-17T01:09:42 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4132656633 heartbeat 0 heartbeatWithOffsetMs 4121941346 diff 10715287 timeout 5200
2026-03-17T01:09:42 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out
Still occurring 30 hours later:
2026-03-18T06:49:18 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-18T06:49:18 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239432340 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12170737 timeout 5200
2026-03-18T06:49:18 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out
2026-03-18T06:51:45 nvidia 0000:01:00.0: Enabling HDA controller
2026-03-18T06:51:45 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-18T06:51:45 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-18T06:51:45 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock
2026-03-18T06:51:45 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446
2026-03-18T06:51:45 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239579024 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12317421 timeout 5200
2026-03-18T06:51:45 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out
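The excerpts above can be summarized per burst to show how the errors cluster around GPU wake events. The sketch below is a simplification: it assumes each line starts with a timestamp followed by a space, as in these excerpts, and treats all lines sharing the same timestamp as one burst. group_bursts is a hypothetical helper name.

```python
def group_bursts(lines):
    """Group log lines by timestamp and count the error types in each group."""
    bursts = {}
    for line in lines:
        stamp, _, message = line.strip().partition(" ")
        if not message:
            continue
        burst = bursts.setdefault(
            stamp,
            {"hda_wake": False, "rpc_rejected": 0, "asserts": 0, "heartbeat_timeouts": 0},
        )
        if "Enabling HDA controller" in message:
            burst["hda_wake"] = True
        elif "_kgspProcessRpcEvent" in message:
            burst["rpc_rejected"] += 1
        elif "nvAssertFailedNoLog" in message:
            burst["asserts"] += 1
        elif "_kgspIsHeartbeatTimedOut" in message:
            burst["heartbeat_timeouts"] += 1
    return bursts


sample = [
    "2026-03-18T06:51:45 nvidia 0000:01:00.0: Enabling HDA controller",
    "2026-03-18T06:51:45 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock",
    "2026-03-18T06:51:45 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446",
    "2026-03-18T06:51:45 NVRM: GPU0 _kgspProcessRpcEvent: Attempted to process RPC event from GPU0: 0x101a (PFM_REQ_HNDLR_STATE_SYNC_CALLBACK) during bootup without API lock",
    "2026-03-18T06:51:45 NVRM: GPU0 nvAssertFailedNoLog: Assertion failed: 0 @ kernel_gsp.c:1446",
    "2026-03-18T06:51:45 NVRM: GPU0 _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 4239579024 heartbeat 0 heartbeatWithOffsetMs 4227261603 diff 12317421 timeout 5200",
    "2026-03-18T06:51:45 NVRM: GPU0 _kgspRpcRecvPoll: GSP RM heartbeat timed out",
]
summary = group_bursts(sample)
print(summary["2026-03-18T06:51:45"])
# prints: {'hda_wake': True, 'rpc_rejected': 2, 'asserts': 2, 'heartbeat_timeouts': 1}
```

A real capture may spread one burst across adjacent seconds, so a time-window grouping would be more robust than exact timestamp matching.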
Bug Incidence
Always (every boot, every power transition)
nvidia-bug-report.log.gz
Will upload in a follow-up comment.
More Info
- Related: #1059 ("Reloading nvidia_drm breaks power management if nvidia-powerd is stopped") shows identical GSP heartbeat errors on the same driver version (595.45.04), but there they are triggered by a module reload plus an nvidia-powerd restart on an RTX 4060. This issue occurs on a clean boot without any module reloading.
- Related: #1055 ("DP-MST monitor stays black after brief signal drop — DRM connector enabled:disabled / dpms:Off while compositor reports active"), my earlier report on DP-MST monitor recovery failures on the same system. The broken GSP state may degrade the driver's ability to handle DP hotplug recovery.
nvidia-smi output during the issue:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.45.04              Driver Version: 595.45.04      CUDA Version: 13.2     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 1000 Blac...    On  |   00000000:01:00.0  On |                  N/A |
| N/A   42C    P8              6W /   35W |     101MiB /   8151MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+