
GPU hang after continuous H264 transcoding for several days #46

@xrayzh

Description


test env:
Platform: APL
OS: Ubuntu 16.04
Kernel version: 4.15.0-36-generic
ffmpeg: qsv-3.4.1.0-1-g7707fb6

command:
ffmpeg -hwaccel qsv -c:v h264_qsv -r 15 \
-rtsp_transport tcp -i [INPUT STREAM URL] \
-vf vpp_qsv=w=1280:h=720:framerate=15 \
-c:v h264_qsv -g 30 -b:v 500000 -an \
-map 0:v -f flv [RTMP SERVER URL]

The command above usually runs successfully for several days before a GPU hang occurs (see below).

We've noticed this behavior on at least two different machines.

Please advise on how to troubleshoot this issue.

error msg:

ffmpeg log:
[h264_qsv @ 0x2fcc380] Error during QSV decoding.: device failed (-17)
Error while decoding stream #0:0: Input/output error

kernel log:
Sep 13 04:54:58 box-M-I kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in ffmpeg [13402], reason: Hang on rcs0, action: reset
Sep 13 04:54:58 box-M-I kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
Sep 13 04:55:06 box-M-I kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
Sep 13 04:55:06 box-M-I kernel: i915 0000:00:02.0: Resetting chip after gpu hang
Sep 13 04:55:06 box-M-I kernel: [drm:i915_reset [i915]] ERROR GPU recovery failed
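To line up each hang with the matching ffmpeg error across several days of logs, it may help to pull the hang events out of the kernel log automatically. The following is a minimal sketch that assumes the i915 lines keep exactly the format shown above; `GpuHang`, `parse_hang`, and `recovery_failed` are hypothetical helpers written for illustration, not part of ffmpeg, the i915 driver, or any existing tool. It only reads log text and does nothing to recover the GPU.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed format, copied from the kernel log above:
# "[drm] GPU HANG: ecode 9:0:0x85dffffb, in ffmpeg [13402], reason: Hang on rcs0, action: reset"
HANG_RE = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>\S+) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>[^,]+), action: (?P<action>\w+)"
)


class GpuHang(NamedTuple):
    """Fields extracted from one hypothetical 'GPU HANG' kernel log line."""
    ecode: str
    process: str
    pid: int
    reason: str
    action: str


def parse_hang(line: str) -> Optional[GpuHang]:
    """Return the hang details if the line is a GPU HANG report, else None."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    return GpuHang(m["ecode"], m["process"], int(m["pid"]), m["reason"], m["action"])


def recovery_failed(lines: Iterable[str]) -> bool:
    """True if any line reports that the i915 reset did not recover the GPU."""
    return any("GPU recovery failed" in line for line in lines)


if __name__ == "__main__":
    log = [
        "Sep 13 04:54:58 box-M-I kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, "
        "in ffmpeg [13402], reason: Hang on rcs0, action: reset",
        "Sep 13 04:55:06 box-M-I kernel: i915 0000:00:02.0: Resetting chip after gpu hang",
        "Sep 13 04:55:06 box-M-I kernel: [drm:i915_reset [i915]] ERROR GPU recovery failed",
    ]
    for line in log:
        hang = parse_hang(line)
        if hang is not None:
            print(hang)
    print("recovery failed:", recovery_failed(log))
```

Matching on the literal "GPU HANG" and "GPU recovery failed" strings keeps the sketch simple, but those messages come from the kernel driver and may be worded differently on other kernel versions.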

workaround:
Killing the ffmpeg process and re-running the command doesn't fix the problem.
It requires a "sudo reboot" to recover from the failure.
