GPU hang after continuous H264 transcoding for several days #46

@xrayzh

Description

test env:
Platform: APL
OS: Ubuntu 16.04
Kernel version: 4.15.0-36-generic
ffmpeg: qsv-3.4.1.0-1-g7707fb6

command:
ffmpeg -hwaccel qsv -c:v h264_qsv -r 15 \
  -rtsp_transport tcp -i [INPUT STREAM URL] \
  -vf vpp_qsv=w=1280:h=720:framerate=15 \
  -c:v h264_qsv -g 30 -b:v 500000 -an \
  -map 0:v -f flv [RTMP SERVER URL]

The command above usually runs without problems for several days, then fails with a GPU hang (see below).

We've noticed this behavior on at least two different machines.

Please advise on how to troubleshoot this issue.

error msg:

ffmpeg log:
[h264_qsv @ 0x2fcc380] Error during QSV decoding.: device failed (-17)
Error while decoding stream #0:0: Input/output error

kernel log:
Sep 13 04:54:58 box-M-I kernel: [drm] GPU HANG: ecode 9:0:0x85dffffb, in ffmpeg [13402], reason: Hang on rcs0, action: reset
Sep 13 04:54:58 box-M-I kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
Sep 13 04:55:06 box-M-I kernel: i915 0000:00:02.0: Resetting rcs0 after gpu hang
Sep 13 04:55:06 box-M-I kernel: i915 0000:00:02.0: Resetting chip after gpu hang
Sep 13 04:55:06 box-M-I kernel: [drm:i915_reset [i915]] ERROR GPU recovery failed
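The kernel log above contains two different signatures: the initial "GPU HANG" line and the later "GPU recovery failed" line, which marks the point where the driver gave up on resetting. Telling them apart automatically could help when watching several machines. The following is a minimal sketch, assuming a POSIX shell with awk; the function name classify_gpu_log and its three output labels are hypothetical and only match the message text quoted in this report.

```shell
#!/bin/sh
# Classify kernel log text read from stdin.
# Prints one of (labels are illustrative, not from any tool):
#   recovery-failed  a "GPU recovery failed" line was seen
#   hang             a "GPU HANG:" line was seen, no failed recovery logged
#   no-hang          neither message was seen
classify_gpu_log() {
    awk '
        /GPU HANG:/           { hang = 1 }
        /GPU recovery failed/ { failed = 1 }
        END {
            if (failed)    print "recovery-failed"
            else if (hang) print "hang"
            else           print "no-hang"
        }
    '
}

# Usage (reading the kernel ring buffer may require root):
#   dmesg | classify_gpu_log
```

Matching on the message text alone is fragile, since the wording may differ between kernel versions; the patterns here are taken only from the 4.15 log lines shown above.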

workaround:
Killing the ffmpeg process and re-running the command doesn't fix the problem.
It requires a "sudo reboot" to recover from the failure.
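Since a reboot clears the failed state, it may be worth saving the driver's error dump first so it can be attached to this report. The following is a minimal sketch, assuming the i915 driver exposes its error state at /sys/class/drm/card0/error and writes the text "No error state collected" there when no dump exists; both the card index and that text are assumptions to verify on the affected machine, and the function name save_gpu_error_state is hypothetical.

```shell
#!/bin/sh
# Copy the i915 error state to a timestamped file before rebooting.
# $1: error-state file (assumed default: /sys/class/drm/card0/error)
# $2: destination directory (default: current directory)
# Prints the path of the saved file; returns non-zero if nothing was saved.
save_gpu_error_state() {
    src=${1:-/sys/class/drm/card0/error}
    dest_dir=${2:-.}
    # Nothing to do if the file is absent or unreadable.
    [ -r "$src" ] || return 1
    # Assumed placeholder text the driver reports when no dump is available.
    if grep -q 'No error state collected' "$src"; then
        return 1
    fi
    out="$dest_dir/i915-error-$(date +%Y%m%d-%H%M%S).txt"
    cat "$src" > "$out" && printf '%s\n' "$out"
}

# Usage (reading the sysfs file may require root):
#   save_gpu_error_state /sys/class/drm/card0/error /var/tmp
```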
