Message-ID: <558064cb-f489-a743-79cb-c88fd06f17aa@lwfinger.net>
Date:   Thu, 23 Mar 2017 13:19:43 -0500
From:   Larry Finger <Larry.Finger@...inger.net>
To:     LKML <linux-kernel@...r.kernel.org>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        intel-gfx@...ts.freedesktop.org,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Daniel Vetter <daniel.vetter@...el.com>
Cc:     Thorsten Leemhuis <regressions@...mhuis.info>
Subject: Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8

Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered intermittent 
hangs with the following information in the logs:

linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in plasmashell [1283], reason: Hang on render ring, action: reset
linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
linux-4v1g.suse kernel: [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang

This problem was added to https://bugs.freedesktop.org/show_bug.cgi?id=99380, 
but it is probably a different bug: the OP in that report has problems with 
kernel 4.10.x, whereas my problem did not appear until 4.11.

The problem was bisected to commit 69df05e11ab8 ("drm/i915: Simplify releasing 
context reference"). To confirm the bisection, I reverted that patch in kernel 
4.11-rc3. With the revert applied, my kernel has now run for over 17 hours with 
no problem. Before the revert, no affected kernel ran longer than about 3 hours 
before a GPU hang was detected.

I admit that I do not understand this driver, but my guess is that this commit 
introduced a race condition in the context put.

Thanks,
Larry
