Date:   Thu, 23 Mar 2017 16:23:49 -0500
From:   Larry Finger <Larry.Finger@...inger.net>
To:     Chris Wilson <chris@...is-wilson.co.uk>,
        LKML <linux-kernel@...r.kernel.org>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        intel-gfx@...ts.freedesktop.org,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Daniel Vetter <daniel.vetter@...el.com>,
        Thorsten Leemhuis <regressions@...mhuis.info>
Subject: Re: Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8

On 03/23/2017 03:44 PM, Chris Wilson wrote:
> On Thu, Mar 23, 2017 at 01:19:43PM -0500, Larry Finger wrote:
>> Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered
>> intermittent hangs with the following information in the logs:
>>
>> linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in
>> plasmashell [1283], reason: Hang on render ring, action: reset
>> linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere
>> in the entire gfx stack, including userspace.
>> linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on
>> bugs.freedesktop.org against DRI -> DRM/Intel
>> linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign
>> to the right component if it's not a kernel issue.
>> linux-4v1g.suse kernel: [drm] The gpu crash dump is required to
>> analyze gpu hangs, so please always attach it.
>> linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
>> linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang
>>
>> This problem was added to
>> https://bugs.freedesktop.org/show_bug.cgi?id=99380, but it probably
>> is a different bug, as the OP in that report has problems with
>> kernel 4.10.x, whereas my problem did not appear until 4.11.
>
> Close. Actually that patch touches code you are not using (oa-perf and
> gvt), the real culprit was e8a9c58fcd9a ("drm/i915: Unify active context
> tracking between legacy/execlists/guc").
>
> The fix
>
> commit 5d4bac5503fcc67dd7999571e243cee49371aef7
> Author: Chris Wilson <chris@...is-wilson.co.uk>
> Date:   Wed Mar 22 20:59:30 2017 +0000
>
>     drm/i915: Restore marking context objects as dirty on pinning
>
>     Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between
>     legacy/execlists/guc") converted the legacy intel_ringbuffer submission
>     to the same context pinning mechanism as execlists - that is to pin the
>     context until the subsequent request is retired. Previously it used the
>     vma retirement of the context object to keep itself pinned until the
>     next request (after i915_vma_move_to_active()). In the conversion, I
>     missed that the vma retirement was also responsible for marking the
>     object as dirty. Mark the context object as dirty when pinning
>     (equivalent to execlists) which ensures that if the context is swapped
>     out due to mempressure or suspend/hibernation, when it is loaded back in
>     it does so with the previous state (and not all zero).
>
>     Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
>     Reported-by: Dennis Gilmore <dennis@...il.us>
>     Reported-by: Mathieu Marquer <mathieu.marquer@...il.com>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
>     Signed-off-by: Chris Wilson <chris@...is-wilson.co.uk>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin@...el.com>
>     Cc: <drm-intel-fixes@...ts.freedesktop.org> # v4.11-rc1
>     Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@...el.com>
>
> went in this morning and so will be upstreamed ~next week.
> -Chris

Thanks. With a bug that is this hard to trigger, bisection is difficult. I am
surprised that the only step I got wrong was the last one. BTW, the kernel with
my reversion hung after 20 hours. I was ready to write again when I got your
fix. Good timing.

If your patch does not fix my problem, I will let you know.

Larry