Message-ID: <20170112092124.GN16278@nuc-i3427.alporthouse.com>
Date: Thu, 12 Jan 2017 09:21:24 +0000
From: Chris Wilson <chris@...is-wilson.co.uk>
To: Juergen Gross <jgross@...e.com>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
dri-devel@...ts.freedesktop.org,
intel-gfx <intel-gfx@...ts.freedesktop.org>, airlied@...ux.ie,
daniel.vetter@...el.com
Subject: Re: [Intel-gfx] GPU hang with kernel 4.10rc3
On Thu, Jan 12, 2017 at 07:03:25AM +0100, Juergen Gross wrote:
> On 11/01/17 18:08, Chris Wilson wrote:
> > On Wed, Jan 11, 2017 at 05:33:34PM +0100, Juergen Gross wrote:
> >> With kernel 4.10rc3 running as Xen dom0 I get at each boot:
> >>
> >> [ 49.213697] [drm] GPU HANG: ecode 7:0:0x3d1d3d3d, in gnome-shell
> >> [1431], reason: Hang on render ring, action: reset
> >> [ 49.213699] [drm] GPU hangs can indicate a bug anywhere in the entire
> >> gfx stack, including userspace.
> >> [ 49.213700] [drm] Please file a _new_ bug report on
> >> bugs.freedesktop.org against DRI -> DRM/Intel
> >> [ 49.213700] [drm] drm/i915 developers can then reassign to the right
> >> component if it's not a kernel issue.
> >> [ 49.213700] [drm] The gpu crash dump is required to analyze gpu
> >> hangs, so please always attach it.
> >> [ 49.213701] [drm] GPU crash dump saved to /sys/class/drm/card0/error
> >> [ 49.213755] drm/i915: Resetting chip after gpu hang
> >> [ 60.213769] drm/i915: Resetting chip after gpu hang
> >> [ 71.189737] drm/i915: Resetting chip after gpu hang
> >> [ 82.165747] drm/i915: Resetting chip after gpu hang
> >> [ 93.205727] drm/i915: Resetting chip after gpu hang
> >>
> >> The dump is attached.
> >
> > That's a nasty one. The first couple of pages of the batchbuffer appear
> > to be overwritten. (Full of 0xc2c2c2c2, i.e. probably pixel data.) That
> > may be a concurrent write by either the GPU or CPU, or we may have
> > incorrectly mapped a set of pages. That it doesn't recover suggests
> > that the corruption occurs frequently, probably on every request/batch.
>
> I hoped someone would have an idea already.
Sorry, first report of something like this in a long time (that I can
remember at least). And the problem is that it can be anything from a
coherency to a concurrency issue, so no one patch springs to mind.
Thankfully it appears to be kernel related.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre