linux-kernel - Re: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

Message-ID: <877h8nc0gp.fsf@eliezer.anholt.net>
Date:	Tue, 14 Jun 2011 19:06:30 -0700
From:	Eric Anholt <eric@...olt.net>
To:	Daniel J Blueman <daniel.blueman@...il.com>
Cc:	Keith Packard <keithp@...thp.com>,
	Dave Airlie <airlied@...hat.com>,
	Chris Wilson <chris@...is-wilson.co.uk>,
	intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
	Daniel J Blueman <daniel.blueman@...il.com>
Subject: Re: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On Wed, 15 Jun 2011 00:51:47 +0800, Daniel J Blueman <daniel.blueman@...il.com> wrote:
> On 14 June 2011 13:23, Eric Anholt <eric@...olt.net> wrote:
> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <daniel.blueman@...il.com> wrote:
> >> Hi Eric,
> >>
> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
> >> whenever you hit the top-left of the screen to show all windows) are
> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
> >> I see no 'missed IRQ' kernel messages.
> >>
> >> As this addresses a significant usability regression, are you happy to
> >> add it to the 3.0-rc queue? I think it has very good value in -stable
> >> also (assuming correctness). What do you think?
> >
> > This one had significant performance impacts, and later hacks in this
> > series worked around the problem to approximately the same level of
> > success with less impact, and we don't actually have a justification of
> > why any of them work.  We were still hoping to come up with some clue,
> > and haven't yet.
> 
> True; that is quite heavy-handed delay looping.
> 
> It's a pity the usual Intel font didn't make it to the programmer's
> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
> so we don't need to read it here.
> 
> It would be good to get this into -rc4. -stable probably needs some additional
> tweaks.
> 
> Signed-off-by: Daniel J Blueman <daniel.blueman@...il.com>

So you're saying that our interrupts at the top-level IMR are triggered
by the write to the status page of the lower-level ring?  That's
surprising to me.  Or do you think that this write is just happening to
trigger serialization so the interrupt comes after the DWORD write of
the seqno by the ring?  (hw folks just recently indicated that our
particular code is not expected to serialize the interrupt after the
seqno store, unless we had an MI_FLUSH_DWORD in between)

This patch has now passed 7000 iterations of the testcase that had a
~0.5% failure rate before.

Tested-by: Eric Anholt <eric@...olt.net>
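
For readers following the quoted patch description, the mask-register
semantics it relies on can be sketched in plain C. This is only an
illustration, not the patch itself: the bit position, the names
BLT_USER_INTERRUPT and HWSTAM_RESET_VALUE, and both helpers are
assumptions made up for the example. It models a mask register in which
a set bit means "masked", and shows why a known reset value of
FFFFFFFFh lets the driver write the new value directly instead of doing
a read-modify-write.

```c
#include <stdint.h>

/* Hypothetical bit position for the blitter user interrupt; the real
 * position must be taken from the hardware documentation. */
#define BLT_USER_INTERRUPT  (1u << 22)

/* Value of the mask register out of reset, as stated in the thread:
 * every interrupt source is masked. */
#define HWSTAM_RESET_VALUE  0xFFFFFFFFu

/* Compute the value to write so that only `bits` are unmasked.
 * Because the reset value is known, no register read is needed. */
static uint32_t hwstam_unmask(uint32_t bits)
{
    return HWSTAM_RESET_VALUE & ~bits;
}

/* A set bit in the mask register means the source is masked. */
static int is_masked(uint32_t reg, uint32_t bit)
{
    return (reg & bit) != 0;
}
```

With these assumptions, hwstam_unmask(BLT_USER_INTERRUPT) yields
0xFFBFFFFF: the blitter bit is cleared and every other source stays
masked. Whether that write fixes the root cause or merely happens to
serialize the interrupt after the seqno store is the open question
raised above.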
