linux-kernel - Re: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 15 Jun 2011 11:24:38 +0800
From:	Daniel J Blueman <daniel.blueman@...il.com>
To:	Eric Anholt <eric@...olt.net>
Cc:	Keith Packard <keithp@...thp.com>,
	Dave Airlie <airlied@...hat.com>,
	Chris Wilson <chris@...is-wilson.co.uk>,
	intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3.0-rc3] i915: Fix gen6 (SNB) GPU stalling

On 15 June 2011 10:06, Eric Anholt <eric@...olt.net> wrote:
> On Wed, 15 Jun 2011 00:51:47 +0800, Daniel J Blueman <daniel.blueman@...il.com> wrote:
>> On 14 June 2011 13:23, Eric Anholt <eric@...olt.net> wrote:
>> > On Tue, 14 Jun 2011 12:18:36 +0800, Daniel J Blueman <daniel.blueman@...il.com> wrote:
>> >> Hi Eric,
>> >>
>> >> The frequent ~1.5s pauses I hit with SNB hardware in the gnome3 UI (eg
>> >> whenever you hit the top-left of the screen to show all windows) are
>> >> nicely addressed by your recent wake patch [1] (ported to -rc3). Thus
>> >> I see no 'missed IRQ' kernel messages.
>> >>
>> >> As this addresses a significant usability regression, are you happy to
>> >> add it to the 3.0-rc queue? I think it has very good value in -stable
>> >> also (assuming correctness). What do you think?
>> >
>> > This one had significant performance impacts, and later hacks in this
>> > series worked around the problem to approximately the same level of
>> > success with less impact, and we don't actually have a justification of
>> > why any of them work.  We were still hoping to come up with some clue,
>> > and haven't yet.
>>
>> True; that is quite heavy handed delay looping.
>>
>> It's a pity the usual Intel font didn't make it to the programmer's
>> reference manuals. Anyway, unmasking the blitter user interrupt in the hardware
>> status mask register addresses the root cause. Out of reset it's FFFFFFFFh,
>> so we don't need to read it here.
>>
>> It would be good to get this into -rc4. -stable probably needs some additional
>> tweaks.
>>
>> Signed-off-by: Daniel J Blueman <daniel.blueman@...il.com>

> So you're saying that our interrupts at the top-level IMR are triggered
> by the write to the status page of the lower-level ring?  That's
> surprising to me.  Or do you think that this write is just happening to
> trigger serialization so the interrupt comes after the DWORD write of
> the seqno by the ring?  (hw folks just recently indicated that our
> particular code is not expected to serialize the interrupt after the
> seqno store, unless we had an MI_FLUSH_DWORD in between)

When ISR bits not masked by the hardware status mask register change,
a write is generated with the ISR contents to the status page, so I
believe that the blitter command streamer wasn't generating an
interrupt when an MI_USER_INTERRUPT command was issued. The interrupt
handler would kick in for other interrupts, read all the IIRs and
notice the bit set and wake the ring interrupt queue anyway.

I guess we could test this by observing that the BLT user interrupt
IIR bit is always not set on it's own (ie another interrupt woke us)
in the interrupt handler.

Thanks,
  Daniel

> This patch has now passed 7000 iterations of the testcase that had a
> ~.5% failure rate before.
>
> Tested-by: Eric Anholt <eric@...olt.net>
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/