linux-kernel - Re: [git pull] drm fixes


Message-ID: <21d7e9971003292349v36fc29f3i55d79ecc197df728@mail.gmail.com>
Date:	Tue, 30 Mar 2010 16:49:16 +1000
From:	Dave Airlie <airlied@...il.com>
To:	Michel Dänzer <michel@...nzer.net>
Cc:	Dave Airlie <airlied@...ux.ie>, torvalds@...ux-foundation.org,
	linux-kernel@...r.kernel.org, dri-devel@...ts.sf.net,
	Jerome Glisse <glisse@...edesktop.org>
Subject: Re: [git pull] drm fixes

2010/3/30 Michel Dänzer <michel@...nzer.net>:
> On Tue, 2010-03-30 at 05:34 +0100, Dave Airlie wrote:
>>
>> Original pull req below + reverts the fallback placement change which had
>> a side effect of causing more lockups on some AGP systems (this is a bug in
>> the AGP drivers that needs to be tracked down), [...]
>
> While I was able to work around the lockups by making the AGP driver
> never unbind a GTT entry, I think it's rather a radeon issue - how is
> the AGP driver supposed to know when it's safe to unbind an entry?

This has been a problem with AGP before: the Intel AGP docs claim you
should always use scratch pages on AGP and never completely remove bound
entries. I've no idea why this is, as you'd expect AGP cards to only
generate cycles to entries they've been asked to use. There may be some
memory controller prefetching going on that could run into an unbound AGP
page, and I suppose that could result in a machine check.
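
To make the scratch-page idea concrete, here is a minimal user-space sketch
of it. It is not the actual AGP driver code: the struct, the function names
and the table size are all made up for illustration. The only point is that
"unbind" redirects an entry to a dummy page instead of leaving it invalid,
so a stray or prefetched access still lands on real memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GATT_ENTRIES 8  /* hypothetical: tiny table, for illustration only */

/* Hypothetical model of a GART/GATT translation table. */
struct gatt {
    uint32_t entry[GATT_ENTRIES]; /* bus address each aperture page maps to */
    uint32_t scratch_page;        /* bus address of a dummy backing page */
};

/* Point every entry at the scratch page so no slot is ever left invalid. */
void gatt_init(struct gatt *g, uint32_t scratch_page)
{
    g->scratch_page = scratch_page;
    for (size_t i = 0; i < GATT_ENTRIES; i++)
        g->entry[i] = scratch_page;
}

/* Map aperture slot idx to a real page. */
void gatt_bind(struct gatt *g, size_t idx, uint32_t page_addr)
{
    assert(idx < GATT_ENTRIES);
    g->entry[idx] = page_addr;
}

/* "Unbind" by redirecting the slot to the scratch page, not by clearing it. */
void gatt_unbind(struct gatt *g, size_t idx)
{
    assert(idx < GATT_ENTRIES);
    g->entry[idx] = g->scratch_page;
}
```

After gatt_init(&g, scratch), a gatt_bind() followed by gatt_unbind() on the
same slot leaves that slot pointing at the scratch page again, never at an
invalid address.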

We need to track this separately anyway and hopefully fix it for 2.6.35; at
least we have a patch that can handle it.

> That change had lots of other issues anyway, thanks for reverting it.
>
>
>> [...] and I've merged Jerome's GPU recovery code, as I'd much rather
>> users had some hope of recovering from their GPU locking up than a
>> dead box. It seems to work for quite a lot of people that have tested
>> it, and it won't make a GPU lockup problem worse.
>
> Unfortunately, that's not true in all cases. The change itself mentions
> that the new reset code is unreliable for R3xx generation GPUs, and
> indeed with my RV350 it now turns my box into a brick immediately on a
> GPU lockup most of the time whereas previously it was usually able to
> recover at least in some cases, e.g. falling back to PCI mode after
> trying to use a non-working AGP transfer mode.
>

Okay, so it does make things worse in some cases. Hopefully Jerome can
track it down; otherwise we can restrict the GPU reset to the r600s, where
it definitely makes life a lot better for everyone.
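
As a rough sketch of what such a restriction could look like (this is not
the radeon driver's code; the enum, its ordering and the function name are
hypothetical, and treating everything from R600 onwards as "the r600s" is
an assumption):

```c
#include <stdbool.h>

/* Hypothetical, heavily abbreviated family list, ordered oldest to newest. */
enum chip_family {
    FAMILY_R100,
    FAMILY_R300, /* R3xx generation, e.g. RV350 */
    FAMILY_R500,
    FAMILY_R600,
    FAMILY_R700,
};

/* Only attempt the new reset path on R600 and newer (assumed cut-off). */
bool gpu_reset_allowed(enum chip_family family)
{
    return family >= FAMILY_R600;
}
```

With a gate like this, an R3xx part would skip the new reset code and keep
whatever lockup handling it had before.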

Dave.