Date:	Thu, 28 Feb 2013 10:09:59 -0500
From:	Alex Deucher <alexdeucher@...il.com>
To:	Josh Boyer <jwboyer@...il.com>
Cc:	Dave Airlie <airlied@...ux.ie>,
	Alex Deucher <alexander.deucher@....com>,
	Jerome Glisse <jglisse@...hat.com>,
	torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
	DRI mailing list <dri-devel@...ts.freedesktop.org>
Subject: Re: [git pull] drm merge for 3.9-rc1

On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer <jwboyer@...il.com> wrote:
> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher <alexdeucher@...il.com> wrote:
>> On Wed, Feb 27, 2013 at 8:14 PM, Josh Boyer <jwboyer@...il.com> wrote:
>>> On Wed, Feb 27, 2013 at 7:01 PM, Josh Boyer <jwboyer@...il.com> wrote:
>>>> On Wed, Feb 27, 2013 at 3:20 PM, Josh Boyer <jwboyer@...il.com> wrote:
>>>>> On Wed, Feb 27, 2013 at 11:34 AM, Josh Boyer <jwboyer@...il.com> wrote:
>>>>>> On Mon, Feb 25, 2013 at 7:05 PM, Dave Airlie <airlied@...ux.ie> wrote:
>>>>>>> Alex Deucher (29):
>>>>>>>       drm/radeon: halt engines before disabling MC (6xx/7xx)
>>>>>>>       drm/radeon: halt engines before disabling MC (evergreen)
>>>>>>>       drm/radeon: halt engines before disabling MC (cayman/TN)
>>>>>>>       drm/radeon: halt engines before disabling MC (si)
>>>>>>>       drm/radeon: use the reset mask to determine if rings are hung
>>>>>>
>>>>>> Something in this series of commits is causing the GPU to hang on reboot
>>>>>> on my Dell XPS 8300 machine.  That has a:
>>>>>>
>>>>>> 01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee
>>>>>> ATI Caicos [Radeon HD 6450]
>>>>>>
>>>>>> card in it.  After reboots, I get a screen that looks like this:
>>>>>>
>>>>>> http://t.co/tPnT6xQZUK
>>>>>>
>>>>>> I can hit it fairly consistently after a few reboots, so I tried doing a
>>>>>> git bisect on the radeon driver and it came down to:
>>>>>>
>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>>
>>>>> So I don't think that's actually the cause of the problem.  Or at least
>>>>> not that alone.  I reverted it on top of Linus' latest tree and I still
>>>>> get the lockups.
>>>>
>>>> Actually, git bisect does seem to have gotten it correct.  Once I
>>>> actually tested the revert of just that on top of Linus' tree (commit
>>>> d895cb1af1), things seem to be working much better.  I've rebooted a
>>>> dozen times without a lockup.  The most I've seen it take on a kernel
>>>> with that commit included is 3 reboots, so that's definitely at least an
>>>> improvement.
>>>
>>> I give up.  GPU issues are not my thing.  2 reboots after I sent that it
>>> gave me pretty rainbow static again.  So it might have been an
>>> improvement, but reverting it is not a solution.
>>>
>>> Looking at the rest of the commits, the whole GPU rework might be
>>> suspect, but I clearly have no clue.
>>
>> GPUs are tricky beasts :)
>
> Understatement ;).
>
>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>> problem anyway since it only affects 6xx/7xx and your card is handled
>> by the evergreen code.  I'll put together some patches to help narrow
>> down the problem.
>
> Yeah, that's the biggest problem I have, not knowing which functions are
> actually being executed for this card.  It looks like a combination of
> stuff in evergreen.c and ni.c, but I have no idea.
>
> Patches would be great.  If nothing else, I'm really good at building
> kernels and rebooting by now.

Two possible fixes attached.  The first attempts a full reset of all
blocks if the MC (memory controller) is hung.  That may work better
than just resetting the MC.  The second just disables MC reset.  I'm
not sure we can reliably tell whether the MC is actually hung: display
requests hit it periodically, so it can look busy when it is fine.
That would lead to needlessly resetting it, possibly causing failures
like the ones you are seeing.

Alex

View attachment "0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch" of type "text/x-patch" (992 bytes)

View attachment "0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch" of type "text/x-patch" (1138 bytes)
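
For readers without the attachments, the two approaches described in the
message can be pictured as two ways of adjusting a reset mask.  This is
only a rough sketch and not the contents of the attached patches: the bit
names, their values, and both helper functions are hypothetical and do not
come from the radeon driver.

```c
#include <stdint.h>

/* Hypothetical reset-mask bits, for illustration only.
 * These are NOT the radeon driver's real definitions. */
#define RESET_GFX     (1u << 0)
#define RESET_DMA     (1u << 1)
#define RESET_DISPLAY (1u << 2)
#define RESET_MC      (1u << 3)
#define RESET_ALL     (RESET_GFX | RESET_DMA | RESET_DISPLAY | RESET_MC)

/* Approach 1 (assumed shape): if the memory controller looks hung,
 * escalate to a full reset of every block instead of resetting
 * only the MC. */
uint32_t escalate_if_mc_busy(uint32_t mask)
{
    if (mask & RESET_MC)
        return RESET_ALL;
    return mask;
}

/* Approach 2 (assumed shape): never reset the MC.  A "busy" MC may
 * just be serving periodic display requests, so drop that bit and
 * leave the other requested resets untouched. */
uint32_t skip_mc_reset(uint32_t mask)
{
    return mask & ~RESET_MC;
}
```

With a detected mask of RESET_GFX | RESET_MC (0x9), the first helper
returns 0xf (reset everything) and the second returns 0x1 (reset only the
graphics block).  If the MC bit is not set, both return the mask unchanged.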
