[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADnq5_ORb_kdaehxR6y+P1S0eStiLEo7jFOYXhn4sMDw5b=bcw@mail.gmail.com>
Date: Tue, 18 Dec 2012 10:22:55 -0500
From: Alex Deucher <alexdeucher@...il.com>
To: Christian König <deathsimple@...afone.de>
Cc: Paul Bolle <pebolle@...cali.nl>,
Jerome Glisse <jglisse@...hat.com>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH] [RFC] drm/radeon: return 0 on successful gpu reset
On Tue, Dec 18, 2012 at 5:36 AM, Christian König
<deathsimple@...afone.de> wrote:
> On 17.12.2012 22:31, Paul Bolle wrote:
>>
>> On an (outdated) laptop the radeon driver (almost always) prints, during
>> the first resume of each session:
>> [drm] crtc 1 is connected to a TV
>>
>> This message is a bit puzzling as, as far as I know, no TV has ever
>> been connected to this laptop. Anyhow, before v3.5, if that happened the
>> radeon driver then printed an error during all following resumes:
>> [drm:radeon_cs_ioctl] *ERROR* Failed to parse relocation -35!
>>
>> (-35 is -EDEADLK.) But the resume would succeed and the driver seemed to
>> run without too much trouble. From v3.5 onwards things changed. If the
>> (puzzling) message about crtc 1 was printed on first resume the laptop
>> would simply hang on second resume. Only a manual power off would then
>> be possible. In that case nothing of interest would be found in the
>> (truncated) logs. And, most annoyingly, the hang would never happen if
>> the laptop was booted with, say, "console=ttyS0,115200n8" added to the
>> kernel command line.
>>
>> I bisected the hang to commit 6c6f478370eccfbfafbdc6fc55c0def03e58f124
>> ("drm/radeon: rework recursive gpu reset handling"), which was added in
>> the v3.5 release cycle. After discovering that and poking at the driver
>> it turned out that this hang is triggered by radeon_cs_handle_lockup()
>> returning -EAGAIN after successfully resetting the gpu. Simply returning
>> 0 makes the hang disappear (and makes the drm error reappear).
>>
>> Nothing in the code or the commit explanation clarifies why -EAGAIN
>> should be returned on successful gpu reset. So I suggest
>> radeon_cs_handle_lockup() simply returns what radeon_gpu_reset()
>> returns, eg 0 (on success) or a negative error code (on failure).
>>
>> Signed-off-by: Paul Bolle <pebolle@...cali.nl>
>> ---
>> 0) This exact patch is untested (but I run something comparable).
>>
>> 1) Sent as an RFC because I do not understand why this laptop (almost
>> always) prints the "crtc 1" message on first resume. Note that another
>> workaround for this hang is simply booting with "radeon.tv=0".
>
> Alex should probably take a look into this, since he probably is the one
> with the deepest knowledge of the display engine. My best guess is that it
> is just some error while probing for an attached TV and actually isn't so
> bad after all.
TV detection is always a bit flakey. I suspect some register or gpio
related to TV out doesn't get restored properly on resume which
results in a false positive. What asic is this?
>
>
>> 2) Also sent as an RFC because I have no idea whatsoever why returning
>> -EAGAIN will hang the machine. I guess it's returned to userland by
>> radeon_cs_ioctl(). What code uses that ioctl? And what does that code do
>> on -EAGAIN that hangs this laptop?
>
>
> EAGAIN just tells userspace to reissue the requested system call. When a
> system call is interrupted (either by a signal or in our case a GPU lockup)
> it aborts and EGAIN is returned to userspace, telling userspace that it
> should try again. So by just returning 0 userspace things that our system
> call was executed and doesn't try it again.
>
> So you just prevented the normal reissuing of the system call and so also
> prevented whatever this command submission should be doing in the first
> place.
>
>
>> 3) A third reason to send this as an RFC is that I also have no idea why
>> this hang doesn't happen when booting with "console=ttyS0,115200n8" or
>> even "console=tty0"! But I guess I'm now allowed to call this hang a
>> Heisenbug.
>
> "Heisenbug" ? LOL, I need to remember that.
>
> But anyway it is not so unusual seeing a bug like this, cause it is possible
> (but highly unlikely) that actually trying to print an error message can
> cause a lockup.
>
> You should definitely try Alex latest drm-fixes-3.8 branch
> (git://people.freedesktop.org/~agd5f/linux) since the possibility is quite
> high that we already have fixed that bug. If that doesn't helps then please
> open a bug report and leave me a note so that I can investigate further.
I don't have a 3.8 fixes branch up just yet, but will soon.
Alex
>
> Christian.
>
>
>
>>
>> drivers/gpu/drm/radeon/radeon_cs.c | 5 +----
>> 1 files changed, 1 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c
>> b/drivers/gpu/drm/radeon/radeon_cs.c
>> index 41672cc..a302c00 100644
>> --- a/drivers/gpu/drm/radeon/radeon_cs.c
>> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
>> @@ -486,11 +486,8 @@ out:
>> static int radeon_cs_handle_lockup(struct radeon_device *rdev, int r)
>> {
>> - if (r == -EDEADLK) {
>> + if (r == -EDEADLK)
>> r = radeon_gpu_reset(rdev);
>> - if (!r)
>> - r = -EAGAIN;
>> - }
>> return r;
>> }
>>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@...ts.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists