Message-ID: <alpine.LFD.2.00.0907220618270.2606@troy-laptop>
Date: Wed, 22 Jul 2009 07:16:30 +0100 (BST)
From: Troy Moure <twmoure@...pr.net>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Troy Moure <twmoure@...pr.net>, Krzysztof Oledzki <olel@....pl>,
Greg KH <gregkh@...e.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>, stable@...nel.org,
lwn@....net, Ian Lance Taylor <iant@...gle.com>
Subject: Re: Linux 2.6.27.27
On Tue, 21 Jul 2009, Linus Torvalds wrote:
> > Just out of curiosity, how did you find it? Now that I know where to look,
> > it's very obvious in the assembler diffs, but I didn't notice it until you
> > pointed it out just because there is so _much_ of the diffs...
>
> Ahh. I think I see how you found it. Looking at the diffs, there's only a
> few places where the number of instructions changed by a big fraction. And
> there's only _one_ place that has a factor-of-three difference (26 lines
> in the correct cases, and 7 lines in the wrong one). Clever.
Hmm... that's interesting. But no, I wasn't that clever.
I actually just started poking around the radeonfb code, since you
mentioned it looked like that might be where the issue was. The last
message printed in the hung kernel is "Monitor 2 type no found" - printed
from radeon_probe_screens(). And the first message after that in the
non-hung kernel is "Console: switching to colour frame buffer device",
which I guessed was printed from register_framebuffer() (since that calls
notifiers).
So I started looking in radeon_fb_register() between the call to
radeon_probe_screens() and the call to register_framebuffer(), and tracing
through the calls it made. I ignored pci_, sysfs_, etc. calls, thinking
the driver code was more likely to contain a device-probing loop, or
something similarly odd, that could be miscompiled.
For any functions that had a loop or anything strange-looking, I checked
the assembler diffs. And after a little while (a half-hour or so, I
think), I found edid_checksum(). Just the name made me think it was a
likely culprit, even before I looked at the diff.
Obviously I got a bit lucky that the problem was actually basically where I
started looking for it. But I figured even if I didn't find it, I'd learn
something about the radeonfb code. And who would pass up an opportunity to
learn about that?
Troy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/