Message-ID: <alpine.LFD.2.00.0907220618270.2606@troy-laptop>
Date:	Wed, 22 Jul 2009 07:16:30 +0100 (BST)
From:	Troy Moure <twmoure@...pr.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
cc:	Troy Moure <twmoure@...pr.net>, Krzysztof Oledzki <olel@....pl>,
	Greg KH <gregkh@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, stable@...nel.org,
	lwn@....net, Ian Lance Taylor <iant@...gle.com>
Subject: Re: Linux 2.6.27.27
On Tue, 21 Jul 2009, Linus Torvalds wrote:
> > Just out of curiosity, how did you find it? Now that I know where to look, 
> > it's very obvious in the assembler diffs, but I didn't notice it until you 
> > pointed it out just because there is so _much_ of the diffs...
> 
> Ahh. I think I see how you found it. Looking at the diffs, there's only a 
> few places where the number of instructions changed by a big fraction. And 
> there's only _one_ place that has a factor-of-three difference (26 lines 
> in the correct cases, and 7 lines in the wrong one). Clever.
Hmm... that's interesting. But no, I wasn't that clever.
I actually just started poking around the radeonfb code, since you 
mentioned it looked like that might be where the issue was.  The last 
message printed in the hung kernel is "Monitor 2 type no found" - printed 
from radeon_probe_screens().  And the first message after that in the 
non-hung kernel is "Console: switching to colour frame buffer device", 
which I guessed was printed from register_framebuffer() (since that calls 
notifiers).
So I started looking in radeon_fb_register() between the call to 
radeon_probe_screens() and the call to register_framebuffer(), and tracing 
through the calls it made. I ignored the pci_*, sysfs_*, etc. calls, thinking 
the driver's own code was more likely to contain a device-probing loop or 
something similarly odd that could be miscompiled.
For any functions that had a loop or anything strange-looking, I checked 
the assembler diffs.  And after a little while (a half-hour or so, I 
think), I found edid_checksum().  Just the name made me think it was a 
likely culprit, even before I looked at the diff. 
Obviously I got a bit lucky that the problem was basically where I 
started looking for it. But I figured even if I didn't find it, I'd learn 
something about the radeonfb code. And who would pass up an opportunity to 
learn about that?
	Troy
