[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <465FEBD3.7020801@shadowen.org>
Date: Fri, 01 Jun 2007 10:50:11 +0100
From: Andy Whitcroft <apw@...dowen.org>
To: "H. Peter Anvin" <hpa@...or.com>
CC: Andy Whitcroft <apw@...dowen.org>, Mel Gorman <mel@....ul.ie>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Steve Fox <drfickle@...ibm.com>
Subject: Re: 2.6.22-rc1-mm1
Andy Whitcroft wrote:
> Mel Gorman wrote:
>> On Wed, 16 May 2007, H. Peter Anvin wrote:
>>
>>> Correction, does *this patch* do it for you?
>>>
>> With these two patches in combination, previously failing machines
>> elm3b6 (x86_64 on test.kernel.org) and a modern x86 built a kernel and
>> booted correctly.
>>
>> elm3b132 and elm3b132 (x86 numaq on test.kernel.org) built with these
>> patches but silently fail on boot with no output via earlyprintk.
>> According to test.kernel.org, this failure occurs with git-newsetup
>> reverted so it is a separate problem.
>
> Ok, I've been following up on this failure on elm3b132/3. I moved
> forward to v2.6.22-rc2-mm1 and that also fails. I ran a bisection on
> the git-newsetup patch in as in -mm and basically it came down to the
> first patch, ie. any and all of this tree stops the boot.
>
> I just tried reproducing git-newsetup boot failures with the latest
> version of your tree (369f16fdd423d79640c4390915e6ab71189cb497) and that
> also fails.
>
> Fails in this context is hard boot failure after loading the kernel and
> before anything is printed. I also added a printf to the top of main()
> in boot/main.c and it doesn't come out, not that I really know if that
> means it got there or not.
>
> Any suggestions how to debug this puppy?
Thanks to Peter for all his encouragement off list.
I cannot claim to have sorted this one out, I do however understand why
my experiences and Mels did not seem consistent. Basically I am getting
inconsistent results with different machines.
I started my debug on a machine where 2.6.22-rc2 which worked and
2.6.22-rc2+newsetup which did not. I debugged the latter and managed to
prove that it was in fact getting all the way to the kernel
decompressor, and then crashing hard. The gzip image in memory was
intact and yet it did not decrypt correctly, the first about 60% was
intact, the remainder was damaged.
Suspecting that this was an "uncompress in place" overlap problem I
moved the compressed kernel way up out of the way and this then booted
successfully. Experimenting I was able to get it to boot by increasing
the overlap 'gap' from 32KB's to 64KB's. I was able to use the same
patch to boot 2.6.22-rc2-mm1 on the same problems machines. However,
this same overlap change did not fix another similar machine (the one in
the TKO grid).
I think that my debugging says that newsetup got the compressed kernel
and decompressor into memory ok and execution passed to it normally.
But I cannot figure out where the corruption is coming from. I tried
annotating the gzip decompressor to see if the input and output buffers
were overlapping at any time and that debug said no (unsure how reliable
that is). And yet at some point the output image is munched up.
One last piece of information. The decompressor also always seems to
get to the end of the input stream in exactly the right place without
reporting any kind of error, that is with exactly 8 bytes left over for
the length and crc checks. Which given the context sensitive nature of
the algorithm tends to imply the input stream was ok for the whole
duration of the decompress. Yet the output stream is badly broken.
Anyone got any wacky suggestions ...
-apw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists