[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20061006171105.GC9881@skynet.ie>
Date: Fri, 6 Oct 2006 18:11:05 +0100
From: mel@...net.ie (Mel Gorman)
To: Vivek Goyal <vgoyal@...ibm.com>
Cc: Steve Fox <drfickle@...ibm.com>, Andi Kleen <ak@...e.de>,
Badari Pulavarty <pbadari@...ibm.com>,
Martin Bligh <mbligh@...igh.org>,
Andrew Morton <akpm@...l.org>,
lkml <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org,
kmannth@...ibm.com, Andy Whitcroft <apw@...dowen.org>
Subject: Re: 2.6.18-mm2 boot failure on x86-64
On (06/10/06 11:36), Vivek Goyal didst pronounce:
> On Fri, Oct 06, 2006 at 03:33:12PM +0100, Mel Gorman wrote:
> > > Linux version 2.6.18-git22 (root@...3b239) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Thu Oct 5 19:05:36 PDT 2006
> > > Command line: root=/dev/sda1 vga=791 ip=9.47.67.239:9.47.67.50:9.47.67.1:255.255.255.0 resume=/dev/sdb1 showopts earlyprintk=serial,ttyS0,57600 console=tty0 console=ttyS0,57600 autobench_args: root=/dev/sda1 ABAT:1160100417
> > > BIOS-provided physical RAM map:
> > > BIOS-e820: 0000000000000000 - 000000000009ac00 (usable)
> > > BIOS-e820: 000000000009ac00 - 00000000000a0000 (reserved)
> > > BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
> > > BIOS-e820: 0000000000100000 - 00000000bff764c0 (usable)
> > > BIOS-e820: 00000000bff764c0 - 00000000bff98880 (ACPI data)
> > > BIOS-e820: 00000000bff98880 - 00000000c0000000 (reserved)
> > > BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> > > BIOS-e820: 0000000100000000 - 0000000c00000000 (usable)
> >
> > I continued what Steve was doing this morning to see could this be
> > pinned down. After placing 'CHECK;' in a few places as suggested by
> > Andi's check, the problem code was identified as that following in
> > mm/bootmem.c#init_bootmem_core()
> >
> > mapsize = get_mapsize(bdata);
> > memset(bdata->node_bootmem_map, 0xff, mapsize);
> >
> > That explains the value in the array at least. A few more printfs around
> > this point printed out the following in the boot log
> >
> > init_bootmem_core(0, 1909, 0, 12582912)
> > init_bootmem_core: Calling memset(0xFFFF810000775000, 1572864)
> > AAGH: afinfo corrupted at mm/bootmem.c:121
> >
> > where;
> >
> > 1909 == mapstart
> > 0 == start
> > 12582912 == end
> > 1572864 == mapsize
> >
> > mapstart, start and end being the parameters being passed to
> > init_bootmem_core(). This means we are calling memset for the physical
> > range 0x775000 -> 0x8F5000 which is in a usable range according to the
> > BIOS-e820 map it appears.
> >
>
> Hi Mel,
>
Hi.
> Where is bss placed in physical memory? I guess bss_start and bss_stop
> from System.map will tell us. That will confirm that above memset step is
> stomping over bss. Then we have to just find that somewhere probably
> we allocated wrong physical memory area for bootmem allocator map.
>
BSS is at 0x643000 -> 0x777BC4
init_bootmem wipes from 0x777000 -> 0x8F7000
So the BSS bytes from 0x777000 ->0x777BC4 (which looks very suspiciously
pile a page alignment of addr & PAGE_MASK) gets set to 0xFF. One possible
fix is below. It adds a check in bad_addr() to see if the BSS section is
about to be used for bootmap. It Seems To Work For Me (tm) and illustrates
the source of the problem even if it's not the 100% correct fix.
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c
--- linux-2.6.18-git22-clean/arch/x86_64/kernel/e820.c 2006-10-05 20:42:07.000000000 +0100
+++ linux-2.6.18-git22-bss_relocate_fix/arch/x86_64/kernel/e820.c 2006-10-06 17:39:51.000000000 +0100
@@ -51,6 +51,7 @@ extern struct resource code_resource, da
static inline int bad_addr(unsigned long *addrp, unsigned long size)
{
unsigned long addr = *addrp, last = addr + size;
+ unsigned long bss_start, bss_end;
/* various gunk below that needed for SMP startup */
if (addr < 0x8000) {
@@ -77,6 +78,14 @@ static inline int bad_addr(unsigned long
*addrp = __pa_symbol(&_end);
return 1;
}
+
+ /* bss section */
+ bss_start = __pa_symbol(&__bss_start);
+ bss_end = PAGE_ALIGN(__pa_symbol(&__bss_stop));
+ if (addr >= bss_start && addr < bss_end) {
+ *addrp = bss_end;
+ return 1;
+ }
if (last >= ebda_addr && addr < ebda_addr + ebda_size) {
*addrp = ebda_addr + ebda_size;
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists