[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070801234059.GA29224@skynet.ie>
Date: Thu, 2 Aug 2007 00:40:59 +0100
From: mel@...net.ie (Mel Gorman)
To: Torsten Kaiser <just.for.lkml@...glemail.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Valdis.Kletnieks@...edu,
linux-kernel@...r.kernel.org, apw@...dowen.org
Subject: Re: 2.6.23-rc1-mm2
On (01/08/07 22:52), Torsten Kaiser didst pronounce:
> On 8/1/07, Andrew Morton <akpm@...ux-foundation.org> wrote:
> > On Wed, 01 Aug 2007 16:30:08 -0400
> > Valdis.Kletnieks@...edu wrote:
> >
> > > As an aside, it looks like bits&pieces of dynticks-for-x86_64 are in there.
> > > In particular, x86_64-enable-high-resolution-timers-and-dynticks.patch is in
> > > there, adding a menu that depends on GENERIC_CLOCKEVENTS, but then nothing
> > > in the x86_64 tree actually *sets* it. There's a few other dynticks-related
> > > prep patches in there as well. Does this mean it's back to "coming soon to
> > > a CPU near you" status? :)
> >
> > I've lost the plot on that stuff: I'm just leaving things as-is for now,
> > wait for Thomas to return from vacation so we can have another run at it.
>
> For what its worth: 2.6.22-rc6-mm1 with NO_HZ works for me on an AMD
> SMP system without trouble.
>
> Next try with 2.6.23-rc1-mm2 and SPARSEMEM:
> Probably the same exception, but this time with Call Trace:
> [ 0.000000] Bootmem setup node 0 0000000000000000-0000000080000000
> [ 0.000000] Bootmem setup node 1 0000000080000000-0000000120000000
> [ 0.000000] Zone PFN ranges:
> [ 0.000000] DMA 0 -> 4096
> [ 0.000000] DMA32 4096 -> 1048576
> [ 0.000000] Normal 1048576 -> 1179648
> [ 0.000000] Movable zone start PFN for each node
> [ 0.000000] early_node_map[4] active PFN ranges
> [ 0.000000] 0: 0 -> 159
> [ 0.000000] 0: 256 -> 524288
> [ 0.000000] 1: 524288 -> 917488
> [ 0.000000] 1: 1048576 -> 1179648
> PANIC: early exception rip ffffffff807cddb5 error 2 cr2 ffffe20003000010
> [ 0.000000]
> [ 0.000000] Call Trace:
> [ 0.000000] [<ffffffff807cddb5>] memmap_init_zone+0xb5/0x130
> [ 0.000000] [<ffffffff807ce874>] init_currently_empty_zone+0x84/0x110
> [ 0.000000] [<ffffffff807cec93>] free_area_init_node+0x393/0x3e0
> [ 0.000000] [<ffffffff807cefea>] free_area_init_nodes+0x2da/0x320
> [ 0.000000] [<ffffffff807c9c97>] paging_init+0x87/0x90
> [ 0.000000] [<ffffffff807c0f85>] setup_arch+0x355/0x470
> [ 0.000000] [<ffffffff807bc967>] start_kernel+0x57/0x330
> [ 0.000000] [<ffffffff807bc12d>] _sinittext+0x12d/0x140
> [ 0.000000]
> [ 0.000000] RIP memmap_init_zone+0xb5/0x130
>
> (gdb) list *0xffffffff807cddb5
> 0xffffffff807cddb5 is in memmap_init_zone (include/linux/list.h:32).
> 27 #define LIST_HEAD(name) \
> 28 struct list_head name = LIST_HEAD_INIT(name)
> 29
> 30 static inline void INIT_LIST_HEAD(struct list_head *list)
> 31 {
> 32 list->next = list;
> 33 list->prev = list;
> 34 }
> 35
> 36 /*
>
> I will test more tomorrow...
Well.... That doesn't make a whole pile of sense unless the memory map
is not present. Looking at your boot log, we see this gem
> [ 0.000000] 1: 524288 -> 917488
> [ 0.000000] 1: 1048576 -> 1179648
Node 1 spans a region with a nice little hole in the middle of DMA32. In our
test machines, we wouldn't see a hole like this, at least that I can recall
so it would appear to work on some machines. On SPARSEMEM, sparse_init()
is responsible for allocating memmap for each section. In 2.6.22-rc6-mm1,
it allocated the memory if the section was *valid*. In 2.6.23-rc1-mm1,
it allocates the memory if the section is *present* due to the patch
sparsemem-record-when-a-section-has-a-valid-mem_map.patch[1]. Much later in
the init process, memmap is initialised based on spanned memory, not present
memory so initialisation will init memmap that resides in holes if a zone
spans that area in a node which is the case on this machine. I think this
is why it kablamos - it's inits memmap that wasn't allocated because it's
not present and the suprise is that it doesn't blow up sooner. Please try
the patch below Torsten, thanks.
[1] yeah, I acked this patch and I had read through it. My bad if the
patch below does fix the problem
diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.23-rc1-mm2-clean/mm/sparse.c linux-2.6.23-rc1-mm2-present_revert/mm/sparse.c
--- linux-2.6.23-rc1-mm2-clean/mm/sparse.c 2007-08-01 10:09:39.000000000 +0100
+++ linux-2.6.23-rc1-mm2-present_revert/mm/sparse.c 2007-08-02 00:27:00.000000000 +0100
@@ -483,7 +483,7 @@ void __init sparse_init(void)
unsigned long *usemap;
for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
- if (!present_section_nr(pnum))
+ if (!valid_section_nr(pnum))
continue;
map = sparse_early_mem_map_alloc(pnum);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists