[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4538DACC.5050605@shadowen.org>
Date:	Fri, 20 Oct 2006 15:18:52 +0100
From:	Andy Whitcroft <apw@...dowen.org>
To:	Paul Mackerras <paulus@...ba.org>
CC:	Christoph Lameter <clameter@....com>,
	Anton Blanchard <anton@...ba.org>, akpm@...l.org,
	linuxppc-dev@...abs.org, linux-kernel@...r.kernel.org,
	Mel Gorman <mel@....ul.ie>, Mike Kravetz <kravetz@...ibm.com>
Subject: Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!
Paul Mackerras wrote:
> Christoph Lameter writes:
> 
>> The page allocator must be running and able to serve pages from the boot 
>> node. This fails for some reason and the slab cannot bootstrap. The memory 
>> not available is the first guess. Could you trace the allocation in the 
>> page allocator (__alloc_pages) when the slab attempts to bootstrap and 
>> figure out why exactly the allocation fails?
> 
> What is happening is that all pages are getting their zone id field in
> their page->flags set to point to zone for node 1 by memmap_init_zone
> calling set_page_links (which does set_page_zone).  Thus, when those
> pages get freed by free_all_bootmem_node, they all end up in the zone
> for node 1.
> 
> memmap_init_zone is called (as memmap_init, since we don't have
> __HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
> is called from free_area_init_core.  Now the thing is that memmap_init
> and init_currently_empty_zone are called with the node's start PFN and
> size in pages, *including* holes.  On the partition I'm using we have
> these PFN ranges for the nodes:
> 
>     1:        0 ->    32768
>     0:    32768 ->   278528
>     1:   278528 ->   524288
> 
> So node 0's start PFN is 32768 and its size is 245760 pages, and so we
> correctly set pages 32786 to 278527 to be in the zone for node 0.
> Then for node 1, we have the start PFN is 0 and the size is 524288, so
> we then go through and set *all* pages of memory to be in the zone for
> node 1, including the pages which are actually on node 0.
> 
> That's why we can't allocate any pages on node 0, and the kmem cache
> bootstrapping blows up.
> 
> I don't know this code well enough to know what the correct fix is.
> Clearly memmap_init_zone should only be touching the pages that are
> actually present in the zone, but I don't know exactly what data
> structures it should be using to know what those pages are.
Mel Gorman and I have been poking at this from different ends.  Mel from
the context of this thread and myself trying to fix a machine which was
exhibiting on 32MB of ram in node 0 and the rest in node 1.
I remember that we used to have code to cope with this in the ppc64
architecture, indeed I remember reviewing it all that time ago.  Looking
at the current state of the tree it was removed in the two patches below
in mainline:
	"[PATCH] Remove SPAN_OTHER_NODES config definition"
	"[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
These commits:
	f62859bb6871c5e4a8e591c60befc8caaf54db8c
	a94b3ab7eab4edcc9b2cb474b188f774c331adf7
I'll follow up to this email with the reversion patch we used in
testing.  It seems to sort this problem out at least, though now its
blam'ing in ibmveth, so am retesting with yet another patch.  This patch
reverts the two patches above and updates the commentry on the Kconfig
entry.
-apw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
