lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4538F2A2.5040305@shadowen.org>
Date:	Fri, 20 Oct 2006 17:00:34 +0100
From:	Andy Whitcroft <apw@...dowen.org>
To:	Paul Mackerras <paulus@...ba.org>
CC:	Andy Whitcroft <apw@...dowen.org>,
	Christoph Lameter <clameter@....com>,
	Anton Blanchard <anton@...ba.org>, akpm@...l.org,
	linuxppc-dev@...abs.org, linux-kernel@...r.kernel.org,
	Mel Gorman <mel@....ul.ie>, Mike Kravetz <kravetz@...ibm.com>,
	will_schmidt@...t.ibm.com
Subject: Re: kernel BUG in __cache_alloc_node at linux-2.6.git/mm/slab.c:3177!

Andy Whitcroft wrote:
> Paul Mackerras wrote:
>> Christoph Lameter writes:
>>
>>> The page allocator must be running and able to serve pages from the boot 
>>> node. This fails for some reason and the slab cannot bootstrap. The memory 
>>> not available is the first guess. Could you trace the allocation in the 
>>> page allocator (__alloc_pages) when the slab attempts to bootstrap and 
>>> figure out why exactly the allocation fails?
>> What is happening is that all pages are getting their zone id field in
>> their page->flags set to point to zone for node 1 by memmap_init_zone
>> calling set_page_links (which does set_page_zone).  Thus, when those
>> pages get freed by free_all_bootmem_node, they all end up in the zone
>> for node 1.
>>
>> memmap_init_zone is called (as memmap_init, since we don't have
>> __HAVE_ARCH_MEMMAP_INIT defined) from init_currently_empty_zone, which
>> is called from free_area_init_core.  Now the thing is that memmap_init
>> and init_currently_empty_zone are called with the node's start PFN and
>> size in pages, *including* holes.  On the partition I'm using we have
>> these PFN ranges for the nodes:
>>
>>     1:        0 ->    32768
>>     0:    32768 ->   278528
>>     1:   278528 ->   524288
>>
>> So node 0's start PFN is 32768 and its size is 245760 pages, and so we
>> correctly set pages 32786 to 278527 to be in the zone for node 0.
>> Then for node 1, we have the start PFN is 0 and the size is 524288, so
>> we then go through and set *all* pages of memory to be in the zone for
>> node 1, including the pages which are actually on node 0.
>>
>> That's why we can't allocate any pages on node 0, and the kmem cache
>> bootstrapping blows up.
>>
>> I don't know this code well enough to know what the correct fix is.
>> Clearly memmap_init_zone should only be touching the pages that are
>> actually present in the zone, but I don't know exactly what data
>> structures it should be using to know what those pages are.
> 
> Mel Gorman and I have been poking at this from different ends.  Mel from
> the context of this thread and myself trying to fix a machine which was
> exhibiting on 32MB of ram in node 0 and the rest in node 1.
> 
> I remember that we used to have code to cope with this in the ppc64
> architecture, indeed I remember reviewing it all that time ago.  Looking
> at the current state of the tree it was removed in the two patches below
> in mainline:
> 	"[PATCH] Remove SPAN_OTHER_NODES config definition"
> 	"[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES"
> 
> These commits:
> 	f62859bb6871c5e4a8e591c60befc8caaf54db8c
> 	a94b3ab7eab4edcc9b2cb474b188f774c331adf7
> 
> I'll follow up to this email with the reversion patch we used in
> testing.  It seems to sort this problem out at least, though now its
> blam'ing in ibmveth, so am retesting with yet another patch.  This patch
> reverts the two patches above and updates the commentry on the Kconfig
> entry.

Ok, I've just gotten a successful boot on this box for the first time in
like 15 git releases.  I needed the three patches below:

clameter-fallback_alloc_fix2 -- from earlier in this thread, under the
message ID below:
    <Pine.LNX.4.64.0610131515200.28279@...roedinger.engr.sgi.com>

Reintroduce-NODES_SPAN_OTHER_NODES-for-powerpc -- the patch I just
submitted, under the message ID below:
    <8a76dfd735e544016c5f04c98617b87d@...ky>

ibmveth-fix-index-increment-calculation -- this patch is already in -mm.

Feel free to take this as an ACK for the patches other than mine.

Acked-by: Andy Whitcroft <apw@...dowen.org>

-apw
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ