Date:	Thu, 5 Feb 2009 16:43:30 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	David Miller <davem@...emloft.net>
Cc:	mel@....ul.ie, heiko.carstens@...ibm.com,
	akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
	sparclinux@...r.kernel.org
Subject: Re: HOLES_IN_ZONE...

On Wed, 04 Feb 2009 22:26:51 -0800 (PST)
David Miller <davem@...emloft.net> wrote:

> 
> So I've been fighting mysterious crashes on my main sparc64 devel
> machine.  What's happening is that the assertion in
> mm/page_alloc.c:move_freepages() is triggering:
> 
> 	BUG_ON(page_zone(start_page) != page_zone(end_page));
> 
> Once I knew this is what was happening, I added some annotations:
> 
> 	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
> 		printk(KERN_ERR "move_freepages: Bogus zones: "
> 		       "start_page[%p] end_page[%p] zone[%p]\n",
> 		       start_page, end_page, zone);
> 		printk(KERN_ERR "move_freepages: "
> 		       "start_zone[%p] end_zone[%p]\n",
> 		       page_zone(start_page), page_zone(end_page));
> 		printk(KERN_ERR "move_freepages: "
> 		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
> 		       page_to_pfn(start_page), page_to_pfn(end_page));
> 		printk(KERN_ERR "move_freepages: "
> 		       "start_nid[%d] end_nid[%d]\n",
> 		       page_to_nid(start_page), page_to_nid(end_page));
>  ...
> 
> And here's what I got:
> 
> 	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
> 	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
> 	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
> 	move_freepages: start_nid[1] end_nid[0]
> 
> My memory layout on this box is:
> 
> [    0.000000] Zone PFN ranges:
> [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
> [    0.000000] Movable zone start PFN for each node
> [    0.000000] early_node_map[8] active PFN ranges
> [    0.000000]     0: 0x00000000 -> 0x00020000
> [    0.000000]     1: 0x00800000 -> 0x0081f7ff
> [    0.000000]     1: 0x0081f800 -> 0x0081fe50
> [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
> [    0.000000]     1: 0x0081feda -> 0x0081fedb
> [    0.000000]     1: 0x0081fedd -> 0x0081fee5
> [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
> [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
> 

Ah, end_pfn is not a valid page, and page->flags shows nid 0.
It seems the memmap for end_pfn is not initialized correctly.

First, some things are complicated around here:

1. pfn_valid() only means "there is a memmap", not "the memory is valid".
2. If the memory is invalid but it has a memmap, the page should be marked
   PG_reserved, and it will never be put into the buddy allocator.
3. The memmap for non-existent memory can be initialized, but that depends on
   zone->spanned_pages. (See free_area_init_core().)
4. What CONFIG_HOLES_IN_ZONE means is
   "there can be invalid memmap within the contiguous range of zone->mem_map".
   This comes from VIRTUAL_MEMMAP.
   On usual archs, mem_map is guaranteed to be always contiguous.




> 	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
> 	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
> 	move_freepages: start_nid[1] end_nid[0]
> [    0.000000]     0: 0x00000000 -> 0x00020000
> [    0.000000]     1: 0x00800000 -> 0x0081f7ff


I think it's strange that end_pfn's nid is 0.

From this log, the mem_map for end_pfn exists (meaning pfn_valid(end_pfn) == true),
so it should be initialized correctly, and it should have nid 1 if initialized.

Maybe Node 1's zone->zone_start_pfn and zone->spanned_pages cover 0x81f7ff, and
its range is 0x00800000 -> 0x0081ff5d.

But this check in memmap_init_zone()
==
                if (context == MEMMAP_EARLY) {
                        if (!early_pfn_valid(pfn))
                                continue;
                        if (!early_pfn_in_nid(pfn, nid))
                                continue;
                }
==
allows the initialization of the mem_map entry for 0x81f7ff to be skipped:
that pfn is in no early_node_map range, so early_pfn_to_nid() falls back to
returning 0, which does not match nid 1.
*AND* SetPageReserved() is never called for it. This is the problem, I think.

> It takes a lot of stressing to get that specific chunk of pages to
> attempt to be freed up in a group like that :-/
> 
> As a suggestion, it would have been a lot more pleasant if the code
> validated this requirement (in the !HOLES_IN_ZONE case) at boot time
> instead of after 2 hours of stress testing :-(
> 

Can this patch help you? (Maybe more careful study is necessary...)
---
 mm/page_alloc.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: mmotm-2.6.29-Feb03/mm/page_alloc.c
===================================================================
--- mmotm-2.6.29-Feb03.orig/mm/page_alloc.c
+++ mmotm-2.6.29-Feb03/mm/page_alloc.c
@@ -2618,6 +2618,7 @@ void __meminit memmap_init_zone(unsigned
 	unsigned long end_pfn = start_pfn + size;
 	unsigned long pfn;
 	struct zone *z;
+	int tmp;
 
 	if (highest_memmap_pfn < end_pfn - 1)
 		highest_memmap_pfn = end_pfn - 1;
@@ -2632,7 +2633,8 @@ void __meminit memmap_init_zone(unsigned
 		if (context == MEMMAP_EARLY) {
 			if (!early_pfn_valid(pfn))
 				continue;
-			if (!early_pfn_in_nid(pfn, nid))
+			tmp = early_pfn_to_nid(pfn);
+			if (tmp > -1 && tmp != nid)
 				continue;
 		}
 		page = pfn_to_page(pfn);
@@ -2999,8 +3001,9 @@ int __meminit early_pfn_to_nid(unsigned 
 			return early_node_map[i].nid;
 	}
 
-	return 0;
+	return -1;
 }
+
 #endif /* CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID */
 
 /* Basic iterator support to walk early_node_map[] */



