[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1351245581-16652-1-git-send-email-laijs@cn.fujitsu.com>
Date: Fri, 26 Oct 2012 17:59:31 +0800
From: Lai Jiangshan <laijs@...fujitsu.com>
To: linux-kernel@...r.kernel.org, Mel Gorman <mgorman@...e.de>
Cc: Lai Jiangshan <laijs@...fujitsu.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Minchan Kim <minchan@...nel.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Michal Hocko <mhocko@...e.cz>, linux-mm@...ck.org
Subject: [PATCH] page_alloc: fix the incorrect adjustment to zone->present_pages
Current free_area_init_core() has incorrect adjustment code to adjust
->present_pages. It will cause ->present_pages overflow, make the
system unusable(can't create any process/thread in our test) and cause further problem.
Details:
1) Some/many ZONEs don't have memory which is used by memmap.
{ Or all the actual memory used for memmap is much less than the "memmap_pages"
(memmap_pages = PAGE_ALIGN(span_size * sizeof(struct page)) >> PAGE_SHIFT)
CONFIG_SPARSEMEM is an example. }
2) incorrect adjustment in free_area_init_core(): zone->present_pages -= memmap_pages
3) but the zone has big hole, it causes the result of zone->present_pages become much smaller
4) when we offline a/several memory section of the zone: zone->present_pages -= offline_size
5) Now, zone->present_pages will/may be *OVERFLOW*.
So the adjustment is dangerous and incorrect.
Addition 1:
And in current kernel, the memmaps have nothing related/bound to any ZONE:
FLATMEM: global memmap
CONFIG_DISCONTIGMEM: node-specific memmap
CONFIG_SPARSEMEM: memorysection-specific memmap
None of them is ZONE-specific memmap, and the memory used for memmap is not bound to any ZONE.
So the adjustment "zone->present_pages -= memmap_pages" subtracts unrelated value
and makes no sense.
Addition 2:
We introduced this adjustment and tried to make page-reclaim/watermark happier,
but the adjustment is wrong in current kernel, and even makes page-reclaim/watermark
worse. It is against its original purpose/reason.
This adjustment is incorrect/buggy, subtracts unrelated value and violates its original
purpose, so we simply remove the adjustment.
CC: Mel Gorman <mgorman@...e.de>
Signed-off-by: Lai Jiangshan <laijs@...fujitsu.com>
---
mm/page_alloc.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bb90971..6bf72e3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4455,30 +4455,12 @@ static void __paginginit free_area_init_core(struct pglist_data *pgdat,
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
- unsigned long size, realsize, memmap_pages;
+ unsigned long size, realsize;
size = zone_spanned_pages_in_node(nid, j, zones_size);
realsize = size - zone_absent_pages_in_node(nid, j,
zholes_size);
- /*
- * Adjust realsize so that it accounts for how much memory
- * is used by this zone for memmap. This affects the watermark
- * and per-cpu initialisations
- */
- memmap_pages =
- PAGE_ALIGN(size * sizeof(struct page)) >> PAGE_SHIFT;
- if (realsize >= memmap_pages) {
- realsize -= memmap_pages;
- if (memmap_pages)
- printk(KERN_DEBUG
- " %s zone: %lu pages used for memmap\n",
- zone_names[j], memmap_pages);
- } else
- printk(KERN_WARNING
- " %s zone: %lu pages exceeds realsize %lu\n",
- zone_names[j], memmap_pages, realsize);
-
/* Account for reserved pages */
if (j == 0 && realsize > dma_reserve) {
realsize -= dma_reserve;
--
1.7.4.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists