[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20060904153613.GA14263@skynet.ie>
Date: Mon, 4 Sep 2006 16:36:13 +0100
From: mel@...net.ie (Mel Gorman)
To: Keith Mannthey <kmannth@...il.com>
Cc: akpm@...l.org, tony.luck@...el.com,
Linux Memory Management List <linux-mm@...ck.org>,
ak@...e.de, bob.picco@...com,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linuxppc-dev@...abs.org
Subject: Re: [PATCH 4/6] Have x86_64 use add_active_range() and free_area_init_nodes
On (31/08/06 20:08), Keith Mannthey didst pronounce:
> >So, do you actally expect a lot of unused mem_map to be allocated with
> >struct pages that are inactive until memory is hot-added in an
> >x86_64-specific manner? The arch-independent stuff currently will not do
> >that. It sets up memmap for where memory really exists. If that is not
> >what you expect, it will hit issues at hotadd time which is not the
> >current issue but one that can be fixed.
>
> Yes. RESERVED based is a big waste of mem_map space. The add areas
> are marked as RESERVED during boot and then later onlined during add.
> It might be ok. I will play with tomorrow. I might just need to
> call add_active_range in the right spot :)
>
Following this mail should be two patches that may address the problem
with reserved memory hot-add. One assumption made by arch-independent
zone-sizing was that the only memory holes of interest were those before the
end of physical memory. Another assumption was that mem_map should only be
allocated for memory that was physically present in the machine.
With MEMORY_HOTPLUG_RESERVE on x86_64, these assumptions do not hold. This
feature expects that mem_map is allocated at boot time and later activated
on a memory hot-add event. To determine if the region is usable for hot-add
in the future, holes are calculated beyond the end of physical memory.
The following two patches fix these two assumptions. They have been boot-tested
on a range of hardware (x86, ppc64, ia64 and x86_64) so there should be no
new regressions.
I don't have access to hardware that can use MEMORY_HOTPLUG_RESERVE so I'd
appreciate hearing if the patches work. I wrote a test program that simulated
the input from the machine the problem was reported on. It registers active
memory and simulates the check made by reserve_hotadd(). push_node_boundaries()
is called to push the end of the node out by 100 pages like what SRAT would
do for reserve hot-add and it appears to do the right thing. Output is below.
mel@...old:~/patches/brokenout/zonesizing/driver_test$ gcc driver_test.c -o
driver_test && ./driver_test | grep -v "active with" | grep -v
account_node_boundary
Stage 1: Registering active ranges
Entering add_active_range(0, 0, 152) 0 entries of 96 used
Entering add_active_range(0, 256, 524165) 1 entries of 96 used
Entering add_active_range(0, 1048576, 4653056) 2 entries of 96 used
Entering add_active_range(1, 17235968, 18219008) 3 entries of 96 used
Dumping active map
0: 0 0 -> 152
1: 0 256 -> 524165
2: 0 1048576 -> 4653056
3: 1 17235968 -> 18219008
Entering push_node_boundaries(0, 0, 4653156)
Checking reserve-hotadd
absent_pages_in_range(4653056, 17235968) == 17235968 - 4653056 == 12582912
absent_pages_in_range(18219008, 52428800) == 52428800 - 18219008 == 34209792
Stage 2: Calculating zone sizes and holes
Stage 3: Dumping zone sizes and holes
zone_size[0][0] = 4096 zone_holes[0][0] = 104
zone_size[0][1] = 1044480 zone_holes[0][1] = 524411
zone_size[0][2] = 3604580 zone_holes[0][2] = 100
zone_size[1][2] = 983040 zone_holes[1][2] = 0
Stage 4: Printing present pages
On node 0, 4128541 pages
zone 0 present_pages = 3992
zone 1 present_pages = 520069
zone 2 present_pages = 3604480
On node 1, 983040 pages
zone 2 present_pages = 983040
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists