Message-ID: <4D686EB8.4080507@kernel.org>
Date: Fri, 25 Feb 2011 19:08:40 -0800
From: Yinghai Lu <yinghai@...nel.org>
To: Ingo Molnar <mingo@...e.hu>
CC: Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH 3/3] x86,mm,64bit: Round up memory boundary for init_memory_mapping_high()
tj pointed out:
when a node boundary is not 1G aligned (e.g. off by 128M),
init_memory_mapping_high() can create smaller mappings: 128M on one node
and 896M on the next node with 2M pages instead of a single 1G page. That
increases TLB pressure.
So when gb pages are in use, align the boundary to 1G before calling
init_memory_mapping_ext(), to make sure only one 1G entry is used for the
1G range that crosses the node boundary.
We also need to pass tbl_end to init_memory_mapping_ext(), to make sure the
page tables are allocated on the previous node instead of the next one.
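The rounding done in mapping_work_fn() can be illustrated in isolation. This is a minimal userspace sketch, not the kernel code: round_down_ul()/round_up_ul() and align_node_range() are simplified stand-ins for the kernel's round_down()/round_up() macros and for the logic the hunk below adds, assuming the alignment is a power of two:

```c
#include <assert.h>

#define PAGE_SHIFT 12

/* Simplified stand-ins for the kernel's round_down()/round_up() macros;
 * both assume 'align' is a power of two. */
static unsigned long round_down_ul(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

static unsigned long round_up_ul(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Expand a node's [start_pfn, end_pfn) range to 'align'-byte boundaries
 * (1G when gb pages are in use, 2M otherwise), clamping the start to what
 * is already mapped so the expanded range never re-maps the previous
 * node's tail. */
static void align_node_range(unsigned long *start_pfn, unsigned long *end_pfn,
			     unsigned long pfn_mapped, unsigned long align)
{
	*start_pfn = round_down_ul(*start_pfn, align >> PAGE_SHIFT);
	if (*start_pfn < pfn_mapped)
		*start_pfn = pfn_mapped;
	*end_pfn = round_up_ul(*end_pfn, align >> PAGE_SHIFT);
}
```

With 1G alignment, a node boundary at pfn 0x1028000 (byte 0x1028000000, as in the log below) rounds up to pfn 0x1040000, which is why the after-patch log shows each node mapped entirely with 1G pages.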
On one AMD 512G system with an unaligned node boundary (extra 768M),
before the patch:
[ 0.000000] init_memory_mapping: [0x00000000000000-0x000000d7f9ffff]
[ 0.000000] 0000000000 - 00c0000000 page 1G
[ 0.000000] 00c0000000 - 00d7e00000 page 2M
[ 0.000000] 00d7e00000 - 00d7fa0000 page 4k
[ 0.000000] kernel direct mapping tables up to d7fa0000 @ [0xd7f9d000-0xd7f9ffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to d7fa0000 @ [0xd7f9d000-0xd7f9efff] final
[ 0.000000] memblock_x86_reserve_range: [0xd7f9d000-0xd7f9efff] PGTABLE
...
[ 0.000000] Adding active range (0, 0x10, 0x98) 0 entries of 3200 used
[ 0.000000] Adding active range (0, 0x100, 0xd7fa0) 1 entries of 3200 used
[ 0.000000] Adding active range (0, 0x100000, 0x1028000) 2 entries of 3200 used
[ 0.000000] Adding active range (1, 0x1028000, 0x2028000) 3 entries of 3200 used
[ 0.000000] Adding active range (2, 0x2028000, 0x3028000) 4 entries of 3200 used
[ 0.000000] Adding active range (3, 0x3028000, 0x4028000) 5 entries of 3200 used
[ 0.000000] Adding active range (4, 0x4028000, 0x5028000) 6 entries of 3200 used
[ 0.000000] Adding active range (5, 0x5028000, 0x6028000) 7 entries of 3200 used
[ 0.000000] Adding active range (6, 0x6028000, 0x7028000) 8 entries of 3200 used
[ 0.000000] Adding active range (7, 0x7028000, 0x8028000) 9 entries of 3200 used
[ 0.000000] init_memory_mapping: [0x00000100000000-0x00001027ffffff]
[ 0.000000] 0100000000 - 1000000000 page 1G
[ 0.000000] 1000000000 - 1028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 1028000000 @ [0x1027ffe000-0x1027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 1028000000 @ [0x1027ffe000-0x1027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x1027ffe000-0x1027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00001028000000-0x00002027ffffff]
[ 0.000000] 1028000000 - 1040000000 page 2M
[ 0.000000] 1040000000 - 2000000000 page 1G
[ 0.000000] 2000000000 - 2028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 2028000000 @ [0x2027ffe000-0x2027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 2028000000 @ [0x2027ffe000-0x2027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x2027ffe000-0x2027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00002028000000-0x00003027ffffff]
[ 0.000000] 2028000000 - 2040000000 page 2M
[ 0.000000] 2040000000 - 3000000000 page 1G
[ 0.000000] 3000000000 - 3028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 3028000000 @ [0x3027ffe000-0x3027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 3028000000 @ [0x3027ffe000-0x3027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x3027ffe000-0x3027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00003028000000-0x00004027ffffff]
[ 0.000000] 3028000000 - 3040000000 page 2M
[ 0.000000] 3040000000 - 4000000000 page 1G
[ 0.000000] 4000000000 - 4028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 4028000000 @ [0x4027ffe000-0x4027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 4028000000 @ [0x4027ffe000-0x4027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x4027ffe000-0x4027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00004028000000-0x00005027ffffff]
[ 0.000000] 4028000000 - 4040000000 page 2M
[ 0.000000] 4040000000 - 5000000000 page 1G
[ 0.000000] 5000000000 - 5028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 5028000000 @ [0x5027ffe000-0x5027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 5028000000 @ [0x5027ffe000-0x5027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x5027ffe000-0x5027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00005028000000-0x00006027ffffff]
[ 0.000000] 5028000000 - 5040000000 page 2M
[ 0.000000] 5040000000 - 6000000000 page 1G
[ 0.000000] 6000000000 - 6028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 6028000000 @ [0x6027ffe000-0x6027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 6028000000 @ [0x6027ffe000-0x6027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x6027ffe000-0x6027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00006028000000-0x00007027ffffff]
[ 0.000000] 6028000000 - 6040000000 page 2M
[ 0.000000] 6040000000 - 7000000000 page 1G
[ 0.000000] 7000000000 - 7028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 7028000000 @ [0x7027ffe000-0x7027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 7028000000 @ [0x7027ffe000-0x7027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x7027ffe000-0x7027ffefff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00007028000000-0x00008027ffffff]
[ 0.000000] 7028000000 - 7040000000 page 2M
[ 0.000000] 7040000000 - 8000000000 page 1G
[ 0.000000] 8000000000 - 8028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 8028000000 @ [0x8027ffd000-0x8027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 8028000000 @ [0x8027ffd000-0x8027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x8027ffd000-0x8027ffefff] PGTABLE
After the patch:
...
[ 0.000000] init_memory_mapping: [0x00000100000000-0x0000103fffffff]
[ 0.000000] 0100000000 - 1040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 1040000000 @ [0x1027fff000-0x1027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00001040000000-0x0000203fffffff]
[ 0.000000] 1040000000 - 2040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 2040000000 @ [0x2027fff000-0x2027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00002040000000-0x0000303fffffff]
[ 0.000000] 2040000000 - 3040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 3040000000 @ [0x3027fff000-0x3027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00003040000000-0x0000403fffffff]
[ 0.000000] 3040000000 - 4040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 4040000000 @ [0x4027fff000-0x4027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00004040000000-0x0000503fffffff]
[ 0.000000] 4040000000 - 5040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 5040000000 @ [0x5027fff000-0x5027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00005040000000-0x0000603fffffff]
[ 0.000000] 5040000000 - 6040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 6040000000 @ [0x6027fff000-0x6027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00006040000000-0x0000703fffffff]
[ 0.000000] 6040000000 - 7040000000 page 1G
[ 0.000000] kernel direct mapping tables up to 7040000000 @ [0x7027fff000-0x7027ffffff] pre-allocated
[ 0.000000] init_memory_mapping: [0x00007040000000-0x00008027ffffff]
[ 0.000000] 7040000000 - 8000000000 page 1G
[ 0.000000] 8000000000 - 8028000000 page 2M
[ 0.000000] kernel direct mapping tables up to 8028000000 @ [0x8027ffd000-0x8027ffffff] pre-allocated
[ 0.000000] kernel direct mapping tables up to 8028000000 @ [0x8027ffd000-0x8027ffefff] final
[ 0.000000] memblock_x86_reserve_range: [0x8027ffd000-0x8027ffefff] PGTABLE
So this fixes the extra small-page mappings.
-v2: Ingo was not happy with the #ifdef detection etc., so check
page_size_mask instead of checking whether gb pages are in use.
Reported-by: Tejun Heo <tj@...nel.org>
Signed-off-by: Yinghai Lu <yinghai@...nel.org>
---
arch/x86/mm/init_64.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
Index: linux-2.6/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init_64.c
+++ linux-2.6/arch/x86/mm/init_64.c
@@ -614,6 +614,7 @@ struct mapping_work_data {
unsigned long start;
unsigned long end;
unsigned long pfn_mapped;
+ unsigned long align;
};
static int __init_refok
@@ -621,7 +622,14 @@ mapping_work_fn(unsigned long start_pfn,
{
struct mapping_work_data *data = datax;
unsigned long pfn_mapped;
- unsigned long final_start, final_end;
+ unsigned long final_start, final_end, tbl_end;
+
+ tbl_end = end_pfn << PAGE_SHIFT;
+ /* Align to a 1G or 2M boundary to avoid smaller mappings */
+ start_pfn = round_down(start_pfn, data->align>>PAGE_SHIFT);
+ if (start_pfn < data->pfn_mapped)
+ start_pfn = data->pfn_mapped;
+ end_pfn = round_up(end_pfn, data->align>>PAGE_SHIFT);
final_start = max_t(unsigned long, start_pfn<<PAGE_SHIFT, data->start);
final_end = min_t(unsigned long, end_pfn<<PAGE_SHIFT, data->end);
@@ -629,7 +637,7 @@ mapping_work_fn(unsigned long start_pfn,
if (final_end <= final_start)
return 0;
- pfn_mapped = init_memory_mapping(final_start, final_end);
+ pfn_mapped = init_memory_mapping_ext(final_start, final_end, tbl_end);
if (pfn_mapped > data->pfn_mapped)
data->pfn_mapped = pfn_mapped;
@@ -645,6 +653,7 @@ init_memory_mapping_active_regions(unsig
data.start = start;
data.end = end;
data.pfn_mapped = 0;
+ data.align = (page_size_mask & (1<<PG_LEVEL_1G)) ? 1UL<<30 : 1UL<<21;
work_with_active_regions(MAX_NUMNODES, mapping_work_fn, &data);
--