Date:	Fri, 25 Feb 2011 12:16:06 +0100
From:	Tejun Heo <tj@...nel.org>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] x86,mm,64bit: Round up memory boundary for
 init_memory_mapping_high()

On Thu, Feb 24, 2011 at 10:20:35PM -0800, Yinghai Lu wrote:
> tj pointed out:
> 	when a node does not have a 1G-aligned boundary, e.g. one at 128M,
> init_memory_mapping_high() could split the mapping into 128M on one node
> and 896M on the next node, using 2M pages instead of a 1G page. That could
> increase TLB pressure.
> 
> So if GB pages are used, try to align the boundary to 1G before calling
> init_memory_mapping_ext(), to make sure only one 1G entry is used for that
> cross-node 1G.
> Need init_memory_mapping_ext() to take tbl_end, to make sure the pgtable is
> on the previous node instead of the next node.

I don't know, Yinghai.  The whole code seems overly complicated to me.
Just ignore e820 map when building linear mapping.  It doesn't matter.
Why not just do something like the following?  Also, can you please
add some comments explaining how the NUMA affine allocation actually
works for page tables?  Or better, can you please make that explicit?
It currently depends on memory being registered in ascending address
order, right?  The memblock code is already NUMA aware, so I think it
would be far better to make the node-affine part explicit.

Thanks.

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 46e684f..4fd0b59 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -966,6 +966,11 @@ void __init setup_arch(char **cmdline_p)
 	memblock.current_limit = get_max_mapped();
 
 	/*
+	 * Add whole lot of comment explaining what's going on and WHY
+	 * because as it currently stands, it's frigging cryptic.
+	 */
+
+	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
 	 */
 
diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
index 7757d22..50ec03c 100644
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -536,8 +536,6 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 	if (!numa_meminfo_cover_memory(mi))
 		return -EINVAL;
 
-	init_memory_mapping_high();
-
 	/* Finally register nodes. */
 	for_each_node_mask(nid, node_possible_map) {
 		u64 start = (u64)max_pfn << PAGE_SHIFT;
@@ -550,8 +548,12 @@ static int __init numa_register_memblks(struct numa_meminfo *mi)
 			end = max(mi->blk[i].end, end);
 		}
 
-		if (start < end)
+		if (start < end) {
+			init_memory_mapping(
+			  ALIGN_DOWN_TO_MAX_MAP_SIZE_AND_CONVERT_TO_PFN(start),
+			  ALIGN_UP_SIMILARY_BUT_DONT_GO_OVER_MAX_PFN(end));
 			setup_node_bootmem(nid, start, end);
+		}
 	}
 
 	return 0;
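
For illustration only, here is a minimal userspace sketch of the arithmetic
that the two placeholder names in the patch above suggest.  It is not kernel
code.  MAX_MAP_SIZE, map_start_pfn() and map_end_pfn() are hypothetical names,
and the 1G mapping size and 4K page size are assumptions; a real version would
have to take the mapping size from whatever page sizes the CPU supports.

```c
#include <stdint.h>

#define PAGE_SHIFT   12            /* assumed: 4K base pages */
#define MAX_MAP_SIZE (1ULL << 30)  /* assumed: 1G is the largest mapping */

/* Round a physical address down to a MAX_MAP_SIZE boundary, return a PFN. */
static uint64_t map_start_pfn(uint64_t start)
{
	return (start & ~(MAX_MAP_SIZE - 1)) >> PAGE_SHIFT;
}

/*
 * Round a physical address up to a MAX_MAP_SIZE boundary, return a PFN,
 * but never go past max_pfn.
 */
static uint64_t map_end_pfn(uint64_t end, uint64_t max_pfn)
{
	uint64_t pfn = ((end + MAX_MAP_SIZE - 1) & ~(MAX_MAP_SIZE - 1))
			>> PAGE_SHIFT;

	return pfn < max_pfn ? pfn : max_pfn;
}
```

With a node boundary at 4G+128M and max_pfn at 8G, the first node's end
rounds up to 5G and the second node's start rounds down to 4G, so the 1G
region containing the boundary falls inside both ranges and can be covered
by a single 1G entry.  How the overlap between the two calls should be
handled is left open in this sketch.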


-- 
tejun