Message-ID: <20110224091557.GD7840@htj.dyndns.org>
Date: Thu, 24 Feb 2011 10:15:57 +0100
From: Tejun Heo <tj@...nel.org>
To: Yinghai Lu <yinghai@...nel.org>
Cc: x86@...nel.org, Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: questions about init_memory_mapping_high()
Hey, again.
On Wed, Feb 23, 2011 at 02:17:34PM -0800, Yinghai Lu wrote:
> > Hmmm... I'm not really following. Can you elaborate? The reason why
> > smaller mappings are bad is the increased TLB pressure. What does
> > using the existing entries have to do with it?
>
> Assume 1G pages are used. The first node will actually have 512G mapped
> already, so if the system only has 1024G, the page table for the first
> 512G will be on node0 RAM and the page table for the second 512G will be
> on node4.
>
> When only 2M pages are used, it is at a 1G boundary. For a 1024G system:
> the page table (about 512k) for mem 0-128g is on node0,
> the page table (about 512k) for mem 128g-256g is on node1,
> ...
> Do you mean we need to put all those 512k page tables together to reduce
> TLB pressure?
Nope. Let's say the machine supports 1GiB mappings, has 8GiB of memory
where [0,4)GiB is node 0 and [4,8)GiB is node 1, and there's a hole of
128MiB right on top of 4GiB. Before the change, the page mapping code
wouldn't care about the hole and would just map the whole [0,8)GiB area
with eight 1GiB mappings. Now, with your change, [4,5)GiB will be
mapped using 2MiB mappings to avoid mapping the 128MiB hole.

We end up unnecessarily using smaller mappings (512 2MiB mappings
instead of one 1GiB mapping), thus increasing TLB pressure. There is no
reason to match the linear address mapping exactly to the physical
memory map. It is no accident that the original code didn't consider
memory holes. Using larger mappings over them is more beneficial than
trying to punch holes with smaller mappings.
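
Just to spell the arithmetic out (a throwaway userspace sketch of the
numbers for the hypothetical machine above, not kernel code):

#include <stdio.h>

#define MIB(x)	((unsigned long long)(x) << 20)
#define GIB(x)	((unsigned long long)(x) << 30)

int main(void)
{
	unsigned long long slot = GIB(1);	/* the [4,5)GiB slot      */
	unsigned long long hole = MIB(128);	/* unused hole above 4GiB */

	/* One 1GiB entry covers the whole slot, hole included. */
	printf("1GiB pages: %llu entry\n", slot / GIB(1));

	/*
	 * Punching the hole out means up to 512 2MiB entries for the
	 * slot; 448 of them are actually installed once the 64 that
	 * would cover the hole are skipped.
	 */
	printf("2MiB pages: %llu entries\n", (slot - hole) / MIB(2));
	return 0;
}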
This rather important change was made without any description or
explanation, which I find somewhat disturbing. Anyways, what we can
do instead of mapping exactly according to the memblocks is just take
the bottom and top addresses of the occupied NUMA regions and round
them down and up, respectively, to the largest supported page mapping
size, as long as the top address doesn't go over max_pfn.
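
Roughly, the rounding would look like the below. This is only a
userspace sketch under the assumption that 1GiB is the largest
supported mapping size; the helper names are made up and it is not the
actual init_memory_mapping_high() code:

#include <stdio.h>

#define GBYTE	(1ULL << 30)

/* Round a node's lowest occupied address down to a 1GiB boundary. */
static unsigned long long map_start(unsigned long long start)
{
	return start & ~(GBYTE - 1);
}

/* Round its highest occupied address up, but never past the end of RAM. */
static unsigned long long map_end(unsigned long long end,
				  unsigned long long max_addr)
{
	end = (end + GBYTE - 1) & ~(GBYTE - 1);
	return end > max_addr ? max_addr : end;
}

int main(void)
{
	/* Node 1 from the example: occupied [4GiB+128MiB, 8GiB), RAM ends at 8GiB. */
	unsigned long long start = 4 * GBYTE + (128ULL << 20);
	unsigned long long end = 8 * GBYTE, max_addr = 8 * GBYTE;

	printf("map [0x%llx, 0x%llx) with 1GiB pages\n",
	       map_start(start), map_end(end, max_addr));
	return 0;
}

With the example above, node 1's start rounds back down to 4GiB, so the
128MiB hole is simply covered by a 1GiB mapping instead of being
punched out with 2MiB mappings.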
Thanks.
--
tejun