Message-ID: <20090224125132.GB31295@elte.hu>
Date:	Tue, 24 Feb 2009 13:51:32 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Tejun Heo <tj@...nel.org>
Cc:	rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
	cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk
	allocation


* Tejun Heo <tj@...nel.org> wrote:

> > - We'd have a very 'compressed' pte presence in the pagetables: 
> >   the dynamic percpu area is as tightly packed as possible. With 
> >   a chunked design we 'scatter' the ptes a bit more broadly.
> 
> Can you please elaborate a bit?

Sure. We want to compress data usage on every level of caching. 

Part of that is to compress the ptes themselves, as they are 
laid out in the pagetables. With your current small-chunks setup 
we get this address space layout:

    u1 .... [hole] ... u2 .. [hole] .... u3 ... [hole]

Where each 'hole' consists of the units belonging to other CPUs 
- units this CPU is largely uninterested in. The more CPUs 
there are in the system, the larger these 'holes' become.

A 'hole' there means that we have a number of ptes that are 
unused in that CPU. That is bad in several ways:

 - those ptes will still be cached in the CPU - just not used 
   for anything by that CPU. So the cache utilization ratio for 
   those ptes will be very low.

 - a modern x86 CPU's TLB walker will prefetch nearby 
   present ptes quite aggressively, based on access patterns it 
   detects. Having a lot of 'other CPU' ptes present will fool 
   this CPU into prefetching them if there's nearby usage. 
   Those TLB entries are wasted, and they can also create 
   pressure that evicts useful TLB entries.

 - the more CPUs there are in the system, the worse this 
   situation gets. So up to a certain limit (when the hole 
   becomes so large that the CPU no longer speculates any 
   additional level into it) this effect gets gradually worse.

So what i'm saying is that these are strong reasons for us to 
want to make the unit size something like 2MB - on 64-bit x86 
at least.

( Using a 2MB unit size will also have another advantage: _iff_ 
  we can still allocate a hugepage at that point we can map it 
  straight there when extending the dynamic area. )

	Ingo
