Message-ID: <20090224125132.GB31295@elte.hu>
Date:	Tue, 24 Feb 2009 13:51:32 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Tejun Heo <tj@...nel.org>
Cc:	rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
	cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk
	allocation


* Tejun Heo <tj@...nel.org> wrote:

> > - We'd have a very 'compressed' pte presence in the pagetables: 
> >   the dynamic percpu area is as tightly packed as possible. With 
> >   a chunked design we 'scatter' the ptes a bit more broadly.
> 
> Can you please elaborate a bit?

Sure. We want to compress data usage on every level of caching. 

Part of that is to compress the ptes themselves, as they are 
laid out in the pagetables. With your current small-chunks setup 
we get this address space layout:

    u1 .... [hole] ... u2 .. [hole] .... u3 ... [hole]

Where each 'hole' consists of the units belonging to other CPUs 
- units this CPU is largely uninterested in. The more CPUs 
there are in the system, the larger these 'holes' become.

A 'hole' there means that we have a number of ptes that are 
unused in that CPU. That is bad in several ways:

 - those ptes will still be cached in the CPU - just not used 
   for anything by that CPU. So the cache utilization ratio for 
   those ptes will be very low.

 - a modern x86 CPU's TLB walker will prefetch nearby 
   present ptes quite aggressively, based on access patterns it 
   detects. Having a lot of 'other CPU' ptes present will fool 
   this CPU into prefetching them if there's nearby usage. 
   Those TLB entries are wasted, and they can also create 
   pressure that evicts useful TLB entries.

 - the more CPUs there are in the system, the worse this 
   situation gets. So up to a certain limit (when the hole 
   becomes so large that the CPU no longer speculates any 
   additional level into it) this effect gets gradually worse.

So what i'm saying is that these are strong reasons for us to 
want to make the unit size something like 2MB - on 64-bit x86 
at least.

( Using a 2MB unit size will also have another advantage: _iff_ 
  we can still allocate a hugepage at that point we can map it 
  straight there when extending the dynamic area. )

	Ingo
