Message-ID: <20090224125132.GB31295@elte.hu>
Date: Tue, 24 Feb 2009 13:51:32 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Tejun Heo <tj@...nel.org>
Cc: rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk
allocation
* Tejun Heo <tj@...nel.org> wrote:
> > - We'd have a very 'compressed' pte presence in the pagetables:
> > the dynamic percpu area is as tightly packed as possible. With
> > a chunked design we 'scatter' the ptes a bit more broadly.
>
> Can you please elaborate a bit?
Sure. We want to compress data usage on every level of caching.
Part of that is to compress the ptes themselves, as they are
laid out in the pagetables. With your current small-chunks setup
we get this address space layout:
u1 .... [hole] ... u2 .. [hole] .... u3 ... [hole]
Where each 'hole' consists of the units belonging to other
CPUs - units this CPU is largely uninterested in. The more CPUs
there are in the system, the larger these holes become.
A 'hole' there means that we have a number of ptes that go
unused on that CPU. That is bad in several ways:
- those ptes will still be cached in the CPU - just not used
for anything by that CPU. So the cache utilization ratio for
those ptes will be very low.
- a modern x86 CPU's TLB walker will prefetch into nearby
present ptes quite aggressively, based on access patterns it
detects. Having a lot of 'other CPU' ptes present will fool
this CPU into prefetching them, if there's nearby usage.
Those TLB entries are wasted, and they can also create
pressure on and evict useful TLB entries.
- the more CPUs there are in the system, the worse this
situation gets. So up until a certain limit (when the hole
becomes so large that the CPU no longer speculates any
further into it) this effect gradually gets worse.
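To put a rough number on the cache-utilization point: a pte is
8 bytes on 64-bit x86, so a 64-byte cacheline holds 8 ptes,
mapping 8 contiguous 4K pages (32KB). If most of those 32KB
stretches belong to other CPUs, most pte cachelines this CPU
pulls in are wasted. A sketch with the same made-up numbers as
before:

```python
PAGE_SIZE = 4096
PTE_SIZE = 8                               # bytes per pte, 64-bit x86
CACHELINE = 64
PTES_PER_LINE = CACHELINE // PTE_SIZE      # 8 ptes per cacheline

def pte_cacheline_utilization(nr_cpus, unit_size):
    """Fraction of the region's pte cachelines that hold ptes a
    single CPU actually uses, one unit per CPU, packed tightly."""
    ptes_per_unit = unit_size // PAGE_SIZE
    lines_per_unit = -(-ptes_per_unit // PTES_PER_LINE)   # ceil division
    total_lines = nr_cpus * lines_per_unit
    return lines_per_unit / total_lines

print(pte_cacheline_utilization(64, 64 * 1024))
# 0.015625 (= 1/64)
```

i.e. the utilization ratio scales as 1/nr_cpus with the
interleaved layout - which is also why the third point above
gets worse as the CPU count grows.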
So what i'm saying is that these are strong reasons for us to
want to make the unit size something like 2MB - on 64-bit
x86 at least.
( Using a 2MB unit size will also have another advantage: _iff_
we can still allocate a hugepage at that point we can map it
straight there when extending the dynamic area. )
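On that hugepage remark: on 64-bit x86, 2MB is exactly the
PMD mapping size, so a hugepage-backed unit costs a single
TLB entry and no pte page at all, versus 512 individual 4K
ptes. The arithmetic (a sketch, not kernel code):

```python
PAGE_SIZE = 4096
PMD_SIZE = 2 * 1024 * 1024     # 2MB: one PMD-level mapping on x86-64

def ptes_replaced_by_hugepages(unit_size=PMD_SIZE):
    """How many 4K ptes hugepage mappings replace for one unit
    (the unit must be a whole number of 2MB hugepages)."""
    assert unit_size % PMD_SIZE == 0
    return unit_size // PAGE_SIZE

print(ptes_replaced_by_hugepages())
# 512
```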
Ingo