Message-ID: <20090224124042.GA31295@elte.hu>
Date: Tue, 24 Feb 2009 13:40:42 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Tejun Heo <tj@...nel.org>
Cc: rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk
allocation
* Tejun Heo <tj@...nel.org> wrote:
> Hello, Ingo.
>
> Ingo Molnar wrote:
> > Hm, I think there still must be some basic misunderstanding
> > somewhere here. Let me describe the design I described in the
> > previous mail in more detail.
> >
> > In one of your changelogs you state:
> >
> > | On NUMA, embedding allocator can't be used as different
> > | units can't be made to fall in the correct NUMA nodes.
> >
> > This is a direct consequence of the unit/chunk abstraction,
>
> Not at all. That's an optimization for !NUMA. The remap
> allocator is what can be done on NUMA. Chunking or not
> doesn't make any difference in this regard. The only
> difference between chunking and not chunking is whether
> separately allocated percpu offsets have more or less holes
> in between them, which is irrelevant for all purposes.
It's not an optimization, it's a pessimisation :)
Please read what I wrote to you. We want the percpu static and
dynamic areas to be _one and the same thing_. (The only
difference is that static allocations have a handy compile-time
offset shortcut, but the access is still the same.)
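To make the "one and the same thing" idea concrete, here is a minimal userspace sketch, assuming a linear layout in which CPU n's unit starts at n * UNIT_SIZE. All names and sizes (pcpu_init, pcpu_alloc_off, pcpu_ptr, UNIT_SIZE, and so on) are hypothetical and are not kernel APIs. Static variables get fixed offsets at the start of each unit, dynamic allocations are carved from the rest of the same unit, and a single accessor serves both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simulation parameters, not kernel values. */
#define NR_CPUS_SIM        4
#define UNIT_SIZE          256  /* bytes per CPU unit */
#define STATIC_SIZE        64   /* bytes reserved for "static" variables */
#define STATIC_COUNTER_OFF 0    /* compile-time offset of one static var */

/* Base of the whole area; CPU n's unit starts at n * UNIT_SIZE. */
static unsigned char *percpu_region;

/* Dynamic allocations are carved from the same unit, after the statics. */
static size_t next_dyn_off = STATIC_SIZE;

static int pcpu_init(void)
{
    percpu_region = calloc(NR_CPUS_SIM, UNIT_SIZE);
    return percpu_region ? 0 : -1;
}

/* Returns an offset valid in every CPU's unit, or -1 if the unit is full. */
static long pcpu_alloc_off(size_t size)
{
    size = (size + 7) & ~(size_t)7;  /* keep 8-byte alignment */
    if (next_dyn_off + size > UNIT_SIZE)
        return -1;
    long off = (long)next_dyn_off;
    next_dyn_off += size;
    return off;
}

/* One accessor for both static and dynamic objects. */
static void *pcpu_ptr(size_t off, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SIM);
    assert(off < UNIT_SIZE);
    return percpu_region + (size_t)cpu * UNIT_SIZE + off;
}
```

In this sketch a dynamic allocation is just another offset into the same units, so nothing distinguishes it from a static variable at access time; the static case merely knows its offset at compile time.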
Right now, with your latest code we still have this:
/*
 * Use this to get to a cpu's version of the per-cpu object
 * dynamically allocated. Non-atomic access to the current CPU's
 * version should probably be combined with get_cpu()/put_cpu().
 */
#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
This slows down per_cpu_ptr() and makes the dynamic percpu case
a second-class citizen: most actual usages are for the current
CPU, yet they still have to go through the per_cpu_offset()
indirection.
I.e. we have things like:
const int cpu = get_cpu();
u8 *scratch = *per_cpu_ptr(ipcomp_scratches, cpu);
Instead of a straight:
u8 *scratch = *this_cpu_ptr(ipcomp_scratches);
We cannot do that optimization due to the NUMA and SMP
asymmetry. If NUMA and SMP had the same linear structure, as I
suggested we do, we could do it.
Currently you rely on per_cpu_offset() indirection basically as
a soft-TLB entry covering all dynamic allocations. That sucks.
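The soft-TLB comparison can be illustrated with a small userspace sketch, assuming each thread plays the role of one CPU. All names here are hypothetical: offset_table stands in for the per_cpu_offset() lookup, and a _Thread_local base stands in for the per-CPU base register (which on x86 the kernel reaches through a segment register). The per_cpu_ptr()-style helper needs a CPU id plus a table load on every access; the this_cpu_ptr()-style helper only adds the cached base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS_SIM 4

/* "Soft-TLB": one translation entry per CPU, consulted on every
 * per_cpu_ptr()-style access. Hypothetical names, not kernel code. */
static uintptr_t offset_table[NR_CPUS_SIM];

/* Simulated per-CPU base register: each thread plays one CPU. */
static _Thread_local uintptr_t cur_cpu_base;

/* per_cpu_ptr()-style: caller must know the CPU id, then a table load. */
static void *sim_per_cpu_ptr(void *ptr, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SIM);
    return (void *)((uintptr_t)ptr + offset_table[cpu]);
}

/* this_cpu_ptr()-style: one add against the cached base, no CPU id. */
static void *sim_this_cpu_ptr(void *ptr)
{
    return (void *)((uintptr_t)ptr + cur_cpu_base);
}

/* Called once when a thread "becomes" a CPU, like loading the base
 * register at boot. */
static void sim_set_current_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SIM);
    cur_cpu_base = offset_table[cpu];
}
```

Both helpers reach the same object for the current CPU; the difference is that the second one needs neither get_cpu() nor a per-access lookup, which is the fast path a linear layout would make available to dynamic allocations too.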
Ok?
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/