Message-ID: <20090224141217.GA17287@elte.hu>
Date: Tue, 24 Feb 2009 15:12:17 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Tejun Heo <tj@...nel.org>
Cc: rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation
* Tejun Heo <tj@...nel.org> wrote:
> What's missing is unification of static and dynamic accessors
> and thus the faster accessors - percpu_read() and friends -
> for dynamic ones. This will be the next round of patches.
Ok, good - we are in agreement then and i'll wait for those
patches.
And i think i finally decoded the real source of the disconnect
:-)
It's still about this restriction:
+	/*
+	 * If large page isn't supported, there's no benefit in doing
+	 * this. Also, embedding allocation doesn't play well with
+	 * NUMA.
+	 */
+	if (!cpu_has_pse || pcpu_need_numa())
+		return -EINVAL;
This is what makes no sense: why force the static percpu area
into 4K mappings on NUMA?
i think you do it because you misunderstood my original 2MB TLB
static area suggestion. setup_pcpu_embed() does this now:
+	pcpue_ptr = pcpu_alloc_bootmem(0, num_possible_cpus() * pcpue_unit_size,
+				       PAGE_SIZE);
That is not NUMA-friendly indeed.
What should be done instead is to up the unit size to 2MB as i
suggested, and to allocate 2MB sized and 2MB aligned
numa-correct area for each CPU, via bootmem.
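The per-CPU allocation described above can be sketched in userspace C. This is only an illustration of the size and alignment constraints, not kernel code: alloc_unit_for_cpu() and alloc_percpu_units() are hypothetical names, and aligned_alloc() stands in for a node-aware bootmem allocation, so no actual NUMA placement happens here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define UNIT_SIZE (2UL << 20)	/* 2MB unit, the size of one large page */

/*
 * Hypothetical stand-in for a node-aware bootmem allocation. In the
 * kernel the memory would come from the CPU's own node; here
 * aligned_alloc() only models the 2MB size and 2MB alignment.
 */
static void *alloc_unit_for_cpu(unsigned int cpu)
{
	(void)cpu;	/* a real allocator would map cpu -> node here */
	return aligned_alloc(UNIT_SIZE, UNIT_SIZE);
}

/*
 * Allocate one unit per CPU, each with its own allocation call.
 * Returns 0 on success, -1 on failure (after freeing the units
 * that were already allocated).
 */
static int alloc_percpu_units(void *units[], unsigned int nr_cpus)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		units[cpu] = alloc_unit_for_cpu(cpu);
		if (!units[cpu]) {
			while (cpu--)
				free(units[cpu]);
			return -1;
		}
		/* every unit must start on a 2MB boundary */
		assert(((uintptr_t)units[cpu] & (UNIT_SIZE - 1)) == 0);
	}
	return 0;
}
```

The point of the loop is that each CPU gets its own allocation call, so a node-aware allocator is free to place every unit on that CPU's node. A single num_possible_cpus() * unit_size allocation gives it no such freedom.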
To quote my original mail:
> > - allocate the static percpu area using bootmem-alloc, but
> > using a 2MB alignment parameter and a 2MB aligned size. Then
> > we can remap it to some convenient and undisturbed virtual
> > memory area, using 2MB TLBs. [*]
I.e. each individually allocated 2MB large page can then be
remapped with a 2MB TLB entry into the high (vmalloc) area,
followed by ordinary 4K mappings for regular percpu_alloc() pages.
( and the partial, unused pages within this initial chunk are
returned to bootmem. )
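As a rough illustration of how much of each 2MB unit could be handed back, the arithmetic can be sketched as follows. The helper names and the used_bytes parameter (the part of the unit actually occupied, e.g. by the static percpu area) are assumptions made for this example and do not come from the patchset.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_4K	4096UL
#define UNIT_SIZE	(2UL << 20)	/* one 2MB large page */

/* Round the used bytes up to a whole number of 4K pages. */
static size_t pages_used(size_t used_bytes)
{
	return (used_bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}

/*
 * Number of whole 4K pages at the tail of one 2MB unit that are
 * not touched by the used area and could therefore be returned.
 */
static size_t pages_returnable(size_t used_bytes)
{
	assert(used_bytes <= UNIT_SIZE);
	return UNIT_SIZE / PAGE_SIZE_4K - pages_used(used_bytes);
}
```

With a hypothetical 100KB in use per CPU, pages_returnable(100 * 1024) gives 487 of the unit's 512 pages.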
That will be NUMA-friendly and i suspect we should also use it
on SMP just to get that aspect of the code tested better.
Do _not_ allocate the units together in one bootmem allocation
because that's not NUMA-friendly.
Ok?
Ingo