Message-ID: <20090224141217.GA17287@elte.hu>
Date:	Tue, 24 Feb 2009 15:12:17 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Tejun Heo <tj@...nel.org>
Cc:	rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
	cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk
	allocation
* Tejun Heo <tj@...nel.org> wrote:
> What's missing is unification of static and dynamic accessors 
> and thus the faster accessors - percpu_read() and friends - 
> for dynamic ones. This will be the next round of patches.
Ok, good - we are in agreement then and i'll wait for those 
patches.
And i think i finally decoded the real source of the disconnect :-)
It's still about this restriction:
+       /*
+        * If large page isn't supported, there's no benefit in doing
+        * this.  Also, embedding allocation doesn't play well with
+        * NUMA.
+        */
+       if (!cpu_has_pse || pcpu_need_numa())
+               return -EINVAL;
This is what makes no sense (why force the static percpu area 
into 4K mappings on NUMA).
i think you do it because you misunderstood my original 2MB TLB 
static area suggestion. setup_pcpu_embed() does this now:
+       pcpue_ptr = pcpu_alloc_bootmem(0, num_possible_cpus() * pcpue_unit_size,
+                                      PAGE_SIZE);
That is not NUMA-friendly indeed.
What should be done instead is to up the unit size to 2MB as i 
suggested, and to allocate a 2MB-sized, 2MB-aligned, 
NUMA-correct area for each CPU, via bootmem.
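A rough userspace sketch of that allocation pattern, not kernel code: 
one separate 2MB-sized, 2MB-aligned allocation per CPU, each requested 
for that CPU's node. All names here (sketch_alloc_on_node(), 
sketch_alloc_units(), the cpu_to_node array) are hypothetical, and 
aligned_alloc() merely stands in for a node-aware bootmem allocation, 
so the node argument is carried along but has no effect:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_UNIT_SIZE ((size_t)2 << 20)	/* one 2MB large page per CPU */

/*
 * Hypothetical stand-in for a node-aware bootmem allocation.
 * Plain userspace C cannot place memory on a node, so 'node' is
 * only passed in to show where it would be used.
 */
static void *sketch_alloc_on_node(int node, size_t size, size_t align)
{
	(void)node;
	assert(size % align == 0);	/* required by aligned_alloc() */
	return aligned_alloc(align, size);
}

/*
 * Allocate one 2MB-sized, 2MB-aligned unit per CPU, each requested
 * for that CPU's own node. Returns 0 on success, -1 on failure
 * (all units allocated so far are freed again).
 */
static int sketch_alloc_units(int nr_cpus, const int *cpu_to_node,
			      void **units)
{
	int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++) {
		units[cpu] = sketch_alloc_on_node(cpu_to_node[cpu],
						  SKETCH_UNIT_SIZE,
						  SKETCH_UNIT_SIZE);
		if (!units[cpu]) {
			while (cpu-- > 0)
				free(units[cpu]);
			return -1;
		}
	}
	return 0;
}

/* True if the unit starts on a 2MB boundary. */
static int sketch_unit_is_aligned(const void *unit)
{
	return ((uintptr_t)unit % SKETCH_UNIT_SIZE) == 0;
}
```

The point of the loop is only that every unit is its own allocation, 
so each one can come from a different node and still be covered by a 
single 2MB mapping.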
To quote my original mail:
> > - allocate the static percpu area using bootmem-alloc, but
> >   using a 2MB alignment parameter and a 2MB aligned size. Then
> >   we can remap it to some convenient and undisturbed virtual
> >   memory area, using 2MB TLBs. [*]
I.e. each individual 2MB allocated largepage can then be 
remapped as a 2MB TLB to the high (vmalloc) area, followed by 
ordinary 4K mappings for regular percpu_alloc() pages.
( and the partial, unused pages within this initial chunk are 
  returned to bootmem. )
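To put a number on the unused part, here is a small sketch of the 
arithmetic, assuming 4K base pages and a static percpu area that fits 
into one 2MB unit. The helper names and the 300KB figure in the note 
below are illustrative assumptions, not values from the patchset:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	((size_t)4096)		/* 4K base page */
#define SKETCH_LPAGE_SIZE	((size_t)2 << 20)	/* 2MB large page */

/* Round 'size' up to the next multiple of 'align' (a power of two). */
static size_t sketch_round_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/*
 * Number of whole 4K pages at the end of a 2MB unit that the static
 * percpu data does not touch and that could be handed back.
 * Assumes the static area fits into a single 2MB unit.
 */
static size_t sketch_unused_tail_pages(size_t static_size)
{
	size_t used;

	assert(static_size <= SKETCH_LPAGE_SIZE);
	used = sketch_round_up(static_size, SKETCH_PAGE_SIZE);
	return (SKETCH_LPAGE_SIZE - used) / SKETCH_PAGE_SIZE;
}
```

With a hypothetical 300KB static area, 75 of the unit's 512 pages are 
in use and sketch_unused_tail_pages(300 * 1024) gives 437 pages per 
CPU that could go back.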
That will be NUMA-friendly and i suspect we should also use it 
on SMP just to get that aspect of the code tested better.
Do _not_ allocate the units together in one bootmem allocation 
because that's not NUMA-friendly.
Ok?
	Ingo