linux-kernel - Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation

Date:	Tue, 24 Feb 2009 22:27:33 +0900
From:	Tejun Heo <tj@...nel.org>
To:	Ingo Molnar <mingo@...e.hu>
CC:	rusty@...tcorp.com.au, tglx@...utronix.de, x86@...nel.org,
	linux-kernel@...r.kernel.org, hpa@...or.com, jeremy@...p.org,
	cpw@....com, nickpiggin@...oo.com.au, ink@...assic.park.msu.ru
Subject: Re: [PATCHSET x86/core/percpu] improve the first percpu chunk allocation

Hello, Ingo.

Ingo Molnar wrote:
> It's not an optimization, it's a pessimisation :)

Hmmm... big word.  Looking up pessimisation... Ah, okay, it's from
pessimistic.

> Please read what i wrote to you. We want the percpu static and 
> dynamic areas to be _one and the same thing_. (With just the 
> difference that static allocations have a handy compile-time 
> offset shortcut - but the access is still the same.)
> 
> Right now, with your latest code we still have this:
> 
>    * Use this to get to a cpu's version of the per-cpu object
>    * dynamically allocated. Non-atomic access to the current  CPU's
>    * version should probably be combined with get_cpu()/put_cpu().
>    */
>   #define per_cpu_ptr(ptr, cpu)   SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
> 
> This slows down per_cpu_ptr() and makes the dynamic percpu case 
> a second-class citizen because most actual usages, which are for 
> the current CPU, still have to go via the per_cpu_offset() 
> indirection.

Heh... I suppose this is why you and I keep disagreeing.
Currently, __my_cpu_offset is defined as percpu_read(this_cpu_off) and
__get_cpu_var() is defined as (*SHIFT_PERCPU_PTR(&per_cpu_var(var),
__my_cpu_offset)), so our static access is now basically *per_cpu_ptr().

If per_cpu_ptr() is a second-class citizen, get_cpu_var() is too.  :-)
So, there's nothing more indirect about per_cpu_ptr() compared to
get_cpu_var() anymore.
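
A minimal userspace sketch of that claim, not kernel code.  It assumes
one contiguous area with a fixed-size unit per CPU, and every name
other than SHIFT_PERCPU_PTR, per_cpu_ptr and per_cpu_offset
(percpu_area, this_cpu, static_counter, alloc_percpu_long,
get_cpu_var_ptr, same_path_demo) is made up for illustration.  The
static-style and the dynamic-style access both reduce to "percpu
pointer plus this CPU's offset":

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS    4
#define UNIT_SLOTS 8                          /* longs per CPU unit (made-up size) */
#define UNIT_SIZE  (UNIT_SLOTS * sizeof(long))

/* One contiguous area; CPU n's unit starts at byte offset n * UNIT_SIZE. */
static long percpu_area[NR_CPUS * UNIT_SLOTS];

static int this_cpu;                          /* stand-in for the running CPU */

static size_t per_cpu_offset(int cpu) { return (size_t)cpu * UNIT_SIZE; }
#define my_cpu_offset per_cpu_offset(this_cpu)

/* Same shape as the kernel macro: add a byte offset to a percpu pointer. */
#define SHIFT_PERCPU_PTR(ptr, off) ((long *)((char *)(ptr) + (off)))

/* "Static" variable: its slot in unit 0 is known at compile time. */
#define static_counter (&percpu_area[0])

/* "Dynamic" variable: its slot is chosen at run time by a bump allocator. */
static size_t next_slot = 1;
static long *alloc_percpu_long(void)
{
    return next_slot < UNIT_SLOTS ? &percpu_area[next_slot++] : NULL;
}

/* Hypothetical pointer-returning variant of the static accessor. */
#define get_cpu_var_ptr(var)  SHIFT_PERCPU_PTR((var), my_cpu_offset)
#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset(cpu))

/* Returns 1 if both writes landed in CPU 2's unit at their own slots. */
static int same_path_demo(void)
{
    long *dyn = alloc_percpu_long();          /* first call hands out slot 1 */

    this_cpu = 2;
    *get_cpu_var_ptr(static_counter) = 11;    /* static-style access  */
    *per_cpu_ptr(dyn, this_cpu) = 22;         /* dynamic-style access */

    return percpu_area[2 * UNIT_SLOTS + 0] == 11 &&
           percpu_area[2 * UNIT_SLOTS + 1] == 22;
}
```

The only thing that differs between the two variables here is where
the slot number comes from; the address computation is identical.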

> We cannot do that optimization due to the NUMA and SMP 
> asymmetry. If NUMA and SMP had the same linear structure, as i 
> suggested we do, we could do it.

No no no, there's no difference whatsoever.  Either I'm grossly
misunderstanding something or you are, because I really cannot see any
difference between static and dynamic ones except for whether the
offset itself is static or not.

What's missing is unification of static and dynamic accessors and thus
the faster accessors - percpu_read() and friends - for dynamic ones.
This will be the next round of patches.
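
As a rough illustration of what unified accessors could look like
(again a userspace sketch under assumptions, not the planned patches):
one set of helpers takes any percpu pointer and operates on the
current CPU's copy, whether the pointer came from a build-time slot or
a run-time allocation.  All names below (area, cur_cpu, this_unit_ptr,
unified_read, unified_add, unified_demo) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS    2
#define UNIT_SLOTS 4                       /* longs per CPU unit (made-up size) */

static long area[NR_CPUS * UNIT_SLOTS];    /* CPU n's unit starts at slot n * UNIT_SLOTS */
static int cur_cpu;                        /* stand-in for the running CPU */

/* Translate a percpu pointer into the current CPU's copy. */
static long *this_unit_ptr(long *pcpu)
{
    return (long *)((char *)pcpu +
                    (size_t)cur_cpu * UNIT_SLOTS * sizeof(long));
}

/* One accessor pair for every percpu pointer, static or dynamic. */
static long unified_read(long *pcpu)        { return *this_unit_ptr(pcpu); }
static void unified_add(long *pcpu, long v) { *this_unit_ptr(pcpu) += v; }

static long unified_demo(void)
{
    long *static_var  = &area[0];          /* slot fixed at build time */
    long *dynamic_var = &area[3];          /* slot picked at run time in real code */

    cur_cpu = 1;
    unified_add(static_var, 5);
    unified_add(dynamic_var, 7);
    return unified_read(static_var) + unified_read(dynamic_var);
}
```

In the sketch the callers never need to know which kind of pointer
they hold, which is the point of unifying the accessors.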

> Currently you rely on per_cpu_offset() indirection basically as 
> a soft-TLB entry covering all dynamic allocations. That sucks.
> 
> Ok?

IIUC, the per_cpu_offset() indirection stems from the %gs addressing
restriction.  We can't teach gcc about it, hence percpu_read() and
friends.  Come on, our static percpu variables use per_cpu_offset()
too.

If my reality seems to be disassociated from others' more than it
usually is, please feel free to enlighten me.  :-)

Thanks.

-- 
tejun