Date:	Sat, 31 Jan 2009 02:30:25 -0800
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Ingo Molnar <mingo@...e.hu>
CC:	"H. Peter Anvin" <hpa@...or.com>, Tejun Heo <tj@...nel.org>,
	Brian Gerst <brgerst@...il.com>, ebiederm@...ssion.com,
	cl@...ux-foundation.org, rusty@...tcorp.com.au, travis@....com,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	steiner@....com, hugh@...itas.com
Subject: Re: [patch] add optimized generic percpu accessors

Ingo Molnar wrote:
> Tejun, could you please also add the patch below to your lineup too?
>
> It is an optimization and a cleanup, and adds the following new generic 
> percpu methods:
>
>   percpu_read()
>   percpu_write()
>   percpu_add()
>   percpu_sub()
>   percpu_or() 
>   percpu_xor()
>
> and implements support for them on x86. (other architectures will fall 
> back to a default implementation)
>
> The advantage is that for example to read a local percpu variable, instead 
> of this sequence:
>
>  return __get_cpu_var(var);
>
>  ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
>  ffffffff8102ca32:	81 
>  ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
>  ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
>
> We can get a single instruction by using the optimized variants:
>
>  return percpu_read(var);
>
>  ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
>
> I also cleaned up the x86-specific APIs and made the x86 code use these 
> new generic percpu primitives.
>
> It looks quite hard to convince the compiler to generate the optimized 
> single-instruction sequence for us out of __get_cpu_var(var) - or can you 
> perhaps see a way to do it?
>
> The patch is against your latest zero-based percpu / pda unification tree. 
> Untested.

I have no objection to this patch overall, or the use of non-arch 
specific names.

But there is one subtle thing you're overlooking here.  The x86_*_percpu 
operations are guaranteed to be atomic with respect to preemption, so 
you can use them when preemption is enabled.  When they compile down to 
one instruction, that happens naturally; otherwise they have to be 
wrapped with preempt_disable/enable.

Apart from that, being able to access a percpu variable with one 
instruction is nice, but it isn't all that efficient.  If you're going 
to access a variable more than once or twice, it's more efficient to 
take the address and access it via that.
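
To illustrate the take-the-address pattern, here is a rough userspace 
sketch.  Everything in it is a stand-in: cpu_stats, current_cpu, 
preempt_count, get_cpu_stats() and put_cpu_stats() are hypothetical names 
modeled loosely on the get_cpu_var()/put_cpu_var() idiom, not kernel code.

```c
/* Userspace sketch only; all names here are hypothetical stand-ins. */
#include <assert.h>

#define NR_CPUS 4

struct stats {
	long packets;
	long bytes;
};

static struct stats cpu_stats[NR_CPUS];	/* one slot per simulated CPU */
static int current_cpu;			/* stand-in for smp_processor_id() */
static int preempt_count;		/* stand-in for the preempt counter */

/* Disable "preemption" and return the address of this CPU's slot. */
static struct stats *get_cpu_stats(void)
{
	preempt_count++;
	return &cpu_stats[current_cpu];
}

/* Re-enable "preemption" once the caller is done with the pointer. */
static void put_cpu_stats(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/* Several accesses share one address computation instead of each
 * access going through its own percpu operation. */
static void account_packet(long len)
{
	struct stats *s = get_cpu_stats();

	s->packets += 1;
	s->bytes += len;
	put_cpu_stats();
}
```

The pointer is only valid between the get and the put, since the task 
could otherwise migrate to another CPU while still holding it.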

So, upshot, I think the default versions should be wrapped in 
preempt_disable/enable to preserve this interface invariant.
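
Something along these lines is what I have in mind for the defaults.  
This is a userspace sketch, not the patch: preempt_disable/enable are 
stubbed out as a counter, current_cpu stands in for smp_processor_id(), 
and the *_fallback names and the counter array are made up for 
illustration.

```c
/* Userspace sketch only; kernel primitives are replaced by stand-ins. */
#include <assert.h>

#define NR_CPUS 4

static int preempt_count;	/* stand-in for the preempt counter */
static int current_cpu;		/* stand-in for smp_processor_id() */

static void preempt_disable(void)
{
	preempt_count++;
}

static void preempt_enable(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

static long counter[NR_CPUS];	/* hypothetical per-CPU variable */

/* Generic add: the read-modify-write takes several instructions, so it
 * is bracketed to keep it atomic with respect to preemption. */
static void percpu_add_fallback(long *var, long val)
{
	preempt_disable();
	var[current_cpu] += val;
	preempt_enable();
}

/* Generic read: CPU lookup and load must happen on the same CPU. */
static long percpu_read_fallback(const long *var)
{
	long ret;

	preempt_disable();
	ret = var[current_cpu];
	preempt_enable();
	return ret;
}
```

An arch that can do the operation in one instruction would override 
these and skip the bracketing, which is how callers get the same 
guarantee either way.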

    J