Date:	Thu, 15 Jan 2009 12:30:45 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Tejun Heo <tj@...nel.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>, Brian Gerst <brgerst@...il.com>,
	ebiederm@...ssion.com, cl@...ux-foundation.org,
	rusty@...tcorp.com.au, travis@....com,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	steiner@....com, hugh@...itas.com
Subject: Re: [patch] add optimized generic percpu accessors


* Tejun Heo <tj@...nel.org> wrote:

> Hello, Ingo.
> 
> Ingo Molnar wrote:
> > Tejun, could you please also add the patch below to your lineup too?
> 
> Sure thing.
> 
> > It is an optimization and a cleanup, and adds the following new generic 
> > percpu methods:
> > 
> >   percpu_read()
> >   percpu_write()
> >   percpu_add()
> >   percpu_sub()
> >   percpu_or() 
> >   percpu_xor()
> > 
> > and implements support for them on x86. (other architectures will fall 
> > back to a default implementation)
> > 
> > The advantage is that for example to read a local percpu variable, instead 
> > of this sequence:
> > 
> >  return __get_cpu_var(var);
> > 
> >  ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
> >  ffffffff8102ca32:	81 
> >  ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
> >  ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
> > 
> > We can get a single instruction by using the optimized variants:
> > 
> >  return percpu_read(var);
> > 
> >  ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
> > 
> > I also cleaned up the x86-specific APIs and made the x86 code use these 
> > new generic percpu primitives.
> > 
> > It looks quite hard to convince the compiler to generate the optimized 
> > single-instruction sequence for us out of __get_cpu_var(var) - or can you 
> > perhaps see a way to do it?
> 
> Yeah, I thought about that too but couldn't think of a way to persuade 
> the compiler because the compiler doesn't know how to access the 
> address.  I'll play with it a bit more but the clumsy percpu_*() 
> accessors probably might be the only way.  :-(

The new ops are a pretty nice and clean solution, I think.

Firstly, accessing the current CPU is the only safe shortcut anyway (that 
is where we can use %fs/%gs / rip-relative addressing modes), and the 
generic per_cpu() APIs don't really provide that guarantee for us. We might 
be able to hook into __get_cpu_var(), but those are both required to be 
lvalues and are also relatively rarely used.

So introducing the new, rather straightforward APIs and using them 
wherever they matter for performance is good. Your patchset already shaved 
off an instruction from ordinary per_cpu() accesses, so it's all moving 
rather close to the optimal situation already.
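
To make the shape of the new accessors concrete, here is a rough userspace sketch of how a plain-C fallback for them could look. It is only a model, not the patch: NR_CPUS, current_cpu, set_current_cpu(), the array-backed DEFINE_PER_CPU() and the "hits" counter are all assumptions made up for illustration. On x86 the point of the real accessors is that each one can become a single %gs-relative instruction, which this sketch does not attempt to show.

```c
#include <assert.h>

/* Hypothetical userspace model: one array slot per CPU. */
#define NR_CPUS 4

/* Stand-in for "the CPU we are currently running on" (assumption). */
static int current_cpu;

static void set_current_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
}

/* Simplified per-CPU storage and the older lvalue-style accessors. */
#define DEFINE_PER_CPU(type, name)	type per_cpu__##name[NR_CPUS]
#define per_cpu(var, cpu)		(per_cpu__##var[(cpu)])
#define __get_cpu_var(var)		per_cpu(var, current_cpu)

/*
 * Generic fallbacks: each op touches only the current CPU's copy.
 * Because they are operations rather than lvalues, an architecture
 * is free to replace each one with a single instruction.
 */
#define percpu_read(var)	(__get_cpu_var(var))
#define percpu_write(var, val)	do { __get_cpu_var(var) = (val); } while (0)
#define percpu_add(var, val)	do { __get_cpu_var(var) += (val); } while (0)
#define percpu_sub(var, val)	do { __get_cpu_var(var) -= (val); } while (0)
#define percpu_or(var, val)	do { __get_cpu_var(var) |= (val); } while (0)
#define percpu_xor(var, val)	do { __get_cpu_var(var) ^= (val); } while (0)

/* Hypothetical per-CPU counter used by the demo below. */
static DEFINE_PER_CPU(unsigned long, hits);

/* Exercise every accessor on CPU 1 and return the final value. */
static unsigned long demo(void)
{
	set_current_cpu(1);
	percpu_write(hits, 8);
	percpu_add(hits, 4);	/* 12 */
	percpu_sub(hits, 2);	/* 10 */
	percpu_or(hits, 0x1);	/* 11 */
	percpu_xor(hits, 0x3);	/* 8 */
	return percpu_read(hits);
}
```

In this model demo() only changes CPU 1's copy of "hits"; the other CPUs' copies stay untouched, which is the "current CPU only" guarantee that makes the shortcut safe.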

	Ingo