linux-kernel - Re: [patch] add optimized generic percpu accessors

Message-ID: <20090115113045.GG22850@elte.hu>
Date:	Thu, 15 Jan 2009 12:30:45 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Tejun Heo <tj@...nel.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>, Brian Gerst <brgerst@...il.com>,
	ebiederm@...ssion.com, cl@...ux-foundation.org,
	rusty@...tcorp.com.au, travis@....com,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	steiner@....com, hugh@...itas.com
Subject: Re: [patch] add optimized generic percpu accessors


* Tejun Heo <tj@...nel.org> wrote:

> Hello, Ingo.
> 
> Ingo Molnar wrote:
> > Tejun, could you please also add the patch below to your lineup too?
> 
> Sure thing.
> 
> > It is an optimization and a cleanup, and adds the following new generic 
> > percpu methods:
> > 
> >   percpu_read()
> >   percpu_write()
> >   percpu_add()
> >   percpu_sub()
> >   percpu_or() 
> >   percpu_xor()
> > 
> > and implements support for them on x86. (other architectures will fall 
> > back to a default implementation)
> > 
> > The advantage is that for example to read a local percpu variable, instead 
> > of this sequence:
> > 
> >  return __get_cpu_var(var);
> > 
> >  ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
> >  ffffffff8102ca32:	81 
> >  ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
> >  ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
> > 
> > We can get a single instruction by using the optimized variants:
> > 
> >  return percpu_read(var);
> > 
> >  ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
> > 
> > I also cleaned up the x86-specific APIs and made the x86 code use these 
> > new generic percpu primitives.
> > 
> > It looks quite hard to convince the compiler to generate the optimized 
> > single-instruction sequence for us out of __get_cpu_var(var) - or can you 
> > perhaps see a way to do it?
> 
> Yeah, I thought about that too but couldn't think of a way to persuade 
> the compiler because the compiler doesn't know how to access the 
> address.  I'll play with it a bit more but the clumsy percpu_*() 
> accessors probably might be the only way.  :-(

The new ops are a pretty nice and clean solution, I think.

Firstly, accessing the current CPU is the only safe shortcut anyway (that 
is where we can use %fs/%gs and rip-relative addressing modes), and the 
generic per_cpu() APIs don't really provide that guarantee for us. We might 
be able to hook into __get_cpu_var(), but such accessors are required to 
be lvalues and are also relatively rarely used.

So introducing the new, rather straightforward APIs and using them 
wherever they matter for performance is good. Your patchset already shaved 
an instruction off ordinary per_cpu() accesses, so it is all moving 
rather close to the optimal situation already.
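
To make the lvalue point concrete, here is a minimal userspace sketch of 
the split between the two kinds of accessors. It is not the kernel code: 
every model_* name, MODEL_NR_CPUS and the "hits" variable are made up for 
illustration, and a plain array stands in for the real per-cpu areas. The 
general accessor names a CPU and yields an lvalue, while the current-CPU 
operations are whole read/modify/write steps, which is the shape that 
would let an architecture substitute one segment-prefixed instruction:

```c
#include <assert.h>

/* Userspace model only: every name here is hypothetical, not kernel API. */
#define MODEL_NR_CPUS 4

/* Stand-in for smp_processor_id(); set by hand in this model. */
static int model_current_cpu;

/* One slot per modelled CPU for a per-cpu variable named "hits". */
static unsigned long hits[MODEL_NR_CPUS];

/* General accessor: the caller names the CPU and gets an lvalue back. */
#define model_per_cpu(var, cpu)	((var)[(cpu)])

/*
 * Current-CPU-only operations. None of them hands out an lvalue (the
 * unary plus turns the read into an rvalue), so each one is a complete
 * operation that an architecture could implement however it likes.
 * This is only the generic fallback shape: index by the current CPU.
 */
#define model_percpu_read(var) \
	(+(model_per_cpu(var, model_current_cpu)))
#define model_percpu_write(var, val) \
	((void)(model_per_cpu(var, model_current_cpu) = (val)))
#define model_percpu_add(var, val) \
	((void)(model_per_cpu(var, model_current_cpu) += (val)))
#define model_percpu_sub(var, val) \
	((void)(model_per_cpu(var, model_current_cpu) -= (val)))
#define model_percpu_or(var, val) \
	((void)(model_per_cpu(var, model_current_cpu) |= (val)))
#define model_percpu_xor(var, val) \
	((void)(model_per_cpu(var, model_current_cpu) ^= (val)))
```

In this model, model_percpu_add(hits, 1) touches only the slot of 
model_current_cpu, whereas model_per_cpu(hits, 2) = 9 can still reach any 
CPU's slot, which is why the latter cannot be reduced to a single 
current-CPU instruction.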

	Ingo