[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090115113045.GG22850@elte.hu>
Date: Thu, 15 Jan 2009 12:30:45 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Tejun Heo <tj@...nel.org>
Cc: "H. Peter Anvin" <hpa@...or.com>, Brian Gerst <brgerst@...il.com>,
ebiederm@...ssion.com, cl@...ux-foundation.org,
rusty@...tcorp.com.au, travis@....com,
linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
steiner@....com, hugh@...itas.com
Subject: Re: [patch] add optimized generic percpu accessors
* Tejun Heo <tj@...nel.org> wrote:
> Hello, Ingo.
>
> Ingo Molnar wrote:
> > Tejun, could you please also add the patch below to your lineup too?
>
> Sure thing.
>
> > It is an optimization and a cleanup, and adds the following new generic
> > percpu methods:
> >
> > percpu_read()
> > percpu_write()
> > percpu_add()
> > percpu_sub()
> > percpu_or()
> > percpu_xor()
> >
> > and implements support for them on x86. (other architectures will fall
> > back to a default implementation)
> >
> > The advantage is that for example to read a local percpu variable, instead
> > of this sequence:
> >
> > return __get_cpu_var(var);
> >
> > ffffffff8102ca2b: 48 8b 14 fd 80 09 74 mov -0x7e8bf680(,%rdi,8),%rdx
> > ffffffff8102ca32: 81
> > ffffffff8102ca33: 48 c7 c0 d8 59 00 00 mov $0x59d8,%rax
> > ffffffff8102ca3a: 48 8b 04 10 mov (%rax,%rdx,1),%rax
> >
> > We can get a single instruction by using the optimized variants:
> >
> > return percpu_read(var);
> >
> > ffffffff8102ca3f: 65 48 8b 05 91 8f fd mov %gs:0x7efd8f91(%rip),%rax
> >
> > I also cleaned up the x86-specific APIs and made the x86 code use these
> > new generic percpu primitives.
> >
> > It looks quite hard to convince the compiler to generate the optimized
> > single-instruction sequence for us out of __get_cpu_var(var) - or can you
> > perhaps see a way to do it?
>
> Yeah, I thought about that too but couldn't think of a way to persuade
> the compiler because the compiler doesn't know how to access the
> address. I'll play with it a bit more but the clumsy percpu_*()
> accessors probably might be the only way. :-(
the new ops are a pretty nice and clean solution i think.
Firstly, accessing the current CPU is the only safe shortcut anyway (there
is where we can do %fs/%gs / rip-relative addressing modes), and the
generic per_cpu() APIs dont really provide that guarantee for us. We might
be able to hook into __get_cpu_var() but those both require to be an
lvalue and are also relatively rarely used.
So introducing the new, rather straightforward APIs and using them
wherever they matter for performance is good. Your patchset already shaved
off an instruction from ordinary per_cpu() accesses, so it's all moving
rather close to the most-optimal situation already.
Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists