Message-ID: <20101214163509.GB20667@Krystal>
Date: Tue, 14 Dec 2010 11:35:09 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Christoph Lameter <cl@...ux.com>
Cc: Tejun Heo <tj@...nel.org>, akpm@...ux-foundation.org,
Pekka Enberg <penberg@...helsinki.fi>,
linux-kernel@...r.kernel.org,
Eric Dumazet <eric.dumazet@...il.com>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [cpuops cmpxchg V2 5/5] cpuops: Use cmpxchg for xchg to avoid
lock semantics
* Christoph Lameter (cl@...ux.com) wrote:
> Use cmpxchg instead of xchg to realize this_cpu_xchg.
>
> xchg incurs LOCK overhead because LOCK is always implied for xchg with a
> memory operand; cmpxchg without an explicit LOCK prefix does not.
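To make the mechanism concrete, here is a plain-C model of the loop the patch uses (mov; cmpxchg; jnz 1b). It is only a sketch under stated assumptions: `model_cmpxchg`, `model_xchg`, `interrupt_hook` and `irq_once` are hypothetical names, not kernel API. The hook stands in for an interrupt on the local CPU that updates the per-cpu variable between the load and the cmpxchg. A real cmpxchg is a single instruction, so a local interrupt cannot split it. The jnz retry is what keeps such an update from being lost. No LOCK is needed because only the local CPU (including its interrupt handlers) is assumed to write the variable, which is the premise of the this_cpu operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of an unlocked "cmpxchg nval, mem" with the
 * accumulator in *acc: if *mem == *acc, store nval and set ZF;
 * otherwise load *mem into *acc and clear ZF.
 */
static bool model_cmpxchg(uint32_t *mem, uint32_t *acc, uint32_t nval)
{
	if (*mem == *acc) {
		*mem = nval;
		return true;		/* ZF = 1 */
	}
	*acc = *mem;
	return false;			/* ZF = 0 */
}

/* Hypothetical stand-in for a local interrupt hitting between mov and cmpxchg. */
static void (*interrupt_hook)(uint32_t *mem);
static int irq_pending;			/* 1 = one simulated interrupt still to fire */

static void irq_once(uint32_t *mem)
{
	if (irq_pending) {
		irq_pending = 0;
		(*mem)++;		/* handler bumps the same per-cpu counter */
	}
}

/* xchg built like the patch: 1: mov var,%eax; cmpxchg new,var; jnz 1b */
static uint32_t model_xchg(uint32_t *mem, uint32_t nval)
{
	uint32_t acc;

	do {
		acc = *mem;		/* 1: mov var, %eax */
		if (interrupt_hook)
			interrupt_hook(mem);
	} while (!model_cmpxchg(mem, &acc, nval));	/* cmpxchg; jnz 1b */

	return acc;			/* the value actually replaced */
}
```

With `irq_pending = 1`, `interrupt_hook = irq_once` and a starting value of 5, the first cmpxchg fails because the "interrupt" changed 5 to 6. The loop then retries, stores 42, and returns 6. The interrupt's update is therefore returned to the caller instead of being silently overwritten.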
>
> Baselines:
>
> xchg() = 18 cycles (no segment prefix, LOCK semantics)
> __this_cpu_xchg = 1 cycle
>
> (simulated using this_cpu_read/write, i.e. two segment prefixes. It
> looks like the CPU can use loop optimization to get rid of most of the
> overhead.)
>
> Cycles before:
>
> this_cpu_xchg = 37 cycles (segment prefix and LOCK (implied by xchg))
>
> After:
>
> this_cpu_xchg = 11 cycles (using cmpxchg without lock semantics)
Cool! Thanks for benchmarking these; it's really worth it.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
>
> Signed-off-by: Christoph Lameter <cl@...ux.com>
>
> ---
> arch/x86/include/asm/percpu.h | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> Index: linux-2.6/arch/x86/include/asm/percpu.h
> ===================================================================
> --- linux-2.6.orig/arch/x86/include/asm/percpu.h 2010-12-10 12:46:31.000000000 -0600
> +++ linux-2.6/arch/x86/include/asm/percpu.h 2010-12-10 13:25:21.000000000 -0600
> @@ -213,8 +213,9 @@ do { \
> })
>
> /*
> - * Beware: xchg on x86 has an implied lock prefix. There will be the cost of
> - * full lock semantics even though they are not needed.
> + * xchg is implemented using cmpxchg without a lock prefix. xchg is
> + * expensive due to the implied lock prefix. The processor cannot prefetch
> + * cachelines if xchg is used.
> */
> #define percpu_xchg_op(var, nval) \
> ({ \
> @@ -222,25 +223,33 @@ do { \
> typeof(var) __new = (nval); \
> switch (sizeof(var)) { \
> case 1: \
> - asm("xchgb %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%al" \
> + "\n\tcmpxchgb %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "q" (__new) \
> : "memory"); \
> break; \
> case 2: \
> - asm("xchgw %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%ax" \
> + "\n\tcmpxchgw %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
> break; \
> case 4: \
> - asm("xchgl %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%eax" \
> + "\n\tcmpxchgl %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
> break; \
> case 8: \
> - asm("xchgq %2, "__percpu_arg(1) \
> + asm("\n1:mov "__percpu_arg(1)",%%rax" \
> + "\n\tcmpxchgq %2, "__percpu_arg(1) \
> + "\n\tjnz 1b" \
> : "=a" (__ret), "+m" (var) \
> : "r" (__new) \
> : "memory"); \
>
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com