Date:	Fri, 02 Mar 2012 08:54:00 +0000
From:	"Jan Beulich" <JBeulich@...e.com>
To:	"Alex Shi" <alex.shi@...el.com>
Cc:	<jeremy@...p.org>,
	"asit.k.mallick@...el.com" <asit.k.mallick@...el.com>,
	"x86@...nel.org" <x86@...nel.org>, <tglx@...utronix.de>,
	"Andi Kleen" <ak@...ux.intel.com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"hpa@...or.com" <hpa@...or.com>
Subject: Re: [RFC patch] cmpxchg_double: remove local variables to get
 better performance

>>> On 02.03.12 at 09:31, Alex Shi <alex.shi@...el.com> wrote:
> There are some local variables in the cmpxchg_double macro; it seems they are
> used to force-cast the input variables to the type of '*p1'.
> Maybe there is some reason I don't know of, but I see two problems
> here:
> 
> 1. A user may misuse the macro, e.g. pass an o1 of type 'long' together
> with a p1 of type 'int*' or 'char*'.

No - see the BUILD_BUG_ON()s right after the lines you suggest to
remove.
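
As a rough userspace illustration of what those size checks buy you, here is
a minimal sketch. BUILD_BUG_ON_SKETCH and check_word_sized are hypothetical
names, and C11 _Static_assert stands in for the kernel's BUILD_BUG_ON(); the
statement expression is a GNU C extension, as in the original macro.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's BUILD_BUG_ON(): the build
 * fails when cond is true. */
#define BUILD_BUG_ON_SKETCH(cond) _Static_assert(!(cond), #cond)

/* Hypothetical helper: refuses to compile unless *p is exactly as
 * wide as a long, then evaluates to 1. */
#define check_word_sized(p)					\
({								\
	BUILD_BUG_ON_SKETCH(sizeof(*(p)) != sizeof(long));	\
	1;							\
})

static int check_word_sized_demo(void)
{
	long l = 0;
	unsigned long ul = 0;

	/* Both pointees are long-sized, so both checks pass. */
	return check_word_sized(&l) + check_word_sized(&ul);
}
```

Passing a 'char *' to check_word_sized() would stop the build at the
_Static_assert, which is the kind of misuse the existing checks are meant
to catch.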

Further, it seems to be intentional to allow _compatible_ types for
o1 and o2 - you could pass in a literal number without L suffix here,
which I don't think you can anymore with the intermediate variable
removed.
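
A minimal sketch of that point, assuming GNU C. cmpxchg_sketch is a
hypothetical, non-atomic stand-in with no inline asm; only the typed
temporaries mirror the real macro. They convert whatever the caller passes
to the pointee type before it is used, which matters in the real macro
because, as far as I understand, an asm operand takes the type of the
expression handed to it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, non-atomic stand-in: compare *p with o and store n on
 * a match. The temporaries give o and n the type of *p. */
#define cmpxchg_sketch(p, o, n)					\
({								\
	__typeof__(*(p)) __old = (o), __new = (n);		\
	bool __ret = (*(p) == __old);				\
	if (__ret)						\
		*(p) = __new;					\
	__ret;							\
})

static void cmpxchg_sketch_demo(void)
{
	long v = -1;

	/* Plain int literals are accepted and widened to long. */
	assert(cmpxchg_sketch(&v, -1, 5));
	assert(v == 5);

	/* A mismatch leaves the value untouched. */
	assert(!cmpxchg_sketch(&v, 7, 9));
	assert(v == 5);
}
```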

> If we remove the forced cast here, gcc will check for such misuse at
> compile time, and the user will get an error report for such
> issues.
> 
> 2. The local variables increase the data section and bring extra memory bus

These aren't static, so the data section can't possibly increase.

> accesses, which hurts performance in this critical macro.

With optimization enabled, the compiler should eliminate all unnecessary
intermediate variables.
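
A small sketch of the two shapes being compared (hypothetical function
names, not kernel code). The temporaries in the first variant have automatic
storage, so they live in registers or on the stack rather than in .data, and
with optimization enabled I would expect a compiler to emit the same code
for both. Comparing the output of "gcc -O2 -S" is the way to confirm that
for a given compiler.

```c
#include <assert.h>

/* Variant with typed temporaries, mirroring __old1/__new1. */
static inline long sum_with_temps(const long *p, int o, int n)
{
	long old = o, new_val = n;	/* automatic storage, not .data */

	return *p + old + new_val;
}

/* Variant using the arguments directly. */
static inline long sum_without_temps(const long *p, int o, int n)
{
	return *p + o + n;
}
```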

> I did a little experiment on my nhm i7 desktop, running the macro a
> fixed number of times; here is the data:
> 			 using local vars         no local variable
> with lock prefix,         267700578ns             232079696ns
> without lock prefix,      34715666ns              34687566ns
> 
> So we may need to rethink the local variable usage here.
> 
> Signed-off-by: Alex Shi <alex.shi@...el.com>

Sorry, but if this counts, this is a nack from me.

Jan

> diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
> index b3b7332..8bf9127 100644
> --- a/arch/x86/include/asm/cmpxchg.h
> +++ b/arch/x86/include/asm/cmpxchg.h
> @@ -210,17 +210,15 @@ extern void __add_wrong_size(void)
>  #define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2)			\
>  ({									\
>  	bool __ret;							\
> -	__typeof__(*(p1)) __old1 = (o1), __new1 = (n1);			\
> -	__typeof__(*(p2)) __old2 = (o2), __new2 = (n2);			\
>  	BUILD_BUG_ON(sizeof(*(p1)) != sizeof(long));			\
>  	BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long));			\
>  	VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long)));		\
>  	VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2));	\
>  	asm volatile(pfx "cmpxchg%c4b %2; sete %0"			\
> -		     : "=a" (__ret), "+d" (__old2),			\
> +		     : "=a" (__ret), "+d" (o2),				\
>  		       "+m" (*(p1)), "+m" (*(p2))			\
> -		     : "i" (2 * sizeof(long)), "a" (__old1),		\
> -		       "b" (__new1), "c" (__new2));			\
> +		     : "i" (2 * sizeof(long)), "a" (o1),		\
> +		       "b" (n1), "c" (n2));				\
>  	__ret;								\
>  })
>  


