linux-kernel - Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16

Message-ID: <20160421155257.GA20657@insomnia>
Date:	Thu, 21 Apr 2016 23:52:57 +0800
From:	Boqun Feng <boqun.feng@...il.com>
To:	Pan Xinhui <xinhui@...ux.vnet.ibm.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
	benh@...nel.crashing.org, paulus@...ba.org, mpe@...erman.id.au,
	paulmck@...ux.vnet.ibm.com, tglx@...utronix.de
Subject: Re: [PATCH V3] powerpc: Implement {cmp}xchg for u8 and u16

On Thu, Apr 21, 2016 at 11:35:07PM +0800, Pan Xinhui wrote:
> On 2016-04-20 22:24, Peter Zijlstra wrote:
> > On Wed, Apr 20, 2016 at 09:24:00PM +0800, Pan Xinhui wrote:
> > 
> >> +#define __XCHG_GEN(cmp, type, sfx, skip, v)				\
> >> +static __always_inline unsigned long					\
> >> +__cmpxchg_u32##sfx(v unsigned int *p, unsigned long old,		\
> >> +			 unsigned long new);				\
> >> +static __always_inline u32						\
> >> +__##cmp##xchg_##type##sfx(v void *ptr, u32 old, u32 new)		\
> >> +{									\
> >> +	int size = sizeof (type);					\
> >> +	int off = (unsigned long)ptr % sizeof(u32);			\
> >> +	volatile u32 *p = ptr - off;					\
> >> +	int bitoff = BITOFF_CAL(size, off);				\
> >> +	u32 bitmask = ((0x1 << size * BITS_PER_BYTE) - 1) << bitoff;	\
> >> +	u32 oldv, newv, tmp;						\
> >> +	u32 ret;							\
> >> +	oldv = READ_ONCE(*p);						\
> >> +	do {								\
> >> +		ret = (oldv & bitmask) >> bitoff;			\
> >> +		if (skip && ret != old)					\
> >> +			break;						\
> >> +		newv = (oldv & ~bitmask) | (new << bitoff);		\
> >> +		tmp = oldv;						\
> >> +		oldv = __cmpxchg_u32##sfx((v u32*)p, oldv, newv);	\
> >> +	} while (tmp != oldv);						\
> >> +	return ret;							\
> >> +}
> > 
> > So for an LL/SC based arch, using cmpxchg() like that is sub-optimal.
> > 
> > Why did you choose to write it entirely in C?
> > 
> Yes, you are right: the C code does more loads and stores.
> However, xchg_u8/u16 is currently used only by qspinlock, and I did not see any performance regression.
> So I just wrote it in C, for simplicity. :)
> 
> Of course, I have run xchg tests.
> We ran code like xchg((u8*)&v, j++); in several threads,
> and the result is:
> [  768.374264] use time[1550072]ns in xchg_u8_asm

How was xchg_u8_asm() implemented: using lbarx, or using a 32-bit ll/sc
loop with shifting and masking in it?

Regards,
Boqun

> [  768.377102] use time[2826802]ns in xchg_u8_c
> 
> I think this is because there is one more load in the C version.
> If possible, we can move this code into asm-generic/.
> 
> thanks
> xinhui
> 
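For readers following the thread, the shift-and-mask approach under discussion can be sketched in user space. This is only an illustration under stated assumptions, not the patch's code: it uses C11 <stdatomic.h> in place of the kernel's __cmpxchg_u32, the names cmpxchg_u8_emulated and xchg_u8_emulated are hypothetical, and the byte is selected by a "lane" index counted from the least-significant byte rather than derived from the pointer address (the patch does that via BITOFF_CAL, whose definition is not shown in the quoted text).

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical user-space sketch: emulate a byte-wide cmpxchg on top of a
 * 32-bit compare-and-swap. 'lane' is the byte index counted from the
 * least-significant byte of the word (0..3), which sidesteps endianness.
 * Returns the byte value observed in the word, as cmpxchg() would.
 */
static uint8_t cmpxchg_u8_emulated(_Atomic uint32_t *word, unsigned int lane,
				   uint8_t old, uint8_t new)
{
	unsigned int bitoff = lane * 8;
	uint32_t mask = (uint32_t)0xff << bitoff;
	uint32_t oldv = atomic_load(word);
	uint32_t newv;
	uint8_t cur;

	do {
		cur = (uint8_t)((oldv & mask) >> bitoff);
		if (cur != old)
			break;		/* compare failed: leave the word alone */
		newv = (oldv & ~mask) | ((uint32_t)new << bitoff);
		/* On failure, oldv is refreshed with the current word value. */
	} while (!atomic_compare_exchange_weak(word, &oldv, newv));

	return cur;
}

/* Same idea without the comparison: an unconditional byte exchange. */
static uint8_t xchg_u8_emulated(_Atomic uint32_t *word, unsigned int lane,
				uint8_t new)
{
	unsigned int bitoff = lane * 8;
	uint32_t mask = (uint32_t)0xff << bitoff;
	uint32_t oldv = atomic_load(word);
	uint32_t newv;

	do {
		newv = (oldv & ~mask) | ((uint32_t)new << bitoff);
	} while (!atomic_compare_exchange_weak(word, &oldv, newv));

	/* On success, oldv still holds the word that was replaced. */
	return (uint8_t)((oldv & mask) >> bitoff);
}
```

With a word holding 0x11223344, calling cmpxchg_u8_emulated(&w, 1, 0x33, 0xaa) swaps only the second-lowest byte. Each retry of the loop performs a full 32-bit compare-and-swap, and on an LL/SC machine that compare-and-swap is itself a load-reserve/store-conditional loop, which is the nesting the "sub-optimal" comment above refers to.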
