linux-kernel - Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff


Message-ID: <C2D7FE5348E1B147BCA15975FBA23075665B0A01@IN01WEMBXB.internal.synopsys.com>
Date:	Mon, 3 Aug 2015 13:01:56 +0000
From:	Vineet Gupta <Vineet.Gupta1@...opsys.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	lkml <linux-kernel@...r.kernel.org>,
	"arc-linux-dev@...opsys.com" <arc-linux-dev@...opsys.com>
Subject: Re: [PATCH 5/6] ARCv2: spinlock/rwlock/atomics: Delayed retry of
 failed SCOND with exponential backoff

On Monday 03 August 2015 05:11 PM, Peter Zijlstra wrote:
> On Mon, Aug 03, 2015 at 03:33:07PM +0530, Vineet Gupta wrote:
>> +#define SCOND_FAIL_RETRY_VAR_DEF						\
>> +	unsigned int delay = 1, tmp;						\
>> +
>> +#define SCOND_FAIL_RETRY_ASM							\
>> +	"	bz	4f			\n"				\
>> +	"   ; --- scond fail delay ---		\n"				\
>> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
>> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
>> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
>> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
>> +	"	b	1b			\n"	/* start over */	\
>> +	"4: ; --- success ---			\n"				\
>> +
>> +#define SCOND_FAIL_RETRY_VARS							\
>> +	  ,[delay] "+&r" (delay),[tmp] "=&r"	(tmp)				\
>> +
>> +#define ATOMIC_OP(op, c_op, asm_op)					\
>> +static inline void atomic_##op(int i, atomic_t *v)			\
>> +{									\
>> +	unsigned int val, delay = 1, tmp;				\
> Maybe use your SCOND_FAIL_RETRY_VAR_DEF ?

Right, not sure how I missed that!

>
>> +									\
>> +	__asm__ __volatile__(						\
>> +	"1:	llock   %[val], [%[ctr]]		\n"		\
>> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
>> +	"	scond   %[val], [%[ctr]]		\n"		\
>> +	"						\n"		\
>> +	SCOND_FAIL_RETRY_ASM						\
>> +									\
>> +	: [val]	"=&r"	(val) /* Early clobber to prevent reg reuse */	\
>> +	  SCOND_FAIL_RETRY_VARS						\
>> +	: [ctr]	"r"	(&v->counter), /* Not "m": llock only supports reg direct addr mode */	\
>> +	  [i]	"ir"	(i)						\
>> +	: "cc");							\
>> +}									\
>> +
>> +#define ATOMIC_OP_RETURN(op, c_op, asm_op)				\
>> +static inline int atomic_##op##_return(int i, atomic_t *v)		\
>> +{									\
>> +	unsigned int val, delay = 1, tmp;				\
> Idem.

OK!

>> +									\
>> +	/*								\
>> +	 * Explicit full memory barrier needed before/after as		\
>> +	 * LLOCK/SCOND themselves don't provide any such semantics	\
>> +	 */								\
>> +	smp_mb();							\
>> +									\
>> +	__asm__ __volatile__(						\
>> +	"1:	llock   %[val], [%[ctr]]		\n"		\
>> +	"	" #asm_op " %[val], %[val], %[i]	\n"		\
>> +	"	scond   %[val], [%[ctr]]		\n"		\
>> +	"						\n"		\
>> +	SCOND_FAIL_RETRY_ASM						\
>> +									\
>> +	: [val]	"=&r"	(val)						\
>> +	  SCOND_FAIL_RETRY_VARS						\
>> +	: [ctr]	"r"	(&v->counter),					\
>> +	  [i]	"ir"	(i)						\
>> +	: "cc");							\
>> +									\
>> +	smp_mb();							\
>> +									\
>> +	return val;							\
>> +}
>> +#define SCOND_FAIL_RETRY_VAR_DEF						\
>> +	unsigned int delay, tmp;						\
>> +
>> +#define SCOND_FAIL_RETRY_ASM							\
>> +	"   ; --- scond fail delay ---		\n"				\
>> +	"	mov	%[tmp], %[delay]	\n"	/* tmp = delay */	\
>> +	"2: 	brne.d	%[tmp], 0, 2b		\n"	/* while (tmp != 0) */	\
>> +	"	sub	%[tmp], %[tmp], 1	\n"	/* tmp-- */		\
>> +	"	asl	%[delay], %[delay], 1	\n"	/* delay *= 2 */	\
>> +	"	b	1b			\n"	/* start over */	\
>> +	"					\n"				\
>> +	"4: ; --- done ---			\n"				\
>> +
>> +#define SCOND_FAIL_RETRY_VARS							\
>> +	  ,[delay] "=&r" (delay), [tmp] "=&r"	(tmp)				\
> This is looking remarkably similar to the previous ones, why not a
> shared header?

I thought about it when duplicating the code. However, readability seemed better
with the code in the same file, rather than having to look it up in a different
header with no context at all.

Plus, there are some subtle differences between the two when you look closely.
Spinlocks need the delay reset to 1, which atomics don't, so the reset has to
live in the spinlock inline asm (hence "=&r" for delay there instead of "+&r").
Also, for atomics the success branch (bz 4f) is folded into the macro, while we
can't do that for the lock try routines, as that branch uses a delay slot.
Agreed that all of this is in the micro-optimization realm, but I suppose it is
worth it when you have a 10-stage pipeline.
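
For illustration only (not part of the patch): a portable C11 sketch of the
two flavours described above. It assumes compare-and-swap as a stand-in for
LLOCK/SCOND; backoff_spin(), backoff_atomic_add_return() and
backoff_spin_lock() are hypothetical names, and the memory ordering is a
simplification of the smp_mb() calls in the real code.

```c
#include <stdatomic.h>

#define UNLOCKED 0
#define LOCKED   1

/* Hypothetical helper: busy-wait n iterations (stands in for the brne.d/sub loop). */
static inline void backoff_spin(unsigned int n)
{
	for (volatile unsigned int t = n; t != 0; t--)
		;
}

/* Atomics flavour: delay is initialised once in C and doubles per failed store. */
static inline int backoff_atomic_add_return(int i, atomic_int *v)
{
	unsigned int delay = 1;
	int old = atomic_load_explicit(v, memory_order_relaxed);

	/* On failure, 'old' is refreshed with the current value of *v. */
	while (!atomic_compare_exchange_weak(v, &old, old + i)) {
		backoff_spin(delay);
		delay <<= 1;	/* delay *= 2, uncapped as in the quoted asm */
	}
	return old + i;
}

/* Spinlock flavour: delay is set to 1 on entry to the acquire sequence. */
static inline void backoff_spin_lock(atomic_int *slock)
{
	unsigned int delay = 1;	/* plays the role of "0: mov %[delay], 1" */

	for (;;) {
		int val = atomic_load_explicit(slock, memory_order_relaxed);

		if (val == LOCKED)
			continue;	/* spin while LOCKED, no backoff here */

		if (atomic_compare_exchange_weak_explicit(slock, &val, LOCKED,
							  memory_order_acquire,
							  memory_order_relaxed))
			return;		/* acquired */

		backoff_spin(delay);	/* store attempt failed: back off, retry */
		delay <<= 1;
	}
}
```

The only structural difference between the two is where the delay gets its
initial value, which is what forces the different asm constraint in the real
code.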


>> +static inline void arch_spin_lock(arch_spinlock_t *lock)
>> +{
>> +	unsigned int val;
>> +	SCOND_FAIL_RETRY_VAR_DEF;
>> +
>> +	smp_mb();
>> +
>> +	__asm__ __volatile__(
>> +	"0:	mov	%[delay], 1		\n"
>> +	"1:	llock	%[val], [%[slock]]	\n"
>> +	"	breq	%[val], %[LOCKED], 1b	\n"	/* spin while LOCKED */
>> +	"	scond	%[LOCKED], [%[slock]]	\n"	/* acquire */
>> +	"	bz	4f			\n"	/* done */
>> +	"					\n"
>> +	SCOND_FAIL_RETRY_ASM
> But,... in the case that macro is empty, the label 4 does not actually
> exist. I see no real reason for this to be different from the previous
> incarnation either.

As the code stands, the macro is never empty. I initially wrote a single version
of the routines with different macro definitions, but that got terribly
difficult to follow. So I resorted to duplicating all the routines, with macros
factoring out the common code within the duplicates to compensate :-)
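
For illustration only: a rough C11 sketch of the "one routine, different macro
definitions" layout that was abandoned. USE_SCOND_BACKOFF, RETRY_VAR_DEF,
RETRY_ON_FAIL and shared_atomic_add() are hypothetical names, and
compare-and-swap stands in for LLOCK/SCOND.

```c
#include <stdatomic.h>

/* Hypothetical switch standing in for whatever selects the backoff variant. */
#ifndef USE_SCOND_BACKOFF
#define USE_SCOND_BACKOFF 1
#endif

#if USE_SCOND_BACKOFF
#define RETRY_VAR_DEF	unsigned int delay = 1
#define RETRY_ON_FAIL()						\
	do {							\
		for (volatile unsigned int t = delay; t; t--)	\
			;					\
		delay <<= 1;	/* delay *= 2 */		\
	} while (0)
#else
/* Empty variant: the routine below must still compile and retry at once. */
#define RETRY_VAR_DEF
#define RETRY_ON_FAIL()	do { } while (0)
#endif

/* Single routine shared by both configurations. */
static inline void shared_atomic_add(int i, atomic_int *v)
{
	RETRY_VAR_DEF;
	int old = atomic_load_explicit(v, memory_order_relaxed);

	while (!atomic_compare_exchange_weak(v, &old, old + i))
		RETRY_ON_FAIL();
}
```

In C the empty variant is harmless. In inline asm it is not, since a label
defined only inside the macro (such as 4:) disappears along with the macro
body.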

For locks, I can again fold the bz into the macro, but then we can't use the
delay slot in the try versions!