Message-ID: <6427b071-40de-cad5-d1e5-e45f84ae837a@gentwo.org>
Date: Fri, 23 Aug 2024 10:56:12 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Will Deacon <will@...nel.org>
cc: Catalin Marinas <catalin.marinas@....com>, 
    Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>, 
    Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
    Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org, 
    linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
    linux-arch@...r.kernel.org
Subject: Re: [PATCH v2] Avoid memory barrier in read_seqcount() through load
 acquire

On Fri, 23 Aug 2024, Will Deacon wrote:

> On Mon, Aug 19, 2024 at 11:30:15AM -0700, Christoph Lameter via B4 Relay wrote:
> > +static __always_inline unsigned						\
> > +__seqprop_##lockname##_sequence_acquire(const seqcount_##lockname##_t *s) \
> > +{									\
> > +	unsigned seq = smp_load_acquire(&s->seqcount.sequence);		\
> > +									\
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT))				\
> > +		return seq;						\
> > +									\
> > +	if (preemptible && unlikely(seq & 1)) {				\
> > +		__SEQ_LOCK(lockbase##_lock(s->lock));			\
> > +		__SEQ_LOCK(lockbase##_unlock(s->lock));			\
> > +									\
> > +		/*							\
> > +		 * Re-read the sequence counter since the (possibly	\
> > +		 * preempted) writer made progress.			\
> > +		 */							\
> > +		seq = smp_load_acquire(&s->seqcount.sequence);		\
>
> We could probably do even better with LDAPR here, as that should be
> sufficient for this. It's a can of worms though, as it's not implemented
> on all CPUs and relaxing smp_load_acquire() might introduce subtle
> breakage in places where it's used to build other types of lock. Maybe
> you can hack something to see if there's any performance left behind
> without it?
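
Just to illustrate an alternative that avoids touching smp_load_acquire()
itself: a dedicated one-off RCpc load helper could be used only in the
seqcount read path. Hypothetical sketch (the name is made up; the hack I
tested below changes smp_load_acquire() directly instead):

static __always_inline unsigned int seq_load_acquire_rcpc(const unsigned int *p)
{
	unsigned int val;

	/* RCpc load-acquire; assumes the CPU implements FEAT_LRCPC. */
	asm volatile(".arch_extension rcpc\n"
		     "ldapr %w0, %1"
		     : "=r" (val)
		     : "Q" (*p) : "memory");
	return val;
}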

I added the following patch. The kernel booted fine, and there is no change
in the read_seq() cycle counts:

LDAPR
---------------------------
Test                                     Single  2 CPU   4 CPU   8 CPU   16 CPU  32 CPU  64 CPU  ALL
write seq                        :       13      98      385     764     1551    3043    6259    11922
read seq                         :       8       8       8       8       8       8       9       10
rw seq                           :       8       101     247     300     467     742     1384    2101

LDAR
---------------------------
Test                                     Single  2 CPU   4 CPU   8 CPU   16 CPU  32 CPU  64 CPU  ALL
write seq                        :       13      90      343     785     1533    3032    6315    11073
read seq                         :       8       8       8       8       8       8       9       11
rw seq                           :       8       79      227     313     423     755     1313    2220
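
For reference, a rough sketch of the kind of read-side loop behind the
"read seq" row above. This is not the actual test harness; the iteration
count, timer source (get_cycles()) and the standalone seqcount are made
up for illustration:

/* Hypothetical read-side timing loop, built as a trivial arm64 module. */
#include <linux/module.h>
#include <linux/seqlock.h>
#include <linux/timex.h>

static seqcount_t test_seq = SEQCNT_ZERO(test_seq);

static int __init read_seq_bench_init(void)
{
	const unsigned long iterations = 1000000;
	unsigned long i;
	unsigned int seq;
	u64 start, end;

	start = get_cycles();
	for (i = 0; i < iterations; i++) {
		do {
			seq = read_seqcount_begin(&test_seq);
			/* Read-side critical section would go here. */
		} while (read_seqcount_retry(&test_seq, seq));
	}
	end = get_cycles();

	/* On arm64 get_cycles() is the generic timer, not CPU cycles. */
	pr_info("read seq: %llu ticks/iteration\n",
		(unsigned long long)(end - start) / iterations);
	return 0;
}
module_init(read_seq_bench_init);

MODULE_LICENSE("GPL");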

Index: linux/arch/arm64/include/asm/barrier.h
===================================================================
--- linux.orig/arch/arm64/include/asm/barrier.h
+++ linux/arch/arm64/include/asm/barrier.h
@@ -167,22 +167,22 @@ do {									\
 	kasan_check_read(__p, sizeof(*p));				\
 	switch (sizeof(*p)) {						\
 	case 1:								\
-		asm volatile ("ldarb %w0, %1"				\
+		asm volatile (".arch_extension rcpc\nldaprb %w0, %1"	\
 			: "=r" (*(__u8 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	case 2:								\
-		asm volatile ("ldarh %w0, %1"				\
+		asm volatile (".arch_extension rcpc\nldaprh %w0, %1"	\
 			: "=r" (*(__u16 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	case 4:								\
-		asm volatile ("ldar %w0, %1"				\
+		asm volatile (".arch_extension rcpc\nldapr %w0, %1"	\
 			: "=r" (*(__u32 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
 	case 8:								\
-		asm volatile ("ldar %0, %1"				\
+		asm volatile (".arch_extension rcpc\nldapr %0, %1"	\
 			: "=r" (*(__u64 *)__u.__c)			\
 			: "Q" (*__p) : "memory");			\
 		break;							\
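
A less intrusive variant would only emit LDAPR on CPUs that actually have
FEAT_LRCPC, by runtime-patching the instruction the way
arch/arm64/include/asm/rwonce.h already does for __READ_ONCE() (with
CONFIG_LTO) via the ARM64_HAS_LDAPR capability. Untested sketch for the
32-bit case only; the other sizes would follow the same pattern, and
barrier.h would need asm/alternative-macros.h pulled in for ALTERNATIVE():

	case 4:								\
		asm volatile (ALTERNATIVE("ldar %w0, %1",		\
					  ".arch_extension rcpc\n"	\
					  "ldapr %w0, %1",		\
					  ARM64_HAS_LDAPR)		\
			: "=r" (*(__u32 *)__u.__c)			\
			: "Q" (*__p) : "memory");			\
		break;							\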
