Message-ID: <6427b071-40de-cad5-d1e5-e45f84ae837a@gentwo.org>
Date: Fri, 23 Aug 2024 10:56:12 -0700 (PDT)
From: "Christoph Lameter (Ampere)" <cl@...two.org>
To: Will Deacon <will@...nel.org>
cc: Catalin Marinas <catalin.marinas@....com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
Linus Torvalds <torvalds@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-arch@...r.kernel.org
Subject: Re: [PATCH v2] Avoid memory barrier in read_seqcount() through load acquire

On Fri, 23 Aug 2024, Will Deacon wrote:
> On Mon, Aug 19, 2024 at 11:30:15AM -0700, Christoph Lameter via B4 Relay wrote:
> > +static __always_inline unsigned \
> > +__seqprop_##lockname##_sequence_acquire(const seqcount_##lockname##_t *s) \
> > +{ \
> > +	unsigned seq = smp_load_acquire(&s->seqcount.sequence); \
> > +	\
> > +	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) \
> > +		return seq; \
> > +	\
> > +	if (preemptible && unlikely(seq & 1)) { \
> > +		__SEQ_LOCK(lockbase##_lock(s->lock)); \
> > +		__SEQ_LOCK(lockbase##_unlock(s->lock)); \
> > +		\
> > +		/* \
> > +		 * Re-read the sequence counter since the (possibly \
> > +		 * preempted) writer made progress. \
> > +		 */ \
> > +		seq = smp_load_acquire(&s->seqcount.sequence); \
>
> We could probably do even better with LDAPR here, as that should be
> sufficient for this. It's a can of worms though, as it's not implemented
> on all CPUs and relaxing smp_load_acquire() might introduce subtle
> breakage in places where it's used to build other types of lock. Maybe
> you can hack something to see if there's any performance left behind
> without it?
I added the following patch. The kernel booted fine, and there is no change
in the cycles of read_seq():
LDAPR
---------------------------
Test         Single   2 CPU   4 CPU   8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :      13      98     385     764    1551    3043    6259   11922
read seq  :       8       8       8       8       8       8       9      10
rw seq    :       8     101     247     300     467     742    1384    2101

LDAR
---------------------------
Test         Single   2 CPU   4 CPU   8 CPU  16 CPU  32 CPU  64 CPU     ALL
write seq :      13      90     343     785    1533    3032    6315   11073
read seq  :       8       8       8       8       8       8       9      11
rw seq    :       8      79     227     313     423     755    1313    2220
Index: linux/arch/arm64/include/asm/barrier.h
===================================================================
--- linux.orig/arch/arm64/include/asm/barrier.h
+++ linux/arch/arm64/include/asm/barrier.h
@@ -167,22 +167,22 @@ do { \
 	kasan_check_read(__p, sizeof(*p)); \
 	switch (sizeof(*p)) { \
 	case 1: \
-		asm volatile ("ldarb %w0, %1" \
+		asm volatile (".arch_extension rcpc\nldaprb %w0, %1" \
 			: "=r" (*(__u8 *)__u.__c) \
 			: "Q" (*__p) : "memory"); \
 		break; \
 	case 2: \
-		asm volatile ("ldarh %w0, %1" \
+		asm volatile (".arch_extension rcpc\nldaprh %w0, %1" \
 			: "=r" (*(__u16 *)__u.__c) \
 			: "Q" (*__p) : "memory"); \
 		break; \
 	case 4: \
-		asm volatile ("ldar %w0, %1" \
+		asm volatile (".arch_extension rcpc\nldapr %w0, %1" \
 			: "=r" (*(__u32 *)__u.__c) \
 			: "Q" (*__p) : "memory"); \
 		break; \
 	case 8: \
-		asm volatile ("ldar %0, %1" \
+		asm volatile (".arch_extension rcpc\nldapr %0, %1" \
 			: "=r" (*(__u64 *)__u.__c) \
 			: "Q" (*__p) : "memory"); \
 		break; \
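
On Will's point that LDAPR is not implemented on all CPUs: the hack above
assumes RCPC unconditionally, so anything beyond a benchmark would have to be
gated on the capability, e.g. with an alternative keyed off the
ARM64_HAS_LDAPR cpucap along the lines of what the arm64 asm/rwonce.h already
does for READ_ONCE(). Purely as an illustrative userspace sketch (not part of
the patch), the presence of the instruction can be probed through the ELF
hwcaps:

#include <stdbool.h>
#include <sys/auxv.h>
#include <asm/hwcap.h>		/* HWCAP_LRCPC */

/* Userspace-only probe: true if this CPU advertises the RCPC (LDAPR)
 * extension. The kernel itself would use the cpufeature/alternatives
 * machinery instead. */
static bool cpu_has_rcpc(void)
{
	return getauxval(AT_HWCAP) & HWCAP_LRCPC;
}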