Message-ID: <CAGQJe6oA8oiZmg0HGW7KrjdOJ7nXNK9hQFm4jdQ6n0WTG6i-vA@mail.gmail.com>
Date: Sat, 1 Feb 2025 13:04:25 +0100
From: Aleksandar Rikalo <arikalo@...il.com>
To: Charlie Jenkins <charlie@...osinc.com>
Cc: linux-riscv@...ts.infradead.org, Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Boqun Feng <boqun.feng@...il.com>, Mark Rutland <mark.rutland@....com>,
Yury Norov <yury.norov@...il.com>, Rasmus Villemoes <linux@...musvillemoes.dk>,
Andrea Parri <parri.andrea@...il.com>, Leonardo Bras <leobras@...hat.com>, Guo Ren <guoren@...nel.org>,
Samuel Holland <samuel.holland@...ive.com>, Eric Chan <ericchancf@...gle.com>,
linux-kernel@...r.kernel.org,
Djordje Todorovic <djordje.todorovic@...cgroup.com>
Subject: Re: [PATCH v2] riscv: Use Zalrsc extension to implement atomic functions

On Fri, Jan 10, 2025 at 4:23 AM Charlie Jenkins <charlie@...osinc.com> wrote:
> > From: Chao-ying Fu <cfu@...s.com>
> >
> > Use only LR/SC instructions to implement atomic functions.
>
> In the previous patch you mention that this is to support MIPS P8700. Can
> you expand on why this change is required? The datasheet at [1] says:
>
> "The P8700 core is configured to support the RV64GCZba_Zbb (G = IMAFD)
> Standard ISA. It includes the RV64I base ISA, Multiply (M), Atomic (A),
> Single-Precision Floating Point (F), Double (D), Compressed (C) RISC-V
> extensions, as well as the bit-manipulation extensions
> (Zba) and (Zbb)"
>
> The "A" extension is a part of "G" which is mostly assumed to exist in
> the kernel. Additionally, having this be a compilation flag will cause
> traps on generic kernels. We generally try to push everything we can
> into runtime feature detection since there are so many possible variants
> of riscv.
>
> Not being able to perform a feature like this is normally better
> expressed as an errata. Then generic kernels will be able to
> include this, and anybody who doesn't want to have the extra nops
> introduced can disable the errata. A similar approach to what I pointed
> out last time should work here too (but with more places to replace)
> [2].
>
> [1] https://mips.com/wp-content/uploads/2024/11/P8700_Data_Sheet.pdf
> [2] https://lore.kernel.org/lkml/Z2-UNfwcAQYZqVBU@ghost/T/

So far we haven't found a way to do this with the errata mechanism:
there is no way to patch a single instruction with multiple ones, and
we would also need an extra (temporary) register.
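
For example, a single amoadd.w would have to become a multi-instruction
LR/SC loop that needs a scratch register, roughly like this (just a
sketch to illustrate the point, not code from the patch; the helper
names are made up):

/* Illustrative helpers only, not from the patch. */
static inline void amo_add(int *p, int inc)
{
	/* With the full A (Zaamo) extension: one instruction, no scratch. */
	__asm__ __volatile__ (
		"	amoadd.w zero, %1, %0\n"
		: "+A" (*p)
		: "r" (inc)
		: "memory");
}

static inline void lrsc_add(int *p, int inc)
{
	int tmp;

	/* With Zalrsc only: a retry loop plus a temporary for the SC result. */
	__asm__ __volatile__ (
		"1:	lr.w %1, %0\n"
		"	add %1, %1, %2\n"
		"	sc.w %1, %1, %0\n"
		"	bnez %1, 1b\n"
		: "+A" (*p), "=&r" (tmp)
		: "r" (inc)
		: "memory");
}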

A CPU can implement the Zalrsc extension [1] instead of the complete A
extension, which is what the P8700 does.
>From "Zaamo and Zalrsc Extensions" spec:
"The fetch-and-op style atomic primitives provided by the A extension
support better scaling for highly parallel systems. Simpler
implementations that do not have such scaling requirements may
implement the Zalrsc subset of the A extension to support atomic
primitives."

Therefore, we believe this would be a useful option for a wider range
of implementations.

[1] https://github.com/riscv/riscv-zaamo-zalrsc/releases/download/v1.0-rc2/riscv-zaamo-zalrsc.pdf

> >
> > Add config RISCV_AMO_USE_ZALRSC.
> >
> > Signed-off-by: Chao-ying Fu <cfu@...s.com>
> > Signed-off-by: Aleksandar Rikalo <arikalo@...il.com>
> > ---
> > arch/riscv/Kconfig | 11 +++++++
> > arch/riscv/include/asm/atomic.h | 52 +++++++++++++++++++++++++++++++-
> > arch/riscv/include/asm/bitops.h | 45 +++++++++++++++++++++++++++
> > arch/riscv/include/asm/cmpxchg.h | 16 ++++++++++
> > arch/riscv/include/asm/futex.h | 46 ++++++++++++++++++++++++++++
> > 5 files changed, 169 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index cc63aef41e94..9fb020b49408 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -715,6 +715,17 @@ config RISCV_ISA_ZACAS
> >
> > If you don't know what to do here, say Y.
> >
> > +config RISCV_AMO_USE_ZALRSC
> > + bool "Use Zalrsc extension to implement atomic functions"
> > + help
> > + Kernel uses only LR/SC instructions to implement atomic functions.
> > +
> > + It makes sense to enable this option if your platform only
> > + implements the Zalrsc extension (a subset of the A extension),
> > + and not the complete A extension.
> > +
> > + If you don't know what to do here, say N.
> > +
> > config TOOLCHAIN_HAS_ZBB
> > bool
> > default y
> > diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
> > index 5b96c2f61adb..88f62e33a545 100644
> > --- a/arch/riscv/include/asm/atomic.h
> > +++ b/arch/riscv/include/asm/atomic.h
> > @@ -50,6 +50,7 @@ static __always_inline void arch_atomic64_set(atomic64_t *v, s64 i)
> > * have the AQ or RL bits set. These don't return anything, so there's only
> > * one version to worry about.
> > */
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > #define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix) \
> > static __always_inline \
> > void arch_atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > @@ -59,7 +60,23 @@ void arch_atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > : "+A" (v->counter) \
> > : "r" (I) \
> > : "memory"); \
> > -} \
> > +}
> > +#else
> > +#define ATOMIC_OP(op, asm_op, I, asm_type, c_type, prefix) \
> > +static __always_inline \
> > +void arch_atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) \
> > +{ \
> > + register c_type ret, temp; \
> > + __asm__ __volatile__ ( \
> > + "1: lr." #asm_type " %1, %0\n" \
> > + " " #asm_op " %2, %1, %3\n" \
> > + " sc." #asm_type " %2, %2, %0\n" \
> > + " bnez %2, 1b\n" \
> > + : "+A" (v->counter), "=&r" (ret), "=&r" (temp) \
> > + : "r" (I) \
> > + : "memory"); \
> > +}
> > +#endif
> >
> > #ifdef CONFIG_GENERIC_ATOMIC64
> > #define ATOMIC_OPS(op, asm_op, I) \
> > @@ -84,6 +101,7 @@ ATOMIC_OPS(xor, xor, i)
> > * There's two flavors of these: the arithmatic ops have both fetch and return
> > * versions, while the logical ops only have fetch versions.
> > */
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > #define ATOMIC_FETCH_OP(op, asm_op, I, asm_type, c_type, prefix) \
> > static __always_inline \
> > c_type arch_atomic##prefix##_fetch_##op##_relaxed(c_type i, \
> > @@ -108,6 +126,38 @@ c_type arch_atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
> > : "memory"); \
> > return ret; \
> > }
> > +#else
> > +#define ATOMIC_FETCH_OP(op, asm_op, I, asm_type, c_type, prefix) \
> > +static __always_inline \
> > +c_type arch_atomic##prefix##_fetch_##op##_relaxed(c_type i, \
> > + atomic##prefix##_t *v) \
> > +{ \
> > + register c_type ret, temp; \
> > + __asm__ __volatile__ ( \
> > + "1: lr." #asm_type " %1, %0\n" \
> > + " " #asm_op " %2, %1, %3\n" \
> > + " sc." #asm_type " %2, %2, %0\n" \
> > + " bnez %2, 1b\n" \
> > + : "+A" (v->counter), "=&r" (ret), "=&r" (temp) \
> > + : "r" (I) \
> > + : "memory"); \
> > + return ret; \
> > +} \
> > +static __always_inline \
> > +c_type arch_atomic##prefix##_fetch_##op(c_type i, atomic##prefix##_t *v) \
> > +{ \
> > + register c_type ret, temp; \
> > + __asm__ __volatile__ ( \
> > + "1: lr." #asm_type ".aqrl %1, %0\n" \
> > + " " #asm_op " %2, %1, %3\n" \
> > + " sc." #asm_type ".aqrl %2, %2, %0\n" \
> > + " bnez %2, 1b\n" \
> > + : "+A" (v->counter), "=&r" (ret), "=&r" (temp) \
> > + : "r" (I) \
> > + : "memory"); \
> > + return ret; \
> > +}
> > +#endif
> >
> > #define ATOMIC_OP_RETURN(op, asm_op, c_op, I, asm_type, c_type, prefix) \
> > static __always_inline \
> > diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h
> > index fae152ea0508..0051de1cf471 100644
> > --- a/arch/riscv/include/asm/bitops.h
> > +++ b/arch/riscv/include/asm/bitops.h
> > @@ -187,12 +187,17 @@ static __always_inline int variable_fls(unsigned int x)
> >
> > #if (BITS_PER_LONG == 64)
> > #define __AMO(op) "amo" #op ".d"
> > +#define __LR "lr.d"
> > +#define __SC "sc.d"
> > #elif (BITS_PER_LONG == 32)
> > #define __AMO(op) "amo" #op ".w"
> > +#define __LR "lr.w"
> > +#define __SC "sc.w"
> > #else
> > #error "Unexpected BITS_PER_LONG"
> > #endif
> >
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > #define __test_and_op_bit_ord(op, mod, nr, addr, ord) \
> > ({ \
> > unsigned long __res, __mask; \
> > @@ -211,6 +216,33 @@ static __always_inline int variable_fls(unsigned int x)
> > : "+A" (addr[BIT_WORD(nr)]) \
> > : "r" (mod(BIT_MASK(nr))) \
> > : "memory");
> > +#else
> > +#define __test_and_op_bit_ord(op, mod, nr, addr, ord) \
> > +({ \
> > + unsigned long __res, __mask, __temp; \
> > + __mask = BIT_MASK(nr); \
> > + __asm__ __volatile__ ( \
> > + "1: " __LR #ord " %0, %1\n" \
> > + #op " %2, %0, %3\n" \
> > + __SC #ord " %2, %2, %1\n" \
> > + "bnez %2, 1b\n" \
> > + : "=&r" (__res), "+A" (addr[BIT_WORD(nr)]), "=&r" (__temp) \
> > + : "r" (mod(__mask)) \
> > + : "memory"); \
> > + ((__res & __mask) != 0); \
> > +})
> > +
> > +#define __op_bit_ord(op, mod, nr, addr, ord) \
> > + unsigned long __res, __temp; \
> > + __asm__ __volatile__ ( \
> > + "1: " __LR #ord " %0, %1\n" \
> > + #op " %2, %0, %3\n" \
> > + __SC #ord " %2, %2, %1\n" \
> > + "bnez %2, 1b\n" \
> > + : "=&r" (__res), "+A" (addr[BIT_WORD(nr)]), "=&r" (__temp) \
> > + : "r" (mod(BIT_MASK(nr))) \
> > + : "memory")
> > +#endif
> >
> > #define __test_and_op_bit(op, mod, nr, addr) \
> > __test_and_op_bit_ord(op, mod, nr, addr, .aqrl)
> > @@ -354,12 +386,25 @@ static inline void arch___clear_bit_unlock(
> > static inline bool arch_xor_unlock_is_negative_byte(unsigned long mask,
> > volatile unsigned long *addr)
> > {
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > unsigned long res;
> > __asm__ __volatile__ (
> > __AMO(xor) ".rl %0, %2, %1"
> > : "=r" (res), "+A" (*addr)
> > : "r" (__NOP(mask))
> > : "memory");
> > +#else
> > + unsigned long res, temp;
> > +
> > + __asm__ __volatile__ (
> > + "1: " __LR ".rl %0, %1\n"
> > + "xor %2, %0, %3\n"
> > + __SC ".rl %2, %2, %1\n"
> > + "bnez %2, 1b\n"
> > + : "=&r" (res), "+A" (*addr), "=&r" (temp)
> > + : "r" (__NOP(mask))
> > + : "memory");
> > +#endif
> > return (res & BIT(7)) != 0;
> > }
> >
> > diff --git a/arch/riscv/include/asm/cmpxchg.h b/arch/riscv/include/asm/cmpxchg.h
> > index 4cadc56220fe..aba60f427060 100644
> > --- a/arch/riscv/include/asm/cmpxchg.h
> > +++ b/arch/riscv/include/asm/cmpxchg.h
> > @@ -51,6 +51,7 @@
> > } \
> > })
> >
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > #define __arch_xchg(sfx, prepend, append, r, p, n) \
> > ({ \
> > __asm__ __volatile__ ( \
> > @@ -61,6 +62,21 @@
> > : "r" (n) \
> > : "memory"); \
> > })
> > +#else
> > +#define __arch_xchg(sfx, prepend, append, r, p, n) \
> > +({ \
> > + __typeof__(*(__ptr)) temp; \
> > + __asm__ __volatile__ ( \
> > + prepend \
> > + "1: lr" sfx " %0, %1\n" \
> > + " sc" sfx " %2, %3, %1\n" \
> > + " bnez %2, 1b\n" \
> > + append \
> > + : "=&r" (r), "+A" (*(p)), "=&r" (temp) \
> > + : "r" (n) \
> > + : "memory"); \
> > +})
> > +#endif
> >
> > #define _arch_xchg(ptr, new, sc_sfx, swap_sfx, prepend, \
> > sc_append, swap_append) \
> > diff --git a/arch/riscv/include/asm/futex.h b/arch/riscv/include/asm/futex.h
> > index fc8130f995c1..dc63065e707e 100644
> > --- a/arch/riscv/include/asm/futex.h
> > +++ b/arch/riscv/include/asm/futex.h
> > @@ -19,6 +19,7 @@
> > #define __disable_user_access() do { } while (0)
> > #endif
> >
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > #define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
> > { \
> > __enable_user_access(); \
> > @@ -32,16 +33,39 @@
> > : "memory"); \
> > __disable_user_access(); \
> > }
> > +#else
> > +#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg) \
> > +{ \
> > + __enable_user_access(); \
> > + __asm__ __volatile__ ( \
> > + "1: lr.w.aqrl %[ov], %[u]\n" \
> > + " " insn "\n" \
> > + " sc.w.aqrl %[t], %[t], %[u]\n" \
> > + " bnez %[t], 1b\n" \
> > + "2:\n" \
> > + _ASM_EXTABLE_UACCESS_ERR(1b, 2b, %[r]) \
> > + : [r] "+r" (ret), [ov] "=&r" (oldval), \
> > + [t] "=&r" (temp), [u] "+m" (*uaddr) \
> > + : [op] "Jr" (oparg) \
> > + : "memory"); \
> > + __disable_user_access(); \
> > +}
> > +#endif
> >
> > static inline int
> > arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
> > {
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > int oldval = 0, ret = 0;
> > +#else
> > + int oldval = 0, ret = 0, temp = 0;
>
> I think it's better to define this temp variable inside of
> __futex_atomic_op() instead of requiring it to be defined in the scope
> where the macro is called.
>
> - Charlie
>
> > +#endif
> >
> > if (!access_ok(uaddr, sizeof(u32)))
> > return -EFAULT;
> >
> > switch (op) {
> > +#ifndef RISCV_AMO_USE_ZALRSC
> > case FUTEX_OP_SET:
> > __futex_atomic_op("amoswap.w.aqrl %[ov],%z[op],%[u]",
> > ret, oldval, uaddr, oparg);
> > @@ -62,6 +86,28 @@ arch_futex_atomic_op_inuser(int op, int oparg, int *oval, u32 __user *uaddr)
> > __futex_atomic_op("amoxor.w.aqrl %[ov],%z[op],%[u]",
> > ret, oldval, uaddr, oparg);
> > break;
> > +#else
> > + case FUTEX_OP_SET:
> > + __futex_atomic_op("mv %[t], %z[op]",
> > + ret, oldval, uaddr, oparg);
> > + break;
> > + case FUTEX_OP_ADD:
> > + __futex_atomic_op("add %[t], %[ov], %z[op]",
> > + ret, oldval, uaddr, oparg);
> > + break;
> > + case FUTEX_OP_OR:
> > + __futex_atomic_op("or %[t], %[ov], %z[op]",
> > + ret, oldval, uaddr, oparg);
> > + break;
> > + case FUTEX_OP_ANDN:
> > + __futex_atomic_op("and %[t], %[ov], %z[op]",
> > + ret, oldval, uaddr, ~oparg);
> > + break;
> > + case FUTEX_OP_XOR:
> > + __futex_atomic_op("xor %[t], %[ov], %z[op]",
> > + ret, oldval, uaddr, oparg);
> > + break;
> > +#endif
> > default:
> > ret = -ENOSYS;
> > }
> > --
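
Regarding the comment above about the temporary variable: declaring it
inside __futex_atomic_op() would look roughly like this (an untested
sketch based on the version in this patch):

/* Sketch only: same body as in the patch, with temp made local to the macro. */
#define __futex_atomic_op(insn, ret, oldval, uaddr, oparg)	\
{								\
	int temp;						\
								\
	__enable_user_access();					\
	__asm__ __volatile__ (					\
	"1:	lr.w.aqrl %[ov], %[u]\n"			\
	"	" insn "\n"					\
	"	sc.w.aqrl %[t], %[t], %[u]\n"			\
	"	bnez %[t], 1b\n"				\
	"2:\n"							\
	_ASM_EXTABLE_UACCESS_ERR(1b, 2b, %[r])			\
	: [r] "+r" (ret), [ov] "=&r" (oldval),			\
	  [t] "=&r" (temp), [u] "+m" (*uaddr)			\
	: [op] "Jr" (oparg)					\
	: "memory");						\
	__disable_user_access();				\
}

With that, arch_futex_atomic_op_inuser() would no longer need the extra
#ifdef around its local declarations.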