Date:   Fri, 6 Jan 2023 15:11:58 +0100
From:   Jules Maselbas <jmaselbas@...ray.eu>
To:     Mark Rutland <mark.rutland@....com>
Cc:     Yann Sionneau <ysionneau@...ray.eu>, Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Boqun Feng <boqun.feng@...il.com>,
        linux-kernel@...r.kernel.org,
        Clement Leger <clement.leger@...tlin.com>,
        Julian Vetter <jvetter@...ray.eu>,
        Julien Villette <jvillette@...ray.eu>
Subject: Re: [RFC PATCH 05/25] kvx: Add atomic/locking headers

Hi Mark,

On Wed, Jan 04, 2023 at 09:53:24AM +0000, Mark Rutland wrote:
> On Tue, Jan 03, 2023 at 05:43:39PM +0100, Yann Sionneau wrote:
> > Add common headers (atomic, bitops, barrier and locking) for basic
> > kvx support.
> > 
> > CC: Will Deacon <will@...nel.org>
> > CC: Peter Zijlstra <peterz@...radead.org>
> > CC: Boqun Feng <boqun.feng@...il.com>
> > CC: Mark Rutland <mark.rutland@....com>
> > CC: linux-kernel@...r.kernel.org
> > Co-developed-by: Clement Leger <clement.leger@...tlin.com>
> > Signed-off-by: Clement Leger <clement.leger@...tlin.com>
> > Co-developed-by: Jules Maselbas <jmaselbas@...ray.eu>
> > Signed-off-by: Jules Maselbas <jmaselbas@...ray.eu>
> > Co-developed-by: Julian Vetter <jvetter@...ray.eu>
> > Signed-off-by: Julian Vetter <jvetter@...ray.eu>
> > Co-developed-by: Julien Villette <jvillette@...ray.eu>
> > Signed-off-by: Julien Villette <jvillette@...ray.eu>
> > Co-developed-by: Yann Sionneau <ysionneau@...ray.eu>
> > Signed-off-by: Yann Sionneau <ysionneau@...ray.eu>
> > ---
> >  arch/kvx/include/asm/atomic.h  | 104 +++++++++++++++++
> >  arch/kvx/include/asm/barrier.h |  15 +++
> >  arch/kvx/include/asm/bitops.h  | 207 +++++++++++++++++++++++++++++++++
> >  arch/kvx/include/asm/bitrev.h  |  32 +++++
> >  arch/kvx/include/asm/cmpxchg.h | 185 +++++++++++++++++++++++++++++
> >  5 files changed, 543 insertions(+)
> >  create mode 100644 arch/kvx/include/asm/atomic.h
> >  create mode 100644 arch/kvx/include/asm/barrier.h
> >  create mode 100644 arch/kvx/include/asm/bitops.h
> >  create mode 100644 arch/kvx/include/asm/bitrev.h
> >  create mode 100644 arch/kvx/include/asm/cmpxchg.h
> > 
> > diff --git a/arch/kvx/include/asm/atomic.h b/arch/kvx/include/asm/atomic.h
> > new file mode 100644
> > index 000000000000..eb8acbcbc70d
> > --- /dev/null
> > +++ b/arch/kvx/include/asm/atomic.h
> > @@ -0,0 +1,104 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2017-2023 Kalray Inc.
> > + * Author(s): Clement Leger
> > + */
> > +
> > +#ifndef _ASM_KVX_ATOMIC_H
> > +#define _ASM_KVX_ATOMIC_H
> > +
> > +#include <linux/types.h>
> > +
> > +#include <asm/cmpxchg.h>
> > +
> > +#define ATOMIC64_INIT(i)     { (i) }
> > +
> > +#define arch_atomic64_cmpxchg(v, old, new) (arch_cmpxchg(&((v)->counter), old, new))
> > +#define arch_atomic64_xchg(v, new) (arch_xchg(&((v)->counter), new))
> > +
> > +static inline long arch_atomic64_read(const atomic64_t *v)
> > +{
> > +	return v->counter;
> > +}
> 
> This is a plain read, and is *not* atomic.
> 
> The compiler can replay a plain read an arbitrary number of times, and is
> permitted to split it into smaller accesses.
> 
> At minimum this needs to be
> 
>   READ_ONCE(v->counter)
> 
> ... which will prevent replay. Whether or not that's actually atomic will
> depend on the instructions the compiler generates, and how those instructions
> are defined in your architecture.
Good point, we are going to use the {READ,WRITE}_ONCE macros.

> Do you have a single instruction that can read a 64-bit memory location, and is
> it guaranteed to result in a single access that cannot be split?

We do have a single instruction that can read a 64-bit memory location
(supported access sizes are 8, 16, 32, 64, 128, and 256 bits).
All accesses are guaranteed not to be split, unless they are misaligned.
Furthermore, misaligned write accesses crossing a 32-byte boundary may
complete in a non-atomic way.

> 
> > +static inline void arch_atomic64_set(atomic64_t *v, long i)
> > +{
> > +	v->counter = i;
> > +}
> 
> Same comments as for arch_atomic64_read(); at minimum this needs to be:
> 
>   WRITE_ONCE(v->counter, i)
> 
> ... but that may or may not actually be atomic on your architecture.
> 
> > +#define ATOMIC64_RETURN_OP(op, c_op)					\
> > +static inline long arch_atomic64_##op##_return(long i, atomic64_t *v)	\
> > +{									\
> > +	long new, old, ret;						\
> > +									\
> > +	do {								\
> > +		old = v->counter;					\
> > +		new = old c_op i;					\
> > +		ret = arch_cmpxchg(&v->counter, old, new);		\
> > +	} while (ret != old);						\
> > +									\
> > +	return new;							\
> > +}
> > +
> > +#define ATOMIC64_OP(op, c_op)						\
> > +static inline void arch_atomic64_##op(long i, atomic64_t *v)		\
> > +{									\
> > +	long new, old, ret;						\
> > +									\
> > +	do {								\
> > +		old = v->counter;					\
> > +		new = old c_op i;					\
> > +		ret = arch_cmpxchg(&v->counter, old, new);		\
> > +	} while (ret != old);						\
> > +}
> > +
> > +#define ATOMIC64_FETCH_OP(op, c_op)					\
> > +static inline long arch_atomic64_fetch_##op(long i, atomic64_t *v)	\
> > +{									\
> > +	long new, old, ret;						\
> > +									\
> > +	do {								\
> > +		old = v->counter;					\
> > +		new = old c_op i;					\
> > +		ret = arch_cmpxchg(&v->counter, old, new);		\
> > +	} while (ret != old);						\
> > +									\
> > +	return old;							\
> > +}
> 
> These look ok, but it'd be nicer if we could teach the generic atomic64 code to
> do this, like the generic atomic code does.
> 
> We could rename the existing asm-generic/atomic64 code to atomic64-spinlock,
> and add a separate atomic64-cmpxchg (and likewise for the 32-bit code) to make
> that clearer and consistent.
I am not sure what this implies and how big this change might be,
but I'll take a look at this.

> > +
> > +#define ATOMIC64_OPS(op, c_op)						\
> > +	ATOMIC64_OP(op, c_op)						\
> > +	ATOMIC64_RETURN_OP(op, c_op)					\
> > +	ATOMIC64_FETCH_OP(op, c_op)
> > +
> > +ATOMIC64_OPS(and, &)
> > +ATOMIC64_OPS(or, |)
> > +ATOMIC64_OPS(xor, ^)
> > +ATOMIC64_OPS(add, +)
> > +ATOMIC64_OPS(sub, -)
> > +
> > +#undef ATOMIC64_OPS
> > +#undef ATOMIC64_FETCH_OP
> > +#undef ATOMIC64_OP
> > +
> > +static inline int arch_atomic_add_return(int i, atomic_t *v)
> > +{
> > +	int new, old, ret;
> > +
> > +	do {
> > +		old = v->counter;
> > +		new = old + i;
> > +		ret = arch_cmpxchg(&v->counter, old, new);
> > +	} while (ret != old);
> > +
> > +	return new;
> > +}
> > +
> > +static inline int arch_atomic_sub_return(int i, atomic_t *v)
> > +{
> > +	return arch_atomic_add_return(-i, v);
> > +}
> 
> Likewise for these two.
> 
> > +
> > +#include <asm-generic/atomic.h>
> > +
> > +#endif	/* _ASM_KVX_ATOMIC_H */
> > diff --git a/arch/kvx/include/asm/barrier.h b/arch/kvx/include/asm/barrier.h
> > new file mode 100644
> > index 000000000000..371f1c70746d
> > --- /dev/null
> > +++ b/arch/kvx/include/asm/barrier.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2017-2023 Kalray Inc.
> > + * Author(s): Clement Leger
> > + */
> > +
> > +#ifndef _ASM_KVX_BARRIER_H
> > +#define _ASM_KVX_BARRIER_H
> 
> > +/* Bitmask modifiers */
> > +#define __NOP(x)	(x)
> > +#define __NOT(x)	(~(x))
> > +
> > +
> > +#define __test_and_op_bit(nr, addr, op, mod)				\
> > +({									\
> > +	unsigned long __mask = BIT_MASK(nr);				\
> > +	unsigned long __new, __old, __ret;				\
> > +	do {								\
> > +		__old = *(&addr[BIT_WORD(nr)]);				\
> > +		__new = __old op mod(__mask);				\
> > +		__ret = cmpxchg(addr, __old, __new);			\
> > +	} while (__ret != __old);					\
> > +	(__old & __mask);						\
> > +})
> 
> Please use <asm-generic/bitops/atomic.h> which should give you the common
> bit operations "for free" atop your regular atomics.
Yes

> 
> [...]
> 
> > diff --git a/arch/kvx/include/asm/cmpxchg.h b/arch/kvx/include/asm/cmpxchg.h
> > new file mode 100644
> > index 000000000000..b1d128b060a2
> > --- /dev/null
> > +++ b/arch/kvx/include/asm/cmpxchg.h
> > @@ -0,0 +1,185 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2017-2023 Kalray Inc.
> > + * Author(s): Clement Leger
> > + *            Yann Sionneau
> > + */
> > +
> > +#ifndef _ASM_KVX_CMPXCHG_H
> > +#define _ASM_KVX_CMPXCHG_H
> > +
> > +#include <linux/bits.h>
> > +#include <linux/types.h>
> > +#include <linux/build_bug.h>
> > +
> > +/*
> > + * On kvx, we have a boolean compare and swap, which means the operation
> > + * only returns whether it succeeded.
> > + * If the operation succeeds, this is simple: we just return the provided
> > + * old value. However, if it fails, we need to load the current value to
> > + * return it to the caller. If the loaded value differs from the "old"
> > + * value provided by the caller, we can return it, since that means the
> > + * operation failed.
> > + * However, if for some reason the value we read is equal to the old value
> > + * provided by the caller, we can't simply return it or the caller will
> > + * think the operation succeeded. So in that case we try again until we
> > + * either succeed or fail with a value different from the provided one.
> > + */
> > +#define __cmpxchg(ptr, old, new, op_suffix, load_suffix)		\
> > +({									\
> > +	register unsigned long __rn asm("r62");				\
> > +	register unsigned long __ro asm("r63");				\
> 
> Why do you need to specify the exact registers?
r62 and r63 are hardcoded in the inline assembly; they are caller-saved.
I have a C implementation that uses builtins, but it is not merged into
our tree yet (I want it to be).

> e.g. does some instruction use these implicitly, or do you need two adjacent
> register for encoding reasons?

The atomic compare and swap (acswap) instruction needs a register "pair",
which can only exist with "adjacent" registers: $r0r1, $r2r3, etc.

> > +	__asm__ __volatile__ (						\
> > +		/* Fence to guarantee previous store to be committed */	\
> > +		"fence\n"						\
> 
> This implies you can implement the relaxed form of cmpxchg().
> 
> What ordering do you get by default, and do you have any other barriers (e.g.
> for acquire/release semantics), or just "fence" ?
We have two barrier types:
 - fence: ensure that all uncached memory operations are committed to
   memory, typically used to ensure a write to memory is visible to
   other cores.
 - barrier: flush the core instruction pipeline and memory system

Thanks,
-- Jules
