Message-ID: <CAK8P3a3Pf3TbGoVP7JP7gfPV-WDM8MHV_hdqSwNKKFDr1Sb3zQ@mail.gmail.com>
Date: Wed, 7 Apr 2021 10:42:50 +0200
From: Arnd Bergmann <arnd@...db.de>
To: Stafford Horne <shorne@...il.com>
Cc: Guo Ren <guoren@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
linux-riscv <linux-riscv@...ts.infradead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-csky@...r.kernel.org,
linux-arch <linux-arch@...r.kernel.org>,
Guo Ren <guoren@...ux.alibaba.com>,
Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
Waiman Long <longman@...hat.com>,
Anup Patel <anup@...infault.org>
Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
On Tue, Apr 6, 2021 at 10:56 AM Stafford Horne <shorne@...il.com> wrote:
> On Tue, Apr 06, 2021 at 11:50:38AM +0800, Guo Ren wrote:
> > On Wed, Mar 31, 2021 at 3:23 PM Arnd Bergmann <arnd@...db.de> wrote:
> > > On Wed, Mar 31, 2021 at 12:35 AM Stafford Horne <shorne@...il.com> wrote:
> >
> > We shouldn't export xchg16/cmpxchg16(emulated by lr.w/sc.w) in riscv,
> > We should forbid these sub-word atomic primitive and lets the
> > programmers consider their atomic design.
>
> Fair enough, having the generic sub-word emulation would be something that
> an architecture can select to use/export.

I still have the feeling that we should generalize and unify the exact behavior
across architectures as much as possible here, while possibly also trying to
simplify the interface a little.

Looking through the various xchg()/cmpxchg() implementations, I find eight
distinct ways to do 8-bit and 16-bit atomics:

Full support:
      ia64, m68k (Atari only), x86, arm32 (v6k+), arm64

gcc/clang __sync_{val,bool}_compare_and_swap:
      s390

Emulated through ll/sc:
      alpha, powerpc

Emulated through cmpxchg loop:
      mips, openrisc, xtensa (xchg but not cmpxchg),
      sparc64 (cmpxchg_u8, xchg_u16 but not cmpxchg_u16 and xchg_u8!)

Emulated through local_irq_save (non SMP only):
      h8300, m68k (most), microblaze, mips, nds32, nios2

Emulated through hashed spinlock:
      parisc (8-bit only added in 2020, 16-bit still missing)

Forced compile-time error:
      arm32 (v4/v5/v6 non-SMP), arc, csky, riscv, parisc (16 bit),
      sparc32, sparc64, xtensa (cmpxchg)

Silently broken:
      hexagon
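
To make the "emulated through cmpxchg loop" category concrete, here is a
minimal userspace sketch of a 16-bit compare-and-exchange built on a 32-bit
one. It is not taken from any architecture's code: the name
cmpxchg16_emulated() is hypothetical, the pointer is assumed to be 2-byte
aligned, and the gcc/clang __sync builtin stands in for a native 32-bit
cmpxchg. The kernel builds with -fno-strict-aliasing; a userspace build of
this sketch may want the same.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only (not a kernel API).
 * Emulates a 16-bit cmpxchg using a 32-bit compare-and-swap on the
 * aligned word that contains the halfword. Assumes ptr is 2-byte aligned. */
static inline uint16_t cmpxchg16_emulated(volatile uint16_t *ptr,
                                          uint16_t old, uint16_t newval)
{
    uintptr_t addr = (uintptr_t)ptr;
    /* Round down to the enclosing naturally aligned 32-bit word. */
    volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)3);
    /* Bit offset of the halfword inside that word (0 or 16). */
    unsigned int shift = (unsigned int)(addr & 2) * 8;
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    shift = 16 - shift;
#endif
    uint32_t mask = (uint32_t)0xffff << shift;

    for (;;) {
        uint32_t cur = *word;
        uint16_t seen = (uint16_t)((cur & mask) >> shift);

        /* Halfword does not match: fail and report what was seen. */
        if (seen != old)
            return seen;

        /* Splice the new halfword in, leaving the neighbour untouched. */
        uint32_t want = (cur & ~mask) | ((uint32_t)newval << shift);

        /* Retry if anything in the word changed underneath us, including
         * the neighbouring halfword. */
        if (__sync_val_compare_and_swap(word, cur, want) == cur)
            return old;
    }
}
```

Note that the loop also retries when only the neighbouring halfword changes,
so forward progress depends on what else is sharing the word.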

Since there are really only a handful of instances in the kernel
that use cmpxchg() or xchg() on u8/u16 variables, it would seem
best to just disallow those completely and have a separate set of
functions here, with only 64-bit architectures using any variable-type
wrapper to handle both 32-bit and 64-bit arguments.

Interestingly, the s390 version using __sync_val_compare_and_swap()
seems to produce nice output on all architectures that have atomic
instructions, with any supported compiler, to the point where I think
we could just use that to replace most of the inline-asm versions except
for arm64:

#define cmpxchg(ptr, o, n)                                              \
({                                                                      \
        __typeof__(*(ptr)) __o = (o);                                   \
        __typeof__(*(ptr)) __n = (n);                                   \
        (__typeof__(*(ptr))) __sync_val_compare_and_swap((ptr),__o,__n);\
})

Not sure how gcc's acquire/release behavior of __sync_val_compare_and_swap()
relates to what the kernel wants here.

The gcc documentation also recommends using the standard
__atomic_compare_exchange_n() builtin instead, which would allow
constructing release/acquire/relaxed versions as well, but I could not
get it to produce equally good output (possibly I was using it wrong).

       Arnd
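
For reference, a rough sketch of how the ordering variants mentioned above
could be built on __atomic_compare_exchange_n(). This is not from the
message or from any kernel tree: the sketch_cmpxchg*() names are made up,
the failure ordering is assumed to be relaxed, and mapping a fully ordered
cmpxchg() onto __ATOMIC_SEQ_CST is only an approximation of what the kernel
expects. It says nothing about the quality of the generated code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macros, for illustration only.
 * __atomic_compare_exchange_n() writes the observed value back into the
 * "expected" variable on failure and leaves it unchanged on success, so
 * returning __o yields cmpxchg-style "value seen" semantics either way. */
#define sketch_cmpxchg_order(ptr, o, n, success_mo)                     \
({                                                                      \
        __typeof__(*(ptr)) __o = (o);                                   \
        __typeof__(*(ptr)) __n = (n);                                   \
        __atomic_compare_exchange_n((ptr), &__o, __n, false,            \
                                    (success_mo), __ATOMIC_RELAXED);    \
        __o;                                                            \
})

#define sketch_cmpxchg_relaxed(ptr, o, n) \
        sketch_cmpxchg_order(ptr, o, n, __ATOMIC_RELAXED)
#define sketch_cmpxchg_acquire(ptr, o, n) \
        sketch_cmpxchg_order(ptr, o, n, __ATOMIC_ACQUIRE)
#define sketch_cmpxchg_release(ptr, o, n) \
        sketch_cmpxchg_order(ptr, o, n, __ATOMIC_RELEASE)
/* Assumption: SEQ_CST as a stand-in for the fully ordered variant. */
#define sketch_cmpxchg(ptr, o, n) \
        sketch_cmpxchg_order(ptr, o, n, __ATOMIC_SEQ_CST)
```

Each variant evaluates to the value found in memory, so callers compare the
result against the expected old value to tell success from failure.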