linux-kernel - Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAK8P3a3Pf3TbGoVP7JP7gfPV-WDM8MHV_hdqSwNKKFDr1Sb3zQ@mail.gmail.com>
Date:   Wed, 7 Apr 2021 10:42:50 +0200
From:   Arnd Bergmann <arnd@...db.de>
To:     Stafford Horne <shorne@...il.com>
Cc:     Guo Ren <guoren@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-csky@...r.kernel.org,
        linux-arch <linux-arch@...r.kernel.org>,
        Guo Ren <guoren@...ux.alibaba.com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Waiman Long <longman@...hat.com>,
        Anup Patel <anup@...infault.org>
Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Tue, Apr 6, 2021 at 10:56 AM Stafford Horne <shorne@...il.com> wrote:
> On Tue, Apr 06, 2021 at 11:50:38AM +0800, Guo Ren wrote:
> > On Wed, Mar 31, 2021 at 3:23 PM Arnd Bergmann <arnd@...db.de> wrote:
> > > On Wed, Mar 31, 2021 at 12:35 AM Stafford Horne <shorne@...il.com> wrote:
> >
> > We shouldn't export xchg16/cmpxchg16(emulated by lr.w/sc.w) in riscv,
> > We should forbid these sub-word atomic primitive and lets the
> > programmers consider their atomic design.
>
> Fair enough, having the generic sub-word emulation would be something that
> an architecture can select to use/export.

I still have the feeling that we should generalize and unify the exact behavior
across architectures as much as possible here, while possibly also trying to
simplify the interface a little.

Looking through the various xchg()/cmpxchg() implementations, I find eight
distinct ways to do 8-bit and 16-bit atomics:

Full support:
      ia64, m68k (Atari only), x86, arm32 (v6k+), arm64

gcc/clang __sync_{val,bool}_compare_and_swap:
     s390

Emulated through ll/sc:
      alpha, powerpc

Emulated through cmpxchg loop:
      mips, openrisc, xtensa (xchg but not cmpxchg), sparc64 (cmpxchg_u8,
      xchg_u16 but not cmpxchg_u16 and xchg_u8!)

Emulated through local_irq_save (non SMP only):
        h8300, m68k (most), microblaze, mips, nds32, nios2

Emulated through hashed spinlock:
        parisc (8-bit only added in 2020, 16-bit still missing)

Forced compile-time error:
       arm32 (v4/v5/v6 non-SMP), arc, csky, riscv, parisc (16 bit), sparc32,
       sparc64, xtensa (cmpxchg)

Silently broken:
        hexagon

Since there are really only a handful of instances in the kernel
that use the cmpxchg() or xchg() on u8/u16 variables, it would seem
best to just disallow those completely and have a separate set of
functions here, with only 64-bit architectures using any variable-type
wrapper to handle both 32-bit and 64-bit arguments.

Interestingly, the s390 version using __sync_val_compare_and_swap()
seems to produce nice output on all architectures that have atomic
instructions, with any supported compiler, to the point where I think
we could just use that to replace most of the inline-asm versions except
for arm64:

#define cmpxchg(ptr, o, n)                                              \
({                                                                      \
        __typeof__(*(ptr)) __o = (o);                                   \
        __typeof__(*(ptr)) __n = (n);                                   \
        (__typeof__(*(ptr))) __sync_val_compare_and_swap((ptr),__o,__n);\
})

Not how gcc's acquire/release behavior of __sync_val_compare_and_swap()
relates to what the kernel wants here.

The gcc documentation also recommends using the standard
__atomic_compare_exchange_n() builtin instead, which would allow
constructing release/acquire/relaxed versions as well, but I could not
get it to produce equally good output. (possibly I was using it wrong)

       Arnd