lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 7 Apr 2021 13:36:45 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Arnd Bergmann <arnd@...db.de>
Cc:     Stafford Horne <shorne@...il.com>, Guo Ren <guoren@...nel.org>,
        linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-csky@...r.kernel.org,
        linux-arch <linux-arch@...r.kernel.org>,
        Guo Ren <guoren@...ux.alibaba.com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Waiman Long <longman@...hat.com>,
        Anup Patel <anup@...infault.org>
Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add
 ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Wed, Apr 07, 2021 at 10:42:50AM +0200, Arnd Bergmann wrote:
> Since there are really only a handful of instances in the kernel
> that use the cmpxchg() or xchg() on u8/u16 variables, it would seem
> best to just disallow those completely 

Not going to happen. xchg16 is optimal for qspinlock and if we replace
that with a cmpxchg loop on x86 we're regressing.

> Interestingly, the s390 version using __sync_val_compare_and_swap()
> seems to produce nice output on all architectures that have atomic
> instructions, with any supported compiler, to the point where I think
> we could just use that to replace most of the inline-asm versions except
> for arm64:
> 
> #define cmpxchg(ptr, o, n)                                              \
> ({                                                                      \
>         __typeof__(*(ptr)) __o = (o);                                   \
>         __typeof__(*(ptr)) __n = (n);                                   \
>         (__typeof__(*(ptr))) __sync_val_compare_and_swap((ptr),__o,__n);\
> })

It generates the LL/SC loop, but doesn't do sensible optimizations when
it's again used in a loop itself. That is, it generates a loop of a
loop, just like what you'd expect, which is sub-optimal for LL/SC.

> Not how gcc's acquire/release behavior of __sync_val_compare_and_swap()
> relates to what the kernel wants here.
> 
> The gcc documentation also recommends using the standard
> __atomic_compare_exchange_n() builtin instead, which would allow
> constructing release/acquire/relaxed versions as well, but I could not
> get it to produce equally good output. (possibly I was using it wrong)

I'm scared to death of the C11 crap, the compiler will 'optimize' them
when it feels like it and use the C11 memory model rules for it, which
are not compatible with the kernel rules.

But the same thing applies, it won't do the right thing for composites.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ