Message-ID: <20210729123458.GA21766@willie-the-truck>
Date:   Thu, 29 Jul 2021 13:34:58 +0100
From:   Will Deacon <will@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Rui Wang <wangrui@...ngson.cn>, Ingo Molnar <mingo@...hat.com>,
        Arnd Bergmann <arnd@...db.de>,
        Waiman Long <longman@...hat.com>,
        Boqun Feng <boqun.feng@...il.com>, Guo Ren <guoren@...nel.org>,
        linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
        Rui Wang <r@....cc>, Xuefeng Li <lixuefeng@...ngson.cn>,
        Huacai Chen <chenhuacai@...il.com>,
        Jiaxun Yang <jiaxun.yang@...goat.com>,
        Huacai Chen <chenhuacai@...ngson.cn>,
        kernel test robot <lkp@...el.com>
Subject: Re: [RFC PATCH v3] locking/atomic: Implement
 atomic{,64,_long}_{fetch_,}{andnot_or}{,_relaxed,_acquire,_release}()

On Thu, Jul 29, 2021 at 01:15:07PM +0200, Peter Zijlstra wrote:
> On Thu, Jul 29, 2021 at 10:55:52AM +0100, Will Deacon wrote:
> 
> > Overall, I'm not thrilled to bits by extending the atomics API with
> > operations that cannot be implemented efficiently on any (?) architectures
> > and are only used by the qspinlock slowpath on machines with more than 16K
> > CPUs.
> 
> My rationale for proposing this primitive is similar to the existence of
> other composite atomic ops from the Misc (and refcount) class (as per
> atomic_t.txt). They're common/performance sensitive operations that, on
> LL/SC platforms, can be better implemented than a cmpxchg() loop.
> 
> Specifically here, it can be used to implement short xchg() in an
> architecturally neutral way, but more importantly it provides fwd
> progress on LL/SC, while most LL/SC based cmpxchg() implementations are
> arguably broken there.

Well, that assumes the CPU provides forward progress guarantees for LL/SC,
which is _very_ rare (e.g. Power). If you implement LL/SC in your L1 it's
really hard to get forward progress guarantees once your micro-architecture
starts being aggressive about speculation.

For arm64, I would prefer the CAS loop to the LL/SC version, but we actually
have short xchg(), so I would much prefer that people used that! So my worry
is that we start seeing users of this new thing crop up all over the place,
and it's not at all obvious to them that it's much worse than xchg().

Will
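
For readers following the thread, here is a rough userspace sketch of the
semantics under discussion. It is not the kernel patch: the names
fetch_andnot_or() and xchg_upper16() are hypothetical, it uses C11
<stdatomic.h> rather than the kernel atomic_t API, and the primitive is
written as the generic CAS-loop fallback. An LL/SC architecture could
instead do the and-not/or between the load-linked and store-conditional,
which is the case Peter is arguing for.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical fallback: atomically perform
 *     *v = (*v & ~mask) | val
 * and return the previous value of *v. Written as a CAS loop here.
 */
static uint32_t fetch_andnot_or(_Atomic uint32_t *v, uint32_t mask,
                                uint32_t val)
{
    uint32_t old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure the compare-exchange reloads 'old' with the current value. */
    while (!atomic_compare_exchange_weak_explicit(v, &old,
                                                  (old & ~mask) | val,
                                                  memory_order_seq_cst,
                                                  memory_order_relaxed))
        ;
    return old;
}

/*
 * "Short" xchg built on top of it: replace only the upper 16 bits of a
 * 32-bit word and return the upper 16 bits that were there before.
 */
static uint16_t xchg_upper16(_Atomic uint32_t *v, uint16_t new_hi)
{
    uint32_t old = fetch_andnot_or(v, 0xffff0000u, (uint32_t)new_hi << 16);

    return (uint16_t)(old >> 16);
}
```

On an architecture with a native 16-bit xchg (arm64, as noted above), the
second helper would be a single instruction, which is why calling the
composite primitive there would be a pessimisation.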
