lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 6 Apr 2021 09:15:50 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Guo Ren <guoren@...nel.org>
Cc:     linux-riscv <linux-riscv@...ts.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-csky@...r.kernel.org,
        linux-arch <linux-arch@...r.kernel.org>,
        Guo Ren <guoren@...ux.alibaba.com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Waiman Long <longman@...hat.com>,
        Arnd Bergmann <arnd@...db.de>, Anup Patel <anup@...infault.org>
Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add
 ARCH_USE_QUEUED_SPINLOCKS_XCHG32

On Wed, Mar 31, 2021 at 11:22:35PM +0800, Guo Ren wrote:
> On Mon, Mar 29, 2021 at 8:50 PM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Mon, Mar 29, 2021 at 08:01:41PM +0800, Guo Ren wrote:
> > > u32 a = 0x55aa66bb;
> > > u16 *ptr = &a;
> > >
> > > CPU0                       CPU1
> > > =========             =========
> > > xchg16(ptr, new)     while(1)
> > >                                     WRITE_ONCE(*(ptr + 1), x);
> > >
> > > When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock.
> >
> > Then I think your LL/SC is broken.
> No, it's not broken LR.W/SC.W. Quote <8.3 Eventual Success of
> Store-Conditional Instructions>:
> 
> "As a consequence of the eventuality guarantee, if some harts in an
> execution environment are executing constrained LR/SC loops, and no
> other harts or devices in the execution environment execute an
> unconditional store or AMO to that reservation set, then at least one
> hart will eventually exit its constrained LR/SC loop. By contrast, if
> other harts or devices continue to write to that reservation set, it
> is not guaranteed that any hart will exit its LR/SC loop."

(there, reflowed it for you)

That just means your arch spec is broken too :-)

> So I think it's a feature of LR/SC. How does the above code (also use
> ll.w/sc.w to implement xchg16) running on arm64?
> 
> 1: ldxr
>     eor
>     cbnz ... 2f
>     stxr
>     cbnz ... 1b   // I think it would deadlock for arm64.
> 
> "LL/SC fwd progress" which you have mentioned could guarantee stxr
> success? How hardware could do that?

I'm not a hardware person; I've never actually build anything larger
than a 4 bit adder with nand gates (IIRC, 25+ years ago). And I'll let
Will answer the ARM64 part.

That said, I think the idea is that if you lock the line (load-locked is
a clue ofcourse) to the core until either: an exception (or anything
else that is guaranteed to fail LL/SC), SC or N instructions, then a
competing LL/SC will stall in the LL while the first core makes
progress.

This same principle is key to hardware progress for cmpxchg/cas loops,
don't instantly yield the exclusive hold on the cacheline, keep it
around for a while.

Out-of-order CPUs can do even better I think, by virtue of them being
able to see tight loops.


Anyway, given you have such a crap architecture (and here I thought
RISC-V was supposed to be a modern design *sigh*), you had better go
look at the sparc64 atomic implementation which has a software backoff
for failed CAS in order to make fwd progress.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ