linux-kernel - Re: [tip:locking/core] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

Message-ID: <YTrvLHB6lpol79ka@boqun-archlinux>
Date:   Fri, 10 Sep 2021 13:37:48 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Dan Lustig <dlustig@...dia.com>
Cc:     Will Deacon <will@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Alan Stern <stern@...land.harvard.edu>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Peter Anvin <hpa@...or.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Vince Weaver <vincent.weaver@...ne.edu>,
        Thomas Gleixner <tglx@...utronix.de>,
        Jiri Olsa <jolsa@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        linux-tip-commits@...r.kernel.org, palmer@...belt.com,
        paul.walmsley@...ive.com, mpe@...erman.id.au
Subject: Re: [tip:locking/core] tools/memory-model: Add extra ordering for
 locks and remove it for ordinary release/acquire

On Fri, Sep 10, 2021 at 08:01:14AM +0800, Boqun Feng wrote:
> On Thu, Sep 09, 2021 at 01:03:18PM -0400, Dan Lustig wrote:
> > On 9/9/2021 9:35 AM, Will Deacon wrote:
> > > [+Palmer, PaulW, Daniel and Michael]
> > > 
> > > On Thu, Sep 09, 2021 at 09:25:30AM +0200, Peter Zijlstra wrote:
> > >> On Wed, Sep 08, 2021 at 09:08:33AM -0700, Linus Torvalds wrote:
> > >>
> > >>> So if this is purely a RISC-V thing,
> > >>
> > >> Just to clarify, I think the current RISC-V thing is stronger than
> > >> PowerPC, but maybe not as strong as say ARM64, but RISC-V memory
> > >> ordering is still somewhat hazy to me.
> > >>
> > >> Specifically, the sequence:
> > >>
> > >> 	/* critical section s */
> > >> 	WRITE_ONCE(x, 1);
> > >> 	FENCE RW, W
> > >> 	WRITE_ONCE(s.lock, 0);		/* store S */
> > >> 	AMOSWAP %0, 1, r.lock		/* store R */
> > >> 	FENCE R, RW
> > >> 	WRITE_ONCE(y, 1);
> > >> 	/* critical section r */
> > >>
> > >> fully separates section s from section r, as in RW->RW ordering
> > >> (possibly not as strong as smp_mb() though), while on PowerPC it would
> > >> only impose TSO ordering between sections.
> > >>
> > >> The AMOSWAP is a RmW and as such matches the W from the RW->W fence,
> > >> similarly it matches the R from the R->RW fence, yielding an:
> > >>
> > >> 	RW->  W
> > >> 	    RmW
> > >> 	    R  ->RW
> > >>
> > >> ordering. It's the stores S and R that can be re-ordered, but not the
> > >> sections themselves (same on PowerPC and many others).
> > >>
> > >> Clarification from a RISC-V enabled person would be appreciated.
> > 
> > To first order, RISC-V's memory model is very similar to ARMv8's.  It
> > is "other-multi-copy-atomic", unlike Power, and respects dependencies.
> > It also has AMOs and LR/SC with optional RCsc acquire or release
> > semantics.  There's no need to worry about RISC-V somehow pushing the
> > boundaries of weak memory ordering in new ways.
> > 
> > The tricky part is that unlike ARMv8, RISC-V doesn't have load-acquire
> > or store-release opcodes at all.  Only AMOs and LR/SC have acquire or
> > release options.  That means that while certain operations like swap
> > can be implemented with native RCsc semantics, others like store-release
> > have to fall back on fences and plain writes.
> > 
> > That's where the complexity came up last time this was discussed, at
> > least as it relates to RISC-V: how to make sure the combination of RCsc
> > atomics and plain operations+fences gives the semantics everyone is
> > asking for here.  And to be clear there, I'm not asking for LKMM to
> > weaken anything about critical section ordering just for RISC-V's sake.
> > TSO/RCsc ordering between critical sections is a perfectly reasonable
> > model in my opinion.  I just want to make sure RISC-V gets it right
> > given whatever the decision is.
> > 
> > >>> then I think it's entirely reasonable to
> > >>>
> > >>>         spin_unlock(&r);
> > >>>         spin_lock(&s);
> > >>>
> > >>> cannot be reordered.
> > >>
> > >> I'm obviously completely in favour of that :-)
> > > 
> > > I don't think we should require the accesses to the actual lockwords to
> > > be ordered here, as it becomes pretty onerous for relaxed LL/SC
> > > architectures where you'd end up with an extra barrier either after the
> > > unlock() or before the lock() operation. However, I remain absolutely in
> > > favour of strengthening the ordering of the _critical sections_ guarded by
> > > the locks to be RCsc.
> > 
> > I agree with Will here.  If the AMOSWAP above is actually implemented with
> > a RISC-V AMO, then the two critical sections will be separated as if RW,RW,
> > as Peter described.  If instead it's implemented using LR/SC, then RISC-V
> 
> Just out of curiosity, in the following code, can the store S and load L
> be reordered?
> 
> 	WRITE_ONCE(x, 1); // store S
> 	FENCE RW, W
> 	WRITE_ONCE(s.lock, 0); // unlock(s)
> 	AMOSWAP %0, 1, s.lock  // lock(s)
> 	FENCE R, RW
> 	r1 = READ_ONCE(y); // load L
> 
> I think they can, because neither "FENCE RW, W" nor "FENCE R, RW" order
> them. Note that the reordering is allowed in LKMM, because unlock-lock
> only needs to be as strong as RCtso.
> 
> Moreover, how about the outcome of the following case:
> 
> 	{ 
> 	r1, r2 are registers (variables) on each CPU, X, Y are memory
> 	locations, and initialized as 0
> 	}
> 
> 	CPU 0
> 	=====
> 	AMOSWAP r1, 1, X
> 	FENCE R, RW
> 	r2 = READ_ONCE(Y);
> 
> 	CPU 1
> 	=====
> 	WRITE_ONCE(Y, 1);
> 	FENCE RW, RW
> 	r2 = READ_ONCE(X);
> 
> can we observe the result where r2 on CPU0 is 0 while r2 on CPU1 is 1?
> 

As reminded by Andrea, what I meant to ask here is:

can we observe the result where r2 on CPU0 is 0 while r2 on CPU1 is 0?
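
The two-CPU test has the store-buffering shape, so it may help to see why the
0/0 result matters. Below is a minimal sketch in plain C that enumerates every
global order of the four accesses. It is a toy interleaving model only, not
RVWMO and not LKMM, and it says nothing about what RISC-V hardware actually
permits. The names reachable_outcomes(), outcome_reachable() and the flag
cpu0_store_may_pass_load are hypothetical, introduced for illustration. The
flag models the assumption that the write half of the swap on CPU 0 can become
visible after the later load, which is the reordering the question asks about.

```c
#include <assert.h>
#include <stdbool.h>

/* The four memory accesses of the two-CPU test (r1 on CPU 0 is ignored). */
enum op {
	CPU0_STORE_X,	/* write half of AMOSWAP r1, 1, X */
	CPU0_LOAD_Y,	/* r2 = READ_ONCE(Y) on CPU 0 */
	CPU1_STORE_Y,	/* WRITE_ONCE(Y, 1) */
	CPU1_LOAD_X,	/* r2 = READ_ONCE(X) on CPU 1 */
	NUM_OPS
};

/*
 * Return a bitmask of reachable (CPU 0 r2, CPU 1 r2) pairs: bit
 * 2 * cpu0_r2 + cpu1_r2 is set when that pair can occur.
 * Assumption: cpu0_store_may_pass_load is a toy relaxation, not a
 * statement about any real architecture.
 */
unsigned int reachable_outcomes(bool cpu0_store_may_pass_load)
{
	unsigned int mask = 0;

	/* Each 8-bit code packs four 2-bit op ids: one candidate global order. */
	for (int code = 0; code < 256; code++) {
		int order[NUM_OPS];
		int pos[NUM_OPS] = { 0 };
		unsigned int seen = 0;

		for (int i = 0; i < NUM_OPS; i++) {
			order[i] = (code >> (2 * i)) & 3;
			pos[order[i]] = i;
			seen |= 1u << order[i];
		}
		if (seen != 0xfu)
			continue;	/* some op is repeated: not a permutation */

		/* CPU 1 has FENCE RW, RW: its store always precedes its load. */
		if (pos[CPU1_STORE_Y] > pos[CPU1_LOAD_X])
			continue;
		/* CPU 0 keeps program order unless the toy relaxation is on. */
		if (!cpu0_store_may_pass_load &&
		    pos[CPU0_STORE_X] > pos[CPU0_LOAD_Y])
			continue;

		/* Execute the accesses in this global order. */
		int x = 0, y = 0, cpu0_r2 = 0, cpu1_r2 = 0;

		for (int i = 0; i < NUM_OPS; i++) {
			switch (order[i]) {
			case CPU0_STORE_X: x = 1; break;
			case CPU0_LOAD_Y:  cpu0_r2 = y; break;
			case CPU1_STORE_Y: y = 1; break;
			case CPU1_LOAD_X:  cpu1_r2 = x; break;
			}
		}
		mask |= 1u << (2 * cpu0_r2 + cpu1_r2);
	}
	return mask;
}

/* Test whether a given (CPU 0 r2, CPU 1 r2) pair is in the mask. */
bool outcome_reachable(unsigned int mask, int cpu0_r2, int cpu1_r2)
{
	assert((cpu0_r2 == 0 || cpu0_r2 == 1) &&
	       (cpu1_r2 == 0 || cpu1_r2 == 1));
	return (mask & (1u << (2 * cpu0_r2 + cpu1_r2))) != 0;
}
```

In this toy model, outcome_reachable(reachable_outcomes(false), 0, 0) is false:
0/0 would need CPU 0's load before CPU 1's store and CPU 1's load before
CPU 0's store, which forms a cycle when both CPUs keep program order. With the
flag set to true the same query returns true. So observing 0/0 would mean the
store to X was not ordered before the load of Y on CPU 0.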

Regards,
Boqun

> Regards,
> Boqun
> 
> > gives only TSO (R->R, R->W, W->W), because the two pieces of the AMO are
> > split, and that breaks the chain.  Getting full RW->RW between the critical
> > sections would therefore require an extra fence.  Also, the accesses to the
> > lockwords themselves would not be ordered without an extra fence.
> > 
> > > Last time this came up, I think the RISC-V folks were generally happy to
> > > implement whatever was necessary for Linux [1]. The thing that was stopping
> > > us was Power (see CONFIG_ARCH_WEAK_RELEASE_ACQUIRE), wasn't it? I think
> > > Michael saw quite a bit of variety in the impact on benchmarks [2] across
> > > different machines. So the question is whether newer Power machines are less
> > > affected to the degree that we could consider making this change again.
> > 
> > Yes, as I said above, RISC-V will implement what is needed to make this work.
> > 
> > Dan
> > 
> > > Will
> > > 
> > > [1] https://lore.kernel.org/lkml/11b27d32-4a8a-3f84-0f25-723095ef1076@nvidia.com/
> > > [2] https://lore.kernel.org/lkml/87tvp3xonl.fsf@concordia.ellerman.id.au/