linux-kernel - Re: LKMM: Making RMW barriers explicit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <93dc7c63-425d-4b89-8a36-48e0d5aa3674@paulmck-laptop>
Date: Thu, 23 May 2024 06:35:56 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: Alan Stern <stern@...land.harvard.edu>,
	Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>,
	linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
	kernel-team@...a.com, parri.andrea@...il.com, boqun.feng@...il.com,
	j.alglave@....ac.uk, luc.maranget@...ia.fr,
	Joel Fernandes <joel@...lfernandes.org>
Subject: Re: LKMM: Making RMW barriers explicit

On Thu, May 23, 2024 at 02:54:05PM +0200, Jonas Oberhauser wrote:
> Am 5/22/2024 um 4:20 PM schrieb Alan Stern:
> > On Wed, May 22, 2024 at 11:20:47AM +0200, Jonas Oberhauser wrote:
> > > Am 5/21/2024 um 5:36 PM schrieb Alan Stern:
> > > > On Tue, May 21, 2024 at 11:57:29AM +0200, Jonas Oberhauser wrote:
> > > > > Am 5/18/2024 um 2:31 AM schrieb Alan Stern:
> > > > > > On Thu, May 16, 2024 at 10:44:05AM +0200, Hernan Ponce de Leon wrote:
> > > > > > > On 5/16/2024 10:31 AM, Jonas Oberhauser wrote:
> > > > > > > > Am 5/16/2024 um 3:43 AM schrieb Alan Stern:
> > > > > > > > > Hernan and Jonas:
> > > > > > > > > 
> > > > > > > > > Can you explain more fully the changes you want to make to herd7 and/or
> > > > > > > > > the LKMM?  The goal is to make the memory barriers currently implicit in
> > > > > > > > > RMW operations explicit, but I couldn't understand how you propose to do
> > > > > > > > > this.
> > > > > > > > > 
> > > > > > > > > Are you going to change herd7 somehow, and if so, how?  It seems like
> > > > > > > > > you should want to provide sufficient information so that the .bell
> > > > > > > > > and .cat files can implement the appropriate memory barriers associated
> > > > > > > > > with each RMW operation.  What additional information is needed?  And
> > > > > > > > > how (explained in English, not by quoting source code) will the .bell
> > > > > > > > > and .cat files make use of this information?
> > > > > > > > > 
> > > > > > > > > Alan
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I don't know whether herd7 needs to be changed. Probably, herd7 does the
> > > > > > > > following:
> > > > > > > > - if a tag called Mb appears on an rmw instruction (by instruction I
> > > > > > > > mean things like xchg(), atomic_inc_return_relaxed()), replace it with
> > > > > > > > one of those things:
> > > > > > > >       * full mb ; once (the rmw) ; full mb, if a value returning
> > > > > > > > (successful) rmw
> > > > > > > >       * once (the rmw)   otherwise
> > > > > > > > - everything else gets translated 1:1 into some internal representation
> > > > > > > 
> > > > > > > This is my understanding from reading the source code of CSem.ml in herd7's
> > > > > > > repo.
> > > > > > > 
> > > > > > > Also, this is exactly what dartagnan is currently doing.
> > > > > > > 
> > > > > > > > 
> > > > > > > > What I'm proposing is:
> > > > > > > > 1. remove this transpilation step,
> > > > > > > > 2. and instead allow the Mb tag to actually appear on RMW instructions
> > > > > > > > 3. change the cat file to explicitly define the behavior of the Mb tag
> > > > > > > > on RMW instructions
> > > > > > > 
> > > > > > > These are the exact 3 things I changed in dartagnan for testing what Jonas
> > > > > > > proposed.
> > > > > > > 
> > > > > > > I am not sure if further changes are needed for herd7.
> > > > 
> > > > What about failed RMW instructions?  IIRC, herd7 generates just an R for
> > > > these, not both R and W, but won't it still be annotated with an mb tag?
> > > > And wasn't this matter of failed RMWs one of the issues that the two of
> > > > you specifically wanted to make explicit in the memory model, rather
> > > > than implicit in the operation of herd7?
> > > 
> > > That's why we use the RMW_MB tag. I should have copied that definition too:
> > > 
> > > 
> > > (* full barrier events that appear in non-failing RMW *)
> > > let RMW_MB = Mb & (dom(rmw) | range(rmw))
> > > 
> > > 
> > > This ensures that the only successful rmw instructions have an RMW_MB tag.
> > 
> > It would be better if there was a way to tell herd7 not to add the 'mb
> > tag to failed instructions in the first place.  This approach is
> > brittle; see below.
> 
> Hernan told me that in fact that is actually currently the case in herd7.
> Failing RMW get assigned the Once tag implicitly.
> Another thing that I'd suggest to change.

Let's please be careful, though.  There is a lot out there that depends
on this semantic, odd though it might seem at first glance.  ;-)

							Thanx, Paul

> > An alternative would be to have a way for the .cat file to remove the
> > 'mb tag from a failed RMW instruction.  But I don't know if this is
> > feasible.
> 
> For Mb it's feasible, as there is no Mb read or Mb store.
> 
> Mb = Mb & (~M | dom(rmw) | range(rmw))
> 
> However one would want to do the same for Acq and Rel.
> 
> For that one would need to distinguish e.g. between a read that comes from a
> failed rmw instruction, and where the tag would disappear, or a normal
> standalone read.
> 
> For example, by using two different acquire tags, 'acquire and 'rmw-acquire,
> and defining
> 
> Acquire = Acquire | Rmw-acquire & (dom(rmw) | range(rmw))
> 
> Anyways we can do this change independently. So for now, we don't need
> RMW_MB.
> 
> 
> > 
> > > > And wasn't another one of these issues the difference between
> > > > value-returning and non-value-returning RMWs?  As far as I can, nothing
> > > > in the .def file specifically mentions this.  There's the noreturn tag
> > > > in the .bell file, but nothing in the .def file says which instructions
> > > > it applies to.  Or are we supposed to know that it automatically applies
> > > > to all __atomic_op() instances?
> > > 
> > > Ah, now you're already forestalling my next suggestion :))
> > > 
> > > I would say let's fix one issue at a time (learned this from Andrea).
> > > 
> > > For the other issue, do noreturn rmws always have the same ordering as once?
> > 
> > If they aren't annotated with _acquire or _release then yes...  And I
> > don't know whether there _are_ any annotated no-return RMWs.  If
> > somebody wanted such a thing, they probably would just use a
> > value-returning RMW instead.
> > 
> > > I suspect we need to extend herd slightly more to support the second one, I
> > > don't know if there's syntax for passing tags to __atomic_op.
> > 
> > This may not be be needed.  But still, it would nice to be explicit (in
> > a comment in the .def file if nowhere else) that __atomic_op always adds
> > a 'noreturn tag.
> > 
> > > > > instructions RMW[{'once,'acquire,'release,'mb}]
> > > > > 
> > > > > then the Mb tags would appear in the graph. And then I'd define the ordering
> > > > > explicitly. One way is to say that an Mb tag orders all memory accesses
> > > > > before(or at) the tag with all memory accesses after(or at) the tag, except
> > > > > the accesses of the rmw with each other.
> > > > 
> > > > I don't think you need to add very much.  The .cat file already defines
> > > > the mb relation as including the term:
> > > > 
> > > > 	([M] ; fencerel(Mb) ; [M])
> > > > 
> > > > All that's needed is to replace the fencerel(Mb) with something more
> > > > general...
> > 
> > And this is why I said the RMW_MB mechanism is brittle.  With the 'mb
> > tag still added to failed RMW events, the term above will cause the
> > memory model to think there is ordering even though the event isn't in
> > the RMW_MB class.
> > 
> 
> Huh, I thought that fencerel(Mb) is something like po ; [F & Mb] ; po (where
> F includes standalone fences). But looking into the stdlib.cat, you're
> right.
> 
> 
> > > > Also, is the "\ rmw" part really necessary?  I don't think it makes any
> > > > difference; the memory model already knows that the read part of an RMW
> > > > has to happen before the write part.
> > > 
> > > It unfortunately does make a difference, because mb is a strong fence.
> > > This means that an Mb in an rmw sequence would create additional pb edges.
> > > 
> > >    prop;(rfe ; [Mb];rmw;[Mb])
> > > 
> > > I believe this is might give wrong results on powerpc, but I'd need to
> > > double check.
> > 
> > PowerPC uses a load-reserve/store-conditional approach for RMW, so it's
> > tricky.  However, you're right that having a fictitious mb between the
> > read and the write of an RMW would mean that the preceding (in coherence
> > order) write would have to be visible to all CPUs before the RMW write
> > could execute, and I don't believe we want to assert this.
> > 
> > > > > One could also split it into two rules to keep with the "two full fences"
> > > > > analogy. Maybe a good way would be like this:
> > > > > 
> > > > >        [M] ; po; [RMW_MB & R] ; po^? ; [M]
> > > > > 
> > > > >        [M] ; po^?; [RMW_MB & W] ; po ; [M]
> > > > 
> > > > My preference is for the first approach.
> > > 
> > > That was also my early preference, but to be honest I expected that you'd
> > > prefer the second approach.
> > > Actually, I also started warming up to it.
> > 
> > If you do want to use this approach, it should be simplified.  All you
> > need is:
> > 
> > 	[M] ; po ; [RMW_MB]
> > 
> > 	[RMW_MB] ; po ; [M]
> > 
> > This is because events tagged with RMW_MB always are memory accesses,
> > and accesses that aren't part of the RMW are already covered by the
> > fencerel(Mb) thing above.
> 
> This has exactly the issue mentioned above - it will cause the rmw to have
> an internal strong fence that on powerpc probably isn't there.
> 
> We could do (with the assumption that Mb applies only to successful rmw):
> 
>  	[M] ; po ; [Mb & R]
>  	[Mb & W] ; po ; [M]
> 
> 
> Kind regards, jonas
>