[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <83b64af7-03b7-4dd8-900b-c57a65479ba8@rowland.harvard.edu>
Date: Thu, 9 Jan 2025 11:17:09 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote:
>
>
> Am 1/8/2025 um 7:47 PM schrieb Alan Stern:
> > On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:
> > >
> > >
> > > Am 1/7/2025 um 5:09 PM schrieb Alan Stern:
> > > > Is this really valid? In the example above, if there were no other
> > > > references to a or b in the rest of the program, the compiler could
> > > > eliminate them entirely.
> > >
> > > In the example above, this is not possible, because the address of a/b have
> > > escaped the function and are not deallocated before synchronization happens.
> > > Therefore the compiler must assume that a/b are accessed inside the compiler
> > > barrier.
> >
> > I'm not quite sure what you mean by that, but if the compiler has access
> > to the entire program and can do a global analysis then it can recognize
> > cases where accesses that may seem to be live aren't really.
>
> Even for trivial enough cases where the compiler has access to all the
> source, compiler barriers should be opaque to the compiler.
>
> Since it is opaque,
>
> *a = 1;
> compiler_barrier();
>
> might as well be
>
> *a = 1;
> *d = *a; // *d is in device memory
>
> and so in my opinion the compiler needs to ensure that the value of *a right
> before the compiler barrier is 1.
>
> Of course, only if the address of *a could be possibly legally known to the
> opaque code in the compiler barrier.
What do you mean by "opaque code in the compiler barrier"? The
compiler_barrier() instruction doesn't generate any code at all; it
merely directs the compiler not to carry any knowledge about values
stored in memory from one side of the barrier to the other.
Note that it does _not_ necessarily prevent the compiler from carrying
knowledge that a memory location is unused from one side of the barrier
to the other.
> > However, I admit this objection doesn't really apply to Linux kernel
> > programming.
> >
> > > > (Whether the result could count as OOTA is
> > > > open to question, but that's not the point.) Is it not possible that a
> > > > compiler might find other ways to defeat your intentions?
> > >
> > > The main point (which I already mentioned in the previous e-mail) is if the
> > > object is deallocated without synchronization (or never escapes the function
> > > in the first place).
> > >
> > > And indeed, any such case renders the added rule unsound. It is in a sense
> > > unrelated to OOTA; cases where the load/store can be elided are never OOTA.
> >
> > That is a matter of definition. In our paper, Paul and I described
> > instances of OOTA in which all the accesses have been optimized away as
> > "trivial".
>
> Yes, by OOTA I mean a rwdep;rfe cycle.
>
> In the absence of data races, such a cycle can't be optimized away because
> it is created with volatile/compiler-barrier-protected accesses.
That wasn't true in the C++ context of the paper Paul and I worked on.
Of course, C++ is not our current context here.
What I was trying to get at above is that compiler-barrier protection
does not necessarily guarantee that non-volatile accesses can't be
optimized away. (However, it's probably safe for us to make such an
assumption here.)
> It would look something like this:
>
> Live = R & rng(po \ po ; [W] ; (po-loc \ w_barrier)) | W & dom(po \ ((po-loc
> \ w_barrier) ; [W] ; po))
>
> let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ;
> wmb)?
>
>
> > In any case, it seems that any approximation we can make to Live will be
> > subject to various sorts of errors.
>
> Probably (this is certainly true for trying to approximate dependencies, for
> example), but what I know for certain is that the approximations of Live
> inside cat get more ugly the more precise they become. In the above
> definition of Live I have not included that the address must escape, nor
> that it must not be freed.
>
> A non-local definition that suffices for OOTA would be so:
>
> Live = R & rng(rfe) & dom(rwdep ; rfe) | W & dom(rfe)
I could live with this (although I would prefer to have more parentheses
-- IMO it'smistake-prone to rely on the relative precedence of | and &).
Especially if the to-w definition above were rewritten in a way that
would be a little easier to parse and understand.
> It seems the ideal solution is to let Live be defined by the tools, which
> should keep up with or exceed the analysis done by state-of-art compilers.
I don't think it works that way in practice. :-)
Alan
Powered by blists - more mailing lists