[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <22c28114-145d-4968-a346-75abbc058b84@rowland.harvard.edu>
Date: Mon, 13 Jan 2025 14:43:00 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On Sun, Jan 12, 2025 at 04:55:07PM +0100, Jonas Oberhauser wrote:
>
>
> Am 1/11/2025 um 10:19 PM schrieb Alan Stern:
> > On Sat, Jan 11, 2025 at 01:46:21PM +0100, Jonas Oberhauser wrote:
> >>
> > > This is under my assumption that if we had let's say gcc's "semantic
> > > dependencies" or an under-approximation of it (by that I mean allow less
> > > things to be dependent than gcc can see), that these cases would be
> > > resolved, in the sense that gcc can not undermine [R & Marked] ; gcc-dep
> > > where gcc-dep is the dependencies detected by gcc.
> >
> > That seems circular. Basically, you're saying the gcc will not break
> > any dependencies that gcc classifies as not-breakable!
>
> Maybe my formulation is not exactly what I meant to express.
> I am thinking of examples like this,
>
> r1 = READ_ONCE(*x);
> *b = r1;
>
> ~~>
>
> r1 = READ_ONCE(*x);
> if (*b != r1) {
> *b = r1;
> }
>
> Here there is clearly a dependency to a store, but gcc might turn it into an
> independent load (in case *b == r1).
>
> Just because gcc admits that there is a dependency, does not necessarily
> mean that it will not still undermine the ordering "bestowed upon" that
> dependency by a memory model in some creative way.
Yes; that's why the LKMM considers only ordering involving marked
accesses, unless absolutely necessary.
> The cases that I could think of all still worked for very specific
> architecture-specific reasons (e.g., x86 has CMOV but all loads provide
> acquire-ordering, and arm does not have flag-conditional str, etc.)
>
> Or perhaps there is no dependency in case *b == r1. I am not sure.
>
> Another thought that pops up here is that when I last worked on formalizing
> dependencies, I could not define dependencies as being between one load and
> one store, a dependency might be between a set of loads and one store. I
> would have to look up the exact reason, but I think it was because sometimes
> you need to change more than one value to influence the result, e.g., a && b
> where both a and b are 0 - just changing one will not make a difference.
Paul and I mentioned exactly this issue in our C++ presentation. It's
in the paper; I had to reformulate the definition of OOTA to take it
into account.
> All of these complications make me wonder whether even a relational notion
> of semantic dependency is good enough.
For handling OOTA, probably not.
> > > I am genuinely asking. Do we have a list of the limitations?
> > > Maybe it would be good to collect it in the "A WARNING" section of
> > > explanation.txt if it doesn't exist elsewhere.
> >
> > There are a few listed already at various spots in explanation.txt --
> > search for "undermine". And yes, many or most of these limitations do
> > arise from LKMM's failure to recognize when a dependency isn't semantic.
> > Maybe some are also related to undefined behavior, which LKMM is not
> > aware of.
>
> Thanks. Another point mentioned in that document is the total order of po,
> whereas C has a more relaxed notion of sequenced-before; this could even
> affect volatile accesses, e.g. in f(READ_ONCE(*a), g()). where g() calls an
> rmb and then READ_ONCE(*b), and it is not clear whether there should be a
> ppo from reading *a to *b in some executions or not.
Hah, yes. Of course, this isn't a problem for the litmus tests that
herd7 will accept, because herd7 doesn't understand user-defined
functions.
Even so, we can run across odd things like this:
r1 = READ_ONCE(*x);
if (r1)
smp_mb();
WRITE_ONCE(*y, 1);
Here the WRITE_ONCE has to be ordered after the READ_ONCE, even when *x
is 0, although the memory model isn't aware of that fact.
> For now I am mostly worried about cases where LKMM promises too much, rather
> than too little. The latter case arises naturally as a trade-off between
> complexity of the model, algorithmic complexity, and what guarantees are
> actually needed from the model by real code.
Agreed, promising too much is a worse problem than promising too little.
> By the way, I am currently investigating a formulation of LKMM where there
> is a separate "propagation order" per thread, prop-order[t], which relates
> `a` to `b` events iff `a` is "observed" to propagate to t before `b`.
I guess this would depend on what you mean by "observed". Are there any
useful cases of this where b isn't executed by t? All I can think of is
that when the right sort of memory barrier separates a from b, then for
all t, a propagates to t before b does -- and this does not depend on t.
> Observation can also include co and fr, not just rf, which might be
> sufficient to cover those cases. I have a hand-written proof sketch that an
> order ORD induced by prop-order[t], xb, and some other orders is acyclic,
> and a Coq proof that executing an operational model in any linearization of
> ORD is permitted (e.g., does not propagate a store before it is executed, or
> in violation of co) and has the same rf as the axiomatic execution.
>
> So if the proof sketch works out, this might indicate that with such a
> per-thread propagation order, one can eliminate those cases.
>
> But to make the definition work, I had to make xb use prop-order[t] instead
> of prop in some cases, and the definitions of xb and prop-order[t] are
> mutually recursive, so it's not a very digestible definition.
These mutually recursive definitions are the bane of memory models. In
fact, that's what drove Will Deacon to impose other-multicopy atomicity
on the ARM64 memory model -- doing so allowed him to avoid mutual
recursion.
> I think there are two relevant claims here:
> 1) compilers do not introduce "observed OOTA"
> 2) For every execution graph with not-observed OOTA, there is another
> execution with the same observable side effects that does not have OOTA.
>
> While it may be much easier to prove 2) with a more relaxed notion of
> observed OOTA, 1) sounds much harder. How does the compiler know that there
> is no indirect way to observe the OOTA?
>
> E.g., in the LKMM example, ignoring the compiler barriers, it might be
> possible for a compiler to deduce that the plain accesses are never used and
> delete them, resulting in an OOTA that is observed under the more relaxed
> setting, violating claim 1).
A main point of the paper Paul and I wrote for the C++ working group was
that the _only_ way OOTA can occur in real-world programs is if all the
accesses have been removed by the compiler (assuming the compiler obeys
some minimal restrictions).
We did not consider the issue of whether such instances of OOTA could be
considered to be "observed". As I understand it, they would not satisfy
your proposed notion of "observed".
Alan
Powered by blists - more mailing lists