Message-ID: <df9de992-fd3d-432b-9758-9fe35cc9b0c4@rowland.harvard.edu>
Date: Tue, 21 Jan 2025 11:39:00 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA

On Tue, Jan 21, 2025 at 11:36:01AM +0100, Jonas Oberhauser wrote:
> > What makes you think this "more appropriate" definition of semantic
> > dependency will be any easier to detect than the original one?
>
> For starters, as you mentioned, the compiler has to assume any value is
> possible.
>
> Which means that if any other value would lead to a "different behavior",
> you know you have a semantic dependency. You can detect this in a per-thread
> manner, independently of the compiler.
How? The question is not as simple as it sounds. What counts as
"different behavior"? What if some loads take place with the other
value that didn't take place with the original value? What if
completely different code ends up executing but it stores the same
values as before?
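
To make the question concrete, here is a hand-made sketch (mine, not
yours, using the kernel's READ_ONCE()/WRITE_ONCE()) in which a
different value of the load changes which loads execute but not which
values get stored:

/* Hypothetical sketch, not taken from any real code. */
void example(int *x, int *y, int *z)
{
	int r1 = READ_ONCE(*x);

	if (r1 == 0) {
		/* This path performs a load of *z that the other path skips... */
		int r2 = READ_ONCE(*z);

		(void)r2;
		WRITE_ONCE(*y, 42);
	} else {
		/* ...yet both paths store exactly the same value to *y. */
		WRITE_ONCE(*y, 42);
	}
}

Does changing the value of r1 count as leading to "different behavior"
here? The set of loads changes, but no stored value does.
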
> Without the assumption of volatile, even a different value that could
> actually be generated in another run of the same program does not prove a
> semantic dependency.
So then how do you tell whether there is a semantic dependency?
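
For instance, in a toy fragment like the following (again my own
sketch, kernel-style accessors assumed), the value loaded from *x may
well differ from one run to the next, yet the stored value cannot:

/* Hypothetical sketch.  The loaded value varies across runs, but the
 * value stored to *y is always 0, so a compiler is free to rewrite the
 * store as WRITE_ONCE(*y, 0) and the syntactic dependency carries no
 * semantic force. */
void example2(int *x, int *y)
{
	int r = READ_ONCE(*x);

	WRITE_ONCE(*y, r & 0);
}

Run-to-run variation in the loaded value doesn't seem to settle the
question by itself.
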
> > > Yes. Actually I wonder: if you impose this "all loads are volatile"
> > > restriction, can a globally analysing compiler still have any
> > > optimizations that a locally analysing compiler can not?
> >
> > Yes, although whether they are pertinent is open to question. For
> > example, a globally optimizing compiler may observe that no thread ever
> > reads the value of a particular shared variable and then eliminate that
> > variable.
>
> Oh, I meant the "all atomic objects are volatile" restriction, not just the
> loads. In that case, all stores to the object - even if never read - still
> need to be generated.
>
> Are there still any optimizations?
Perhaps not any that affect shared variable accesses. In a way, that
was the intention.
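
For illustration (another hand-made sketch): under that treatment even
a store to a shared variable that no thread ever reads has to be
emitted, which rules out exactly the global optimization I mentioned
above:

/* Hypothetical sketch.  If *unused_flag is never read by any thread, a
 * globally analysing compiler could ordinarily delete this store and
 * the variable along with it; treating every marked access like a
 * volatile access forbids that. */
void example3(int *unused_flag)
{
	WRITE_ONCE(*unused_flag, 1);
}
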
> Well, I certainly have not put as much deep thought into it as you have, so
> it is certainly more fragile. But my current thoughts are along these lines:
>
> We consider inter-thread semantic dependencies (isdep) based on the set of
> allowed executions + thread local optimizations + what would be allowed to
> happen if rfe edges along the dependencies become rfi edges due to merging.
> So the definition is not compiler-specific.
How do you know what thread-local optimizations may be applied?
Different compilers use different ones, and new ones are constantly
being invented.
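
To give just one made-up illustration of why that matters:

/* Hypothetical sketch.  After the mask, r can only be 0 or 1, so a
 * compiler may lower the multiplication as "r ? 8 : 0", i.e. as a
 * conditional select or even as a branch.  Whether the machine ends up
 * seeing a data dependency or a control dependency from the load to
 * the store is a per-compiler, per-release choice. */
void example4(int *x, int *y)
{
	int r = READ_ONCE(*x) & 1;

	WRITE_ONCE(*y, r * 8);
}
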
Why not consider global optimizations? Yes, they are the same as
thread-local optimizations if all the threads have been merged into one,
but what if they haven't been merged?
For that matter, why bring thread merging into the discussion?
> Those provide order at the machine level unless the compiler restricts the
> set of executions through its choices, especially cross-thread
> optimizations, and then uses the restricted set of executions (i.e., the
> possible input range) to optimize the execution further.
> I haven't thought deeply about all the different optimizations that are
> possible there.
>
> For an example of how it may not provide order, if we have accesses related
> by isdep;rfe;isdep, then the compiler could merge all the involved threads
> into one, and the accesses could no longer have any dependency.
>
> So it is important that one cannot forbid longer isdep;rf cycles; the
> axiom would be that isdep;rf is irreflexive.
>
>
> When merging threads, if in an OOTA execution there is an inter-thread
> semantic dependency from r in one of the merged threads to w in one of the
> merged threads, such that r reads from a non-merged thread and w is read by
> a non-merged thread in the OOTA cycle, then it is still an inter-thread
> dependency which preserves the order. This prevents the full cycle
> even if some other sub-edges within the merged threads are now no longer
> dependencies.
>
> But if r is reading from w in the OOTA cycle, then the compiler (which is
> not allowed to look for the cycle) has to put r before w in the merged po,
> preventing r from reading w. (It can also omit r by giving it a value that
> it could legally read in this po, but it still can't get its value from w
> this way).
>
> E.g.,
> P0 {
> y = x;
> }
> P1 {
> z = y;
> }
> ... (some way to assign x dependent on z)
>
> would have an inter-thread dependency from the load of x to the store to z.
> If P0 and P1 are merged, and there are no other accesses to y, then the
> intra-thread dependencies from the load of x to the store to y, etc., are
> eliminated, but the inter-thread dependency from x to z remains.
> > I don't understand.
>
> Does the explanation above help?
No. I don't want to think about thread merging, and at first glance it
looks like you don't want to think about anything else.
> > > > I do not like the idea of tying the definition of OOTA (which needs to
> > > > apply to every implementation) to a particular clang compiler.
> > >
> > > But that is what you have done, no? Whether something is an sdep depends on
> > > the compiler, so compiler A could generate an execution that is OOTA in the
> > > sdep definition of compiler B.
> >
> > Yes, but this does not tie the definition to _one particular_ compiler.
> > That is, we don't say "This dependency is semantic because of the way
> > GCC 14.2.1 handles it." Rather, we define for each compiler whether a
> > dependency is semantic for _that_ compiler.
>
> Yes, but the result is still that every compiler has its own memory model
> (at least in terms of which graphs are ruled out as OOTA). So
> there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'.
Our definition of OOTA and semantic dependency does not apply to graphs;
it applies to particular hardware executions (together with the abstract
executions that they realize).
Besides, it is still possible to use our definition to talk about
semantic dependency in a compiler-independent way. Namely, if a
dependency is semantic relative to _every_ compiler then we can say it
is absolutely semantic.
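
For instance, here is a made-up sketch of the kind of dependency one
would expect to qualify:

/* Hypothetical sketch.  Absent whole-program knowledge of the values
 * *x can hold, a compiler must assume the loaded value can be
 * anything, so the value stored to *y genuinely depends on it.  This
 * is the sort of dependency one would expect to be semantic relative
 * to every compiler. */
void example5(int *x, int *y)
{
	WRITE_ONCE(*y, READ_ONCE(*x));
}
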
> The example I gave of the tool verifying the program relative to one
> specific compiler is also not giving a definition of 'G is OOTA'; in fact,
> it does not specify OOTA at all; it simply says ``with compiler x, we get
> the following "semantic dependencies" and the following graphs...''
>
> And as long as compiler x does not generate OOTA, there will be no OOTA
> graphs among those.
>
> So it does not solve or tackle the theoretical problem in any way, and can
> not do cross-compiler verification. But it will already apply the
> 'naive-dependency-breaking optimizations' that you would not see in e.g.
> herd7.
Okay, fine. But we're trying to come up with general characterizations,
and it appears that you're doing something quite different.
Alan