Message-ID: <a2057858-fd7c-4458-96fc-0c3f52eae5fb@huaweicloud.com>
Date: Wed, 22 Jan 2025 04:46:03 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/21/2025 at 5:39 PM, Alan Stern wrote:
> On Tue, Jan 21, 2025 at 11:36:01AM +0100, Jonas Oberhauser wrote:
>>> What makes you think this "more appropriate" definition of semantic
>>> dependency will be any easier to detect than the original one?
>>
>> For starters, as you mentioned, the compiler has to assume any value is
>> possible.
>>
>> Which means that if any other value would lead to a "different behavior",
>> you know you have a semantic dependency. You can detect this in a per-thread
>> manner, independently of the compiler.
>
> How? The question is not as simple as it sounds. What counts as
> "different behavior"? What if some loads take place with the other
> value that didn't take place with the original value? What if
> completely different code ends up executing but it stores the same
> values as before?
I agree that it is not easy, mostly because there's no good way to say
whether two stores from two executions are "the same".
Besides the complications mentioned in your paper (e.g., another store
to the same address being on the new code path), I would also need to
think about barriers (e.g., a relaxed store in one execution, but an sc
one in another execution) and about other kinds of synchronization
(what if there are other atomic accesses in between that establish
synchronization?).
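To make the barrier point concrete, here is a minimal sketch (a
hypothetical example in C11 atomics, not taken from anywhere in this
thread): depending on the value read from x, the "same" store to y is
issued with a different ordering strength.

    #include <stdatomic.h>

    atomic_int x, y;

    void writer(void)
    {
            if (atomic_load_explicit(&x, memory_order_relaxed))
                    /* "sc" store on one code path ... */
                    atomic_store_explicit(&y, 1, memory_order_seq_cst);
            else
                    /* ... relaxed store of the same value on the other */
                    atomic_store_explicit(&y, 1, memory_order_relaxed);
    }

Both paths store 1 to y, but with different barriers attached; whether
those count as "the same" store decides whether the load of x carries a
semantic dependency.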
One could take your definition (with the requirements about H dropped);
perhaps, since in this specific example all atomic accesses are
volatile, the restriction about counting accesses does not sting as
much.
>
>> Without the assumption of volatile, even a different value that could
>> actually be generated in another run of the same program does not prove a
>> semantic dependency.
>
> So then how do you tell whether there is a semantic dependency?
>
>>>> Yes. Actually I wonder if you put this "all loads are volatile" restriction,
>>>> can a globally analysing compiler still have any optimizations that a
>>>> locally analysing compiler can not?
>>>
>>> Yes, although whether they are pertinent is open to question. For
>>> example, a globally optimizing compiler may observe that no thread ever
>>> reads the value of a particular shared variable and then eliminate that
>>> variable.
>>
>> Oh, I meant the "all atomic objects are volatile" restriction, not just the
>> loads. In that case, all stores to the object - even if never read - still
>> need to be generated.
>>
>> Are there still any optimizations?
>
> Perhaps not any that affect shared variable accesses. In a way, that
> was the intention.
Yes, but then it becomes a bit strange to treat the "globally analyzing
compiler" as a harder problem. You have made a globally analyzing
compiler that cannot globally analyze. I understand that this makes the
argument feasible, but I am not sure this is how compilers really
behave (or at least will still behave in the future).
>> Well, I certainly have not put as much deep thought into it as you have, so
>> it is certainly more fragile. But my current thoughts are along these lines:
>>
>> We consider inter-thread semantic dependencies (isdep) based on the set of
>> allowed executions + thread local optimizations + what would be allowed to
>> happen if rfe edges along the dependencies become rfi edges due to merging.
>> So the definition is not compiler-specific.
>
> How do you know what thread-local optimizations may be applied?
> Different compilers use different ones, and new ones are constantly
> being invented.
I'm not sure it is necessary to know specific optimizations.
I don't have a satisfactory answer now.
But perhaps it is possible to define this through the abstract machine,
maybe with a detour through the set of allowed C realizations.
Something like: "given some execution G, a subset of threads (T)i, and
a read r and a write w in that execution, if for every C realization of
the threads (T)i together such that all executions of the realization,
under all rf relations that exist in some original execution G'
(modulo some po-preserving mapping), have the same observable side
effects as that original execution G', there is a sequence of syntactic
dependency + rf from r to w, then there is an inter-thread semantic
dependency from r to w".
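As an illustration of "a sequence of syntactic dependency + rf from r
to w" (a made-up two-thread sketch; the names are invented):

    #include <stdatomic.h>

    atomic_int x, y, z;

    void t1(void)
    {
            /* r: the load of x, with a syntactic data dependency
             * to the store of y */
            int r1 = atomic_load_explicit(&x, memory_order_relaxed);
            atomic_store_explicit(&y, r1, memory_order_relaxed);
    }

    void t2(void)
    {
            /* reads y (possibly from t1's store: an rf edge), with
             * a syntactic data dependency to w: the store of z */
            int r2 = atomic_load_explicit(&y, memory_order_relaxed);
            atomic_store_explicit(&z, r2, memory_order_relaxed);
    }

In executions where t2's load reads from t1's store, the chain
dep;rf;dep runs from r (the load of x) to w (the store of z).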
But this makes it really hard to prove that there is a dependency, since
it quantifies over all C programs. And it only takes into account
C-level optimizations, not optimizations specific to some hardware
platform (which for all we know may have some transactional memory or
powerful write speculation that the compiler knows about and exploits
in its optimizations).
I am not sure there is a better definition.
> Why not consider global optimizations? Yes, they are the same as
> thread-local optimizations if all the threads have been merged into one,
> but what if they haven't been merged?
Some global optimizations are already accounted for by the fact that we
only consider the set of allowed executions.
So if the compiler can see that in all executions the value of some
variable is always 0 or 1 - for example, because those are the only
kinds of stores to that variable - it might use that to do local
optimizations, such as eliminating some switch cases, which then leads
to eliminating control dependencies.
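A sketch of what I mean (hypothetical code, assuming the compiler can
prove that only 0 and 1 are ever stored to x):

    #include <stdatomic.h>

    atomic_int x, y;

    void reader(void)
    {
            switch (atomic_load_explicit(&x, memory_order_relaxed)) {
            case 0:
                    atomic_store_explicit(&y, 1, memory_order_relaxed);
                    break;
            case 1:
                    atomic_store_explicit(&y, 1, memory_order_relaxed);
                    break;
            default: /* dead if x can only ever be 0 or 1 */
                    atomic_store_explicit(&y, 2, memory_order_relaxed);
                    break;
            }
    }

Knowing the default case is dead, the compiler can merge the remaining
cases into an unconditional store of 1 to y, and the control dependency
from the load of x is gone.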
> For that matter, why bring thread merging into the discussion?
For one, it sounds impractical to make a compiler that does advanced
global optimizations without merging the threads. It's probably easy
enough to do on toy examples, but for realistic applications it seems
completely infeasible, and it would additionally be hard to gain much
benefit from it.
So I'm not sure it is necessary to solve the more advanced problem.
For another, thread merging defeats per-thread semantic dependencies.
So unless all optimizations related to it are ruled out (e.g., by
saying all accesses are volatile), it needs to be considered either
specifically or by a more powerful, simpler abstraction, which I don't
have.
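For example (again a made-up sketch, glossing over the question of
which interleavings a merged thread would have to preserve):

    #include <stdatomic.h>

    atomic_int x, y;

    /* Separate threads: the load of x carries a data dependency to
     * the store of y, and t2 reading x from t1's store is an rfe
     * edge. */
    void t1(void)
    {
            atomic_store_explicit(&x, 1, memory_order_relaxed);
    }

    void t2(void)
    {
            int r = atomic_load_explicit(&x, memory_order_relaxed);
            atomic_store_explicit(&y, r, memory_order_relaxed);
    }

    /* Merged: the rfe edge becomes rfi; store-to-load forwarding
     * replaces r with the constant 1, and the dependency is gone. */
    void merged(void)
    {
            atomic_store_explicit(&x, 1, memory_order_relaxed);
            atomic_store_explicit(&y, 1, memory_order_relaxed);
    }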
>
> I don't want to think about thread merging, and at first glance it
> looks like you don't want to think about anything else.
I *want* to think about other practical global optimizations, beyond
those already covered by knowing the set of possible executions.
I just haven't been able to.
>>>>> I do not like the idea of tying the definition of OOTA (which needs to
>>>>> apply to every implementation) to a particular clang compiler.
>>>>
>>>> But that is what you have done, no? Whether something is an sdep depends on
>>>> the compiler, so compiler A could generate an execution that is OOTA in the
>>>> sdep definition of compiler B.
>>>
>>> Yes, but this does not tie the definition to _one particular_ compiler.
>>> That is, we don't say "This dependency is semantic because of the way
>>> GCC 14.2.1 handles it." Rather, we define for each compiler whether a
>>> dependency is semantic for _that_ compiler.
>>
>> Yes, but the result is still that every compiler has its own memory model
>> (at least with respect to which graphs are ruled out as OOTA). So
>> there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'.
>
> Our definition of OOTA and semantic dependency does not apply to graphs;
> it applies to particular hardware executions (together with the abstract
> executions that they realize).
True, but it immediately induces a predicate over graphs indexed by the
compiler (by looking at hardware executions generated by x and the
graphs from the realized abstract executions).
>
> Besides, it is still possible to use our definition to talk about
> semantic dependency in a compiler-independent way. Namely, if a
> dependency is semantic relative to _every_ compiler then we can say it
> is absolutely semantic.
Sure, but that definition is useless.
For example, in the running OOTA example x=y || y=x, there are no
absolutely semantic dependencies. One compiler can turn this into x=y,
where there is no dependency from y to x, and another into y=x, where
there is no dependency from x to y. In fact, even a naive non-optimizing
compiler has no dependencies here under your definition, because there
are no hardware executions with other values.
In my informal definition, there is (intended to be) an inter-thread
dependency from the load from x in T2 to the store to x in T1, and I
would claim there is no execution in which that load can read from that
store. I would not claim anything about the order of the load from y and
the store to x.
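Spelled out, the running example is (a sketch in C11 atomics of the
usual litmus test):

    #include <stdatomic.h>

    atomic_int x, y;

    void T1(void) /* x = y */
    {
            atomic_store_explicit(&x,
                    atomic_load_explicit(&y, memory_order_relaxed),
                    memory_order_relaxed);
    }

    void T2(void) /* y = x */
    {
            atomic_store_explicit(&y,
                    atomic_load_explicit(&x, memory_order_relaxed),
                    memory_order_relaxed);
    }

The claimed inter-thread dependency runs from the load of x in T2,
through T2's store to y and the rf edge into T1's load of y, to T1's
store to x; the intent is that no execution lets T2's load of x read
from that store.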
>> The example I gave of the tool verifying the program relative to one
>> specific compiler is also not giving a definition of 'G is OOTA', in fact,
>> it does not specify OOTA at all; it simply says ``with compiler x, we get
>> the following "semantic dependencies" and the following graphs...''
>>
>> And as long as compiler x does not generate OOTA, there will be no OOTA
>> graphs among those.
>>
>> So it does not solve or tackle the theoretical problem in any way, and can
>> not do cross-compiler verification. But it will already apply the
>> 'naive-dependency-breaking optimizations' that you would not see in e.g.
>> herd7.
>
> Okay, fine. But we're trying to come up with general characterizations,
> and it appears that you're doing something quite different.
Well, it wasn't me who did this; it was Hernan's group, for other
practical concerns. But yes, I agree.
It's not a solution to the very hard problem you're trying to solve, but
a solution to perhaps the more immediate problem people have while
looking at real code.
Best wishes,
jonas