Message-ID: <2139f8ce-63fa-4231-864b-500ee2645811@huaweicloud.com>
Date: Tue, 21 Jan 2025 11:36:01 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/17/2025 at 9:01 PM, Alan Stern wrote:
> On Fri, Jan 17, 2025 at 12:29:34PM +0100, Jonas Oberhauser wrote:
>>
>>
>> On 1/17/2025 at 12:02 AM, Alan Stern wrote:
>>> On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote:
>>>> I'm well aware that an absolute definition of semantic dependencies is not
>>>> easy to give.
>>>
>>> In fact it's undecidable. No tool is ever going to be able to detect
>>> semantic dependencies perfectly.
>>
>> It depends.
>> Firstly, let's consider that the tool only runs on finite (or "finitized")
>> programs.
>
> A program may be finite; that doesn't mean its executions are. Programs
> can go into infinite loops. Are you saying that the tool would stop
> verifying executions after (say) a billion steps? But then it would not
> be able to detect semantic dependences in programs that run longer.
Yes, what I said was not fully correct. It would have been more accurate
to say: programs free of non-trivial loops, or programs with a finite
(representative) execution space.
>
>> Secondly, your definition depends on the compiler.
>>
>> So in the sense that we don't know the compiler, it is undecidable.
>> But if you fix the compiler, you could still enumerate all executions under
>> that compiler and compute whether that compiler has a dependency or not.
>
> You can't enumerate the executions unless you put an artificial bound on
> the length of each execution, as noted above.
Note that in many practical cases it is possible to write test cases
where the bound is not artificial, in the sense that it is at least as
large as the bound needed for exhaustive enumeration.
>> But as I mentioned before, I think you can define semantic dependencies more
>> appropriately because you don't really need to preserve semantic
>> dependencies in C++, and in LKMM (and VMM) you have volatiles that restrict
>> what kind of dependency-eliminations the compiler can do.
>
> What makes you think this "more appropriate" definition of semantic
> dependency will be any easier to detect than the original one?
For starters, as you mentioned, the compiler has to assume any value is
possible. This means that if any other value would lead to a "different
behavior", you know you have a semantic dependency. You can detect this
in a per-thread manner, independently of the compiler.
Without the assumption of volatile, even a different value that could
actually be generated in another run of the same program does not prove
a semantic dependency.
>> Yes. Actually I wonder if you put this "all loads are volatile" restriction,
>> can a globally analysing compiler still have any optimizations that a
>> locally analysing compiler can not?
>
> Yes, although whether they are pertinent is open to question. For
> example, a globally optimizing compiler may observe that no thread ever
> reads the value of a particular shared variable and then eliminate that
> variable.
Oh, I meant the "all atomic objects are volatile" restriction, not just
the loads. In that case, all stores to the object - even if it is never
read - still need to be generated.
Are there still any optimizations?
>> It rather seems the other way, that the locally analysing quasi-volatile
>> compiler can at least do some local optimizations, while the global volatile
>> compiler can not (e.g., x=1; y=x; can not be x=1; y=1; for the global
>> compiler because x is volatile now).
>
> In the paper we speculate that it may be sufficient to require globally
> optimizing compilers to treat atomics as quasi volatile with the added
> restriction that loads must not be omitted.
I see.
>>> But then how would you characterize semantic dependencies, if you want
>>> to allow the definition to include some dependencies that aren't
>>> semantic but not so many that you ever create a cycle?
>>
>> I don't know which definition is correct yet, but the point is that you
>> don't have to avoid making so many dependencies that you would create a
>> cycle. It just forbids the compiler from looking for cycles and optimizing
>> based on the existence of the cycle. (Looking for unused vars and removing
>> those is still allowed, under the informal argument that this simulates an
>> execution where no OOTA happened)
>
> At the moment, the only underlying idea we have driving our notion of
> semantic dependency is that a true semantic dependency should be one
> which must give rise to order in the machine code. In turn, this order
> prevents OOTA cycles from occurring during execution. That is the
> essence of the paper.
>
> Can your ideas be fleshed out to a comparable degree?
Well, I certainly have not put as much deep thought into it as you have,
so it is certainly more fragile. But my current thoughts are along these
lines:
We consider inter-thread semantic dependencies (isdep) based on the set
of allowed executions + thread local optimizations + what would be
allowed to happen if rfe edges along the dependencies become rfi edges
due to merging. So the definition is not compiler-specific.
Those provide order at the machine level unless the compiler restricts the
set of executions through its choices, especially cross-thread
optimizations, and then uses the restricted set of executions (i.e.,
possible input range) to optimize the execution further.
I haven't thought deeply about all the different optimizations that are
possible there.
For an example of how it may not provide order, if we have accesses
related by isdep;rfe;isdep, then the compiler could merge all the
involved threads into one, and the accesses could no longer have any
dependency.
So it is important that one can not forbid all isdep;rf cycles; the
axiom would only be that isdep;rf is irreflexive.
When merging threads, if in an OOTA execution there is an inter-thread
semantic dependency from r in one of the merged threads to w in one of
the merged threads, such that r reads from a non-merged thread and w is
read by a non-merged thread in the OOTA cycle, then it is still an
inter-thread dependency which still preserves the order. This prevents
the full cycle even if some other sub-edges within the merged threads
are no longer dependencies.
But if r is reading from w in the OOTA cycle, then the compiler (which
is not allowed to look for the cycle) has to put r before w in the
merged po, preventing r from reading from w. (It can also omit r by giving it
a value that it could legally read in this po, but it still can't get
its value from w this way).
E.g.,
P0 {
    y = x;
}
P1 {
    z = y;
}
... (some way to assign x dependent on z)
would have an inter-thread dependency from the load of x to the store to
z. If P0 and P1 are merged, and there are no other accesses to y, then
the intra-thread dependencies from the load of x to the store to y etc.
are eliminated, but the inter-thread dependency from x to z remains.
>>>> So for example, if we merge x = y || y = x, the merge can turn it into
>>>> x=y=x or y=x=y (and then into an empty program), but not into a cyclic
>>>> dependency. So even though one side of the dependency may be violated, for
>>>> sake of OOTA, we could still label both sides as dependent.
>>>
>>> They _are_ both semantically dependent (in the original parallel
>>> version, I mean). I don't see what merging has to do with it.
>>
>> Note that I was considering the case where they are not volatile.
>> With a compiler that is not treating them as volatile, which merges the two
>> threads, under your definition, there is no semantic dependency in at least
>> one direction because there is no hardware realization H where you read
>> something else (of course you exclude such compilers, but I think
>> realistically they should be allowed).
>>
>> My point is that we can say they are semantically dependent for the sake of
>> OOTA, not derive any ordering from these dependencies other than the
>> cyclical one, and therefore allow compilers to do one of the two optimizations
>> (make x=y no longer depend on y or make y=x no longer depend on x) but not
>> make a cycle analysis to remove both dependencies and generate an OOTA value
>> (it can remove both dependencies by leaving x and y unchanged though).
>
> I don't understand.
Does the explanation above help?
>
>>> I do not like the idea of tying the definition of OOTA (which needs to
>>> apply to every implementation) to a particular clang compiler.
>>
>> But that is what you have done, no? Whether something is an sdep depends on
>> the compiler, so compiler A could generate an execution that is OOTA in the
>> sdep definition of compiler B.
>
> Yes, but this does not tie the definition to _one particular_ compiler.
> That is, we don't say "This dependency is semantic because of the way
> GCC 14.2.1 handles it." Rather, we define for each compiler whether a
> dependency is semantic for _that_ compiler.
Yes, but the result is still that every compiler has its own memory
model (at least regarding which graphs are ruled out as OOTA). So there
is no definition of 'G is OOTA', only 'G is OOTA on compiler x'.
The example I gave of the tool verifying the program relative to one
specific compiler is also not giving a definition of 'G is OOTA', in
fact, it does not specify OOTA at all; it simply says ``with compiler x,
we get the following "semantic dependencies" and the following graphs...''
And as long as compiler x does not generate OOTA, there will be no OOTA
graphs among those.
So it does not solve or tackle the theoretical problem in any way, and
can not do cross-compiler verification. But it will already apply the
'naive-dependency-breaking optimizations' that you would not see in e.g.
herd7.
>> So you either need a definition of semantic dependency that
>> a) applies in all cases we practically need and
>> b) is guaranteed by all compilers
>>
>> or we need to live with the fact that we do not have a semantic dependency
>> definition that is independent of the compilation (even of the same
>> compiler) and need to do our verification for that specific compilation.
>
> Add the qualification that the definition should be practical to
> evaluate, and I agree.
Yes, an important point. And hard to resolve as well.
Best wishes,
jonas