Message-ID: <a2057858-fd7c-4458-96fc-0c3f52eae5fb@huaweicloud.com>
Date: Wed, 22 Jan 2025 04:46:03 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/21/2025 at 5:39 PM, Alan Stern wrote:
> On Tue, Jan 21, 2025 at 11:36:01AM +0100, Jonas Oberhauser wrote:
>>> What makes you think this "more appropriate" definition of semantic
>>> dependency will be any easier to detect than the original one?
>>
>> For starters, as you mentioned, the compiler has to assume any value is
>> possible.
>>
>> Which means that if any other value would lead to a "different behavior",
>> you know you have a semantic dependency. You can detect this in a per-thread
>> manner, independently of the compiler.
>
> How? The question is not as simple as it sounds. What counts as
> "different behavior"? What if some loads take place with the other
> value that didn't take place with the original value? What if
> completely different code ends up executing but it stores the same
> values as before?
I agree that it is not easy, mostly because there's no good way to say
whether two stores from two executions are "the same".
Besides the complications mentioned in your paper (e.g., another store
to the same address being on the new code path), I would also need to
think about barriers (e.g., a relaxed store in one execution, but an sc
one in another execution) and about other kinds of synchronization
(what if there are other atomic accesses in between that establish
synchronization?).
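To make the barrier point concrete, here is a minimal sketch (a
hypothetical example in C11 atomics, not taken from anywhere in this
thread): depending on the value read from x, the "same" store to y is
issued with a different ordering strength.

    #include <stdatomic.h>

    atomic_int x, y;

    void writer(void)
    {
            if (atomic_load_explicit(&x, memory_order_relaxed))
                    /* "sc" store on one code path ... */
                    atomic_store_explicit(&y, 1, memory_order_seq_cst);
            else
                    /* ... relaxed store of the same value on the other */
                    atomic_store_explicit(&y, 1, memory_order_relaxed);
    }

Both paths store 1 to y, but with different barriers attached; whether
those count as "the same" store decides whether the load of x carries a
semantic dependency.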
One could take your definition (with the requirements about H dropped);
perhaps, since in this specific example all atomic accesses are
volatile, the restriction about counting accesses does not sting as
much.
>
>> Without the assumption of volatile, even a different value that could
>> actually be generated in another run of the same program does not prove a
>> semantic dependency.
>
> So then how do you tell whether there is a semantic dependency?
>
>>>> Yes. Actually I wonder if you put this "all loads are volatile" restriction,
>>>> can a globally analysing compiler still have any optimizations that a
>>>> locally analysing compiler can not?
>>>
>>> Yes, although whether they are pertinent is open to question. For
>>> example, a globally optimizing compiler may observe that no thread ever
>>> reads the value of a particular shared variable and then eliminate that
>>> variable.
>>
>> Oh, I meant the "all atomic objects are volatile" restriction, not just the
>> loads. In that case, all stores to the object - even if never read - still
>> need to be generated.
>>
>> Are there still any optimizations?
>
> Perhaps not any that affect shared variable accesses. In a way, that
> was the intention.
Yes, but then it becomes a bit strange to treat the "globally analyzing
compiler" as a harder problem. You have made a globally analyzing
compiler that cannot globally analyze. I understand that this makes the
argument feasible, but I am not sure this is how compilers really
behave (or at least will still behave in the future).
>> Well, I certainly have not put as much deep thought into it as you have, so
>> it is certainly more fragile. But my current thoughts are along these lines:
>>
>> We consider inter-thread semantic dependencies (isdep) based on the set of
>> allowed executions + thread local optimizations + what would be allowed to
>> happen if rfe edges along the dependencies become rfi edges due to merging.
>> So the definition is not compiler-specific.
>
> How do you know what thread-local optimizations may be applied?
> Different compilers use different ones, and new ones are constantly
> being invented.
I'm not sure it is necessary to know specific optimizations.
I don't have a satisfactory answer now.
But perhaps it is possible to define this through the abstract machine,
maybe with a detour through the set of allowed C realizations.
Something like: "given some execution G, a subset of threads (T)i, and
a read r and a write w in that execution, if for every C realization of
the threads (T)i together such that all executions of the realization,
under all rf relations that exist in some original execution G'
(modulo some po-preserving mapping), have the same observable side
effects as that original execution G', there is a sequence of syntactic
dependency + rf from r to w, then there is an inter-thread semantic
dependency from r to w".
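As an illustration of "a sequence of syntactic dependency + rf from r
to w" (a made-up two-thread sketch; the names are invented):

    #include <stdatomic.h>

    atomic_int x, y, z;

    void t1(void)
    {
            /* r: the load of x, with a syntactic data dependency
             * to the store of y */
            int r1 = atomic_load_explicit(&x, memory_order_relaxed);
            atomic_store_explicit(&y, r1, memory_order_relaxed);
    }

    void t2(void)
    {
            /* reads y (possibly from t1's store: an rf edge), with
             * a syntactic data dependency to w: the store of z */
            int r2 = atomic_load_explicit(&y, memory_order_relaxed);
            atomic_store_explicit(&z, r2, memory_order_relaxed);
    }

In executions where t2's load reads from t1's store, the chain
dep;rf;dep runs from r (the load of x) to w (the store of z).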
But this makes it really hard to prove that there is a dependency, since
it quantifies over all C programs. And it only takes into account
C-level optimizations, not optimizations specific to some hardware
platform (which for all we know may have some transactional memory or
powerful write speculation that the compiler knows about and exploits
in its optimizations).
I am not sure there is a better definition.
> Why not consider global optimizations? Yes, they are the same as
> thread-local optimizations if all the threads have been merged into one,
> but what if they haven't been merged?
Some global optimizations are already accounted for by the fact that we
only consider the set of allowed executions.
So if the compiler can see that in all executions the value of some
variable is always 0 or 1 - for example, because those are the only
kinds of stores to that variable - it might use that to do local
optimizations, such as eliminating some switch cases, which then leads
to eliminating control dependencies.
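A sketch of what I mean (hypothetical code, assuming the compiler can
prove that only 0 and 1 are ever stored to x):

    #include <stdatomic.h>

    atomic_int x, y;

    void reader(void)
    {
            switch (atomic_load_explicit(&x, memory_order_relaxed)) {
            case 0:
                    atomic_store_explicit(&y, 1, memory_order_relaxed);
                    break;
            case 1:
                    atomic_store_explicit(&y, 1, memory_order_relaxed);
                    break;
            default: /* dead if x can only ever be 0 or 1 */
                    atomic_store_explicit(&y, 2, memory_order_relaxed);
                    break;
            }
    }

Knowing the default case is dead, the compiler can merge the remaining
cases into an unconditional store of 1 to y, and the control dependency
from the load of x is gone.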
> For that matter, why bring thread merging into the discussion?
For one, it sounds impractical to make a compiler that does advanced
global optimizations without merging the threads. It's probably easy
enough to do on toy examples, but for realistic applications it seems
completely infeasible, and it would additionally be hard to gain much
benefit from it.
So I'm not sure it is necessary to solve the more advanced problem.
For another, thread merging defeats per-thread semantic dependencies.
So unless all optimizations related to it are ruled out (e.g., by
saying all accesses are volatile), it needs to be considered either
specifically or by a more powerful, simpler abstraction, which I don't
have.
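For example (again a made-up sketch, glossing over the question of
which interleavings a merged thread would have to preserve):

    #include <stdatomic.h>

    atomic_int x, y;

    /* Separate threads: the load of x carries a data dependency to
     * the store of y, and t2 reading x from t1's store is an rfe
     * edge. */
    void t1(void)
    {
            atomic_store_explicit(&x, 1, memory_order_relaxed);
    }

    void t2(void)
    {
            int r = atomic_load_explicit(&x, memory_order_relaxed);
            atomic_store_explicit(&y, r, memory_order_relaxed);
    }

    /* Merged: the rfe edge becomes rfi; store-to-load forwarding
     * replaces r with the constant 1, and the dependency is gone. */
    void merged(void)
    {
            atomic_store_explicit(&x, 1, memory_order_relaxed);
            atomic_store_explicit(&y, 1, memory_order_relaxed);
    }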
>
> I don't want to think about thread merging, and at first glance it
> looks like you don't want to think about anything else.
I *want* to think about other practical global optimizations, beyond
those already covered by knowing the set of possible executions.
I just haven't been able to.
>>>>> I do not like the idea of tying the definition of OOTA (which needs to
>>>>> apply to every implementation) to a particular clang compiler.
>>>>
>>>> But that is what you have done, no? Whether something is an sdep depends on
>>>> the compiler, so compiler A could generate an execution that is OOTA in the
>>>> sdep definition of compiler B.
>>>
>>> Yes, but this does not tie the definition to _one particular_ compiler.
>>> That is, we don't say "This dependency is semantic because of the way
>>> GCC 14.2.1 handles it." Rather, we define for each compiler whether a
>>> dependency is semantic for _that_ compiler.
>>
>> Yes, but the result is still that every compiler has its own memory model
>> (at least with respect to which graphs are ruled out as OOTA). So
>> there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'.
>
> Our definition of OOTA and semantic dependency does not apply to graphs;
> it applies to particular hardware executions (together with the abstract
> executions that they realize).
True, but it immediately induces a predicate over graphs indexed by the
compiler (by looking at hardware executions generated by x and the
graphs from the realized abstract executions).
>
> Besides, it is still possible to use our definition to talk about
> semantic dependency in a compiler-independent way. Namely, if a
> dependency is semantic relative to _every_ compiler then we can say it
> is absolutely semantic.
Sure, but that definition is useless.
For example, in the running OOTA example x=y || y=x, there are no
absolutely semantic dependencies. One compiler can turn this into x=y,
where there is no dependency from y to x, and another into y=x, where
there is no dependency from x to y. In fact, even a naive non-optimizing
compiler has no dependencies here under your definition, because there
are no hardware executions with other values.
In my informal definition, there is (intended to be) an inter-thread
dependency from the load from x in T2 to the store to x in T1, and I
would claim there is no execution in which that load can read from that
store. I would not claim anything about the order of the load from y and
the store to x.
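Spelled out, the running example is (a sketch in C11 atomics of the
usual litmus test):

    #include <stdatomic.h>

    atomic_int x, y;

    void T1(void) /* x = y */
    {
            atomic_store_explicit(&x,
                    atomic_load_explicit(&y, memory_order_relaxed),
                    memory_order_relaxed);
    }

    void T2(void) /* y = x */
    {
            atomic_store_explicit(&y,
                    atomic_load_explicit(&x, memory_order_relaxed),
                    memory_order_relaxed);
    }

The claimed inter-thread dependency runs from the load of x in T2,
through T2's store to y and the rf edge into T1's load of y, to T1's
store to x; the intent is that no execution lets T2's load of x read
from that store.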
>> The example I gave of the tool verifying the program relative to one
>> specific compiler is also not giving a definition of 'G is OOTA', in fact,
>> it does not specify OOTA at all; it simply says ``with compiler x, we get
>> the following "semantic dependencies" and the following graphs...''
>>
>> And as long as compiler x does not generate OOTA, there will be no OOTA
>> graphs among those.
>>
>> So it does not solve or tackle the theoretical problem in any way, and can
>> not do cross-compiler verification. But it will already apply the
>> 'naive-dependency-breaking optimizations' that you would not see in e.g.
>> herd7.
>
> Okay, fine. But we're trying to come up with general characterizations,
> and it appears that you're doing something quite different.
Well, it wasn't me who did this; it was Hernan's group, for other
practical concerns. But yes, I agree.
It's not a solution to the very hard problem you're trying to solve, but
a solution to perhaps the more immediate problem people have while
looking at real code.
Best wishes,
jonas