Message-ID: <e35c40e9-ccc8-43e9-a676-1ea8dd925718@huaweicloud.com>
Date: Fri, 17 Jan 2025 12:29:34 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/17/2025 at 12:02 AM, Alan Stern wrote:
> On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote:
>> I'm well aware that an absolute definition of semantic dependencies is not
>> easy to give.
>
> In fact it's undecidable. No tool is ever going to be able to detect
> semantic dependencies perfectly.
It depends.
Firstly, let's consider that the tool only runs on finite (or
"finitized") programs.
Secondly, your definition depends on the compiler.
So in the sense that we do not know the compiler, it is undecidable.
But if you fix the compiler, you could still enumerate all executions
under that compiler and compute whether that compiler preserves a
dependency or not.
But as I mentioned before, I think you can define semantic dependencies
more appropriately because you don't really need to preserve semantic
dependencies in C++, and in LKMM (and VMM) you have volatiles that
restrict what kind of dependency-eliminations the compiler can do.
>>
>> int * x[] = { &a, &a };
>> int i = b;
>> *x[i] = 1;
>>
>> here the semantic address dependency from loading b to loading x[i] and the
>> semantic address dependency from loading x[i] and storing to *x[i] do not
>> together form a semantic dependency anymore, because *x[i] is always a. So
>> this whole code can just become a=1, and with mirrored code you can get
>> a=b=1, which is an sdep | rfe cycle.
>
> We regard sdep as extending from loads (or sets of loads) to stores.
> (Perhaps the paper doesn't state this explicitly -- it should.) So an
> sdep ; sdep sequence is not possible.
I see. There is a line in the introduction ("Roughly speaking, there is
a semantic dependency from a given load
to a given store when all other things being equal, a change in the
value loaded can
change the value stored or prevent the store from occurring at all"),
but I read it too carelessly and treated it as an implication rather
than an iff.
I suppose at some point you anyway replace the definition of OOTA with
a version where only semantic dependencies from loads to a store are
considered, so the concern becomes irrelevant then.
>> - On the one hand, that definition makes a lot of sense. On the other hand,
>> at least without the atomics=volatile restriction it would have the downside
>> that a compiler which generates just a single execution for your program can
>> say that there are no dependencies whatsoever and generate all kinds of "out
>> of thin air" behaviors.
>
> That is so. But a compiler which examines only a single thread at a
> time cannot afford to generate just a single execution, because it
> cannot know what values the loads will obtain when the full program
> runs.
>
>> I am not sure if that gets really resolved by the volatile restrictions you
>> put, but either way those seem far stronger than what one would want.
>
> It does get resolved, because treating atomics as volatile also means
> that the compiler cannot afford to generate just a single execution.
> Again, because it cannot know what values the loads will obtain at
> runtime, since volatile loads can yield any value in a non-benign
> environment.
Yes. Actually, I wonder: if you impose this "all loads are volatile"
restriction, can a globally analysing compiler still perform any
optimizations that a locally analysing compiler can not?
It rather seems to be the other way around: the locally analysing
quasi-volatile compiler can at least do some local optimizations, while
the globally volatile compiler can not (e.g., x=1; y=x; can not become
x=1; y=1; for the global compiler, because x is now volatile).
>> I would say that the approach with volatile is overzealous because it tries
>> to create a "local" order solution to the problem that only requires a
>> "global" ordering solution. Since not every semantic dependency needs to
>> provide order in C++ -- only the cycle of dependencies -- it is totally ok
>> to add too many semantic dependency edges to a program, even those that are
>> not going to be exactly maintained by every compiler, as long as we can
>> ensure that globally, no dependency cycle occurs.
>
> But then how would you characterize semantic dependencies, if you want
> to allow the definition to include some dependencies that aren't
> semantic but not so many that you ever create a cycle?
I don't know which definition is correct yet, but the point is that you
don't have to avoid adding so many dependencies that you would create a
cycle. It just forbids the compiler from looking for cycles and
optimizing based on the existence of the cycle. (Looking for unused
variables and removing those is still allowed, under the informal
argument that this simulates an execution where no OOTA happened.)
> This sounds like
> an even worse problem than we started with!
>
>
>> So for example, if we merge x = y || y = x, the merge can turn it into
>> x=y=x or y=x=y (and then into an empty program), but not into a cyclic
>> dependency. So even though one side of the dependency may be violated, for
>> sake of OOTA, we could still label both sides as dependent.
>
> They _are_ both semantically dependent (in the original parallel
> version, I mean). I don't see what merging has to do with it.
Note that I was considering the case where they are not volatile.
With a compiler that does not treat them as volatile and merges the
two threads, under your definition there is no semantic dependency in
at least one direction, because there is no hardware realization H where
you read something else (of course you exclude such compilers, but I
think realistically they should be allowed).
My point is that we can say they are semantically dependent for the sake
of OOTA, not derive any ordering from these dependencies other than the
cyclical one, and therefore allow compilers to perform one of the two
optimizations (make x=y no longer depend on y, or make y=x no longer
depend on x), but not to do a cycle analysis that removes both
dependencies and generates an OOTA value (it can remove both
dependencies by leaving x and y unchanged, though).
>> So I'm not convinced that for either of the two cases you need to do a
>> compiler-specific definition of dependencies.
>
> For the C++ case, I cannot think of any other way to approach the
> problem. Nor do I see anything wrong with a compiler-specific
> definition, given how nebulous the whole idea is in the first place.
>
>> BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it
>> first allows the compiler to do its optimizations, and then verifies the
>> llvm-ir (with a more hardware-like dependency definition).
>
> What do you mean by "verifies"? What property of the llvm-ir does it
> verify?
Verify that the algorithm, e.g., qspinlock, has no executions that are
buggy, e.g., have a data race on the critical section.
It does so for a fixed test case with a small number of threads, i.e., a
finite program.
>> I think something like that can be a good practical solution with fewer
>> problems than other attempts to approximate the solution.
>
> I do not like the idea of tying the definition of OOTA (which needs to
> apply to every implementation) to a particular clang compiler.
But that is what you have done, no? Whether something is an sdep depends
on the compiler, so compiler A could generate an execution that is OOTA
in the sdep definition of compiler B.
(Of course with the assumption of atomic=volatile, it may just be that
we are back to the beginning and all "naive" semantic dependencies are
actually semantic dependencies for all compilers).
Anyway, what I meant is not about tying the definition of OOTA to one
compiler or another. As I mentioned, I think it can be fine to define
OOTA in the same way for all compilers.
What I meant is to specialize the memory model to a specific compiler,
as long as that is the compiler that is used in reality.
So long as your code does not depend on the ordering of any semantic
dependencies, the verification can be cross-platform.
And although...
> I'm not particularly concerned about OOTA or semantic dependencies in
> LKMM.
... there is code that relies on semantic dependencies, e.g., RCU
read-side critical section code (even if we do not care about OOTA).
For that code, the semantic dependencies must be guaranteed to create
ordering.
So we either need a definition of semantic dependency that
a) applies in all cases we practically need, and
b) is guaranteed by all compilers,
or we need to live with the fact that we do not have a semantic
dependency definition that is independent of the compilation (even of
the same compiler) and need to do our verification for that specific
compilation.
I think for LKMM we could give such a semantic dependency definition,
because it uses volatile, and verify RCU-read-side code. But we
currently do not have one. What I meant to say is that applying the
actual optimizations (of whatever compiler you use) first to remove
syntactic dependencies, and then verifying under the assumption of
whatever dependencies are left, may be better than trying to approximate
dependencies in some way in cat, given that we want to verify and rely
on the code today, not in X years when we all agree on what a
compiler-independent definition of semantic dependency is.
I think for C++ consume we could also give one by simply restricting
some compiler optimizations for consume loads (and doing whatever needs
to be done on alpha). Or just kick it out and not have any dependency
ordering except the global OOTA case.
Sorry for the confusion, I think there are so many different
combinations/battlefields (OOTA vs just semantic dependencies,
volatile/non-volatile atomics, verifying the model vs verifying a piece
of code etc.) that it becomes hard for me not to confuse myself, let
alone others :))
Best wishes,
jonas