linux-kernel - Re: [RFC] tools/memory-model: Rule out OOTA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1668d0b4-4060-4eac-b8b1-46ca0d1dbce9@rowland.harvard.edu>
Date: Thu, 16 Jan 2025 18:02:18 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
	peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
	dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
	akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
	urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
	linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
	hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA

On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote:
> I'm well aware that an absolute definition of semantic dependencies is not
> easy to give.

In fact it's undecidable.  No tool is ever going to be able to detect 
semantic dependencies perfectly.

> Btw, I'm re-reading that paper and here are some comments:
> - I agree with the conclusion that C++ should not try to solve OOTA other
> than declaring that they do not exist
> - the definition of OOTA as sdep | rfe is not the right one. You really need
> sdep ; rfe, because you can have multiple subsequent sdep links that
> together are not a dependency, e.g.,
> 
> int * x[] = { &a, &a };
> int i = b;
> *x[i] = 1;
> 
> here the semantic address dependency from loading b to loading x[i] and the
> semantic address dependency from loading x[i] and storing to *x[i] do not
> together form a semantic dependency anymore, because *x[i] is always a. So
> this whole code can just become a=1, and with mirrored code you can get
> a=b=1, which is an sdep | rfe cycle.

We regard sdep as extending from loads (or sets of loads) to stores.  
(Perhaps the paper doesn't state this explicitly -- it should.)  So an 
sdep ; sdep sequence is not possible.

Nevertheless, it's arguable that in your example there is no semantic 
dependency from the load of b to the load of x[i].  Given the code 
shown, a compiler could replace the load of x[i] with the constant value 
&a, yielding simply *(&a) = 1.

There are other examples which do make the point, however.  For example:

int a = 1, b = 1, c = 2;
int *x[] = {&a, &b, &c};

int r1 = i;
if (r1 > 1)
	r1 = 1;
int *r2 = x[r1];
int r3 = *r2;
q = r3;

Here there is a semantic dependency (if you accept dependencies to 
loads) from the load of i to the load of x[r1] and from the load of *r2 
to the store to q.  But overall the value stored to q is always 1, so it 
doesn't depend on the load from i.

> - I did not like the phrasing that a dependency is a function of one
> execution, especially in contrast to source code. For a fixed execution,
> there are no dependencies because all values are fixed. It is the other
> executions - where you could have read another value - that create the
> dependencies.
> 
> Perhaps it is better to say that it is not a *local* function of source
> code, i.e., just because the same source code has a dependency in context C
> does not mean that it has a dependency in context C'.

Of course, what we meant is that whether or not there is a semantic 
dependency depends on both the source code and the execution.

> In fact I would say a major gotcha of dependencies is that they are not a
> function of even the set of all permissible (according to the standard)
> executions.
> That is because the compiler does not have to make every permissible
> execution possible, only at least one.

The paper includes examples of this, I believe.

> If the compiler knows that among all executions that it actually makes
> possible, some value is always 0 - even if there is a permissible execution
> in which it is not 0 - it can still replace that value with 0. E.g.
> 
>   T1 { x = 1; x = 0; }
>   T2 { T[x] = 1; }
> 
>    ~~ merge -- makes executions where T[x] reads x==1 impossible ~>
> 
>   T1' { x = 1; x = 0; T[x] = 1; }
> 
>    ~~ replace ~>
> 
>   T1' { x = 1; x = 0; T[0] = 1; // no more dependency :( }
> 
> which defeats any semantic dependency definition that is a function of a
> single execution.

Our wording could be improved.  We simply want to make the point that 
the same source code may have a semantic dependency in one execution but 
not in another.  I guess we should avoid calling it a "function" of the 
execution, to avoid implying that the execution alone is what matters.

> Of course you avoid this by making it really a function of the compiler +
> program *and* the execution, and looking at all the other possible (not just
> permissible) executions.
> 
> - On the one hand, that definition makes a lot of sense. On the other hand,
> at least without the atomics=volatile restriction it would have the downside
> that a compiler which generates just a single execution for your program can
> say that there are no dependencies whatsoever and generate all kinds of "out
> of thin air" behaviors.

That is so.  But a compiler which examines only a single thread at a 
time cannot afford to generate just a single execution, because it 
cannot know what values the loads will obtain when the full program 
runs.

> I am not sure if that gets really resolved by the volatile restrictions you
> put, but either way those seem far stronger than what one would want.

It does get resolved, because treating atomics as volatile also means 
that the compiler cannot afford to generate just a single execution.  
Again, because it cannot know what values the loads will obtain at 
runtime, since volatile loads can yield any value in a non-benign 
environment.

> I would say that the approach with volatile is overzealous because it tries
> to create a "local" order solution to the problem that only requires a
> "global" ordering solution. Since not every semantic dependency needs to
> provide order in C++ -- only the cycle of dependencies -- it is totally ok
> to add too many semantic dependency edges to a program, even those that are
> not going to be exactly maintained by every compiler, as long as we can
> ensure that globally, no dependency cycle occurs.

But then how would you characterize semantic dependencies, if you want 
to allow the definition to include some dependencies that aren't 
semantic but not so many that you ever create a cycle?  This sounds like 
an even worse problem than we started with!

> So for example, if we merge  x = y || y = x,  the merge can turn it into
> x=y=x or y=x=y (and then into an empty program), but not into a cyclic
> dependency. So even though one side of the dependency may be violated, for
> sake of OOTA, we could still label both sides as dependent.

They _are_ both semantically dependent (in the original parallel 
version, I mean).  I don't see what merging has to do with it.

> For LKMM the problem is of course much easier because you have volatiles and
> compiler barriers. Again you could maybe add incorrect semantic dependencies
> between accesses, as long as only the really preserved ones will imply
> ordering.

I'm not particularly concerned about OOTA or semantic dependencies in 
LKMM.

> So I'm not convinced that for either of the two cases you need to do a
> compiler-specific definition of dependencies.

For the C++ case, I cannot think of any other way to approach the 
problem.  Nor do I see anything wrong with a compiler-specific 
definition, given how nebulous the whole idea is in the first place.

> BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it
> first allows the compiler to do its optimizations, and then verifies the
> llvm-ir (with a more hardware-like dependency definition).

What do you mean by "verifies"?  What property of the llvm-ir does it 
verify?

> I think something like that can be a good practical solution with fewer
> problems than other attempts to approximate the solution.

I do not like the idea of tying the definition of OOTA (which needs to 
apply to every implementation) to a particular clang compiler.

Alan