Message-ID: <3ca45d97-670f-4c9e-8df8-2304a1de32b9@huaweicloud.com>
Date: Wed, 8 Jan 2025 18:33:05 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/7/2025 at 5:09 PM, Alan Stern wrote:
> On Mon, Jan 06, 2025 at 10:40:03PM +0100, Jonas Oberhauser wrote:
>> The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following
>> example shared on this list a few years ago:
>>
>> P0(int *a, int *b, int *x, int *y) {
>> 	int r1;
>>
>> 	r1 = READ_ONCE(*x);
>> 	smp_rmb();
>> 	if (r1 == 1) {
>> 		*a = *b;
>> 	}
>> 	smp_wmb();
>> 	WRITE_ONCE(*y, 1);
>> }
>>
>> P1(int *a, int *b, int *x, int *y) {
>> 	int r1;
>>
>> 	r1 = READ_ONCE(*y);
>> 	smp_rmb();
>> 	if (r1 == 1) {
>> 		*b = *a;
>> 	}
>> 	smp_wmb();
>> 	WRITE_ONCE(*x, 1);
>> }
>>
>> exists b=42
>>
>> The root cause is an interplay between plain accesses and rw-fences,
>> i.e., smp_rmb() and smp_wmb(): although these fences provide enough
>> ordering for plain accesses to rule out data races, in the current
>> formalization they do not in general actually order the plain
>> accesses. This allows, e.g., the load and store of *b to proceed in
>> either order even if P1 reads from P0; in particular, the marked
>> accesses around those plain accesses are not ordered either, which is
>> what permits this OOTA.
>
> That's right. The memory model deliberately tries to avoid placing
> restrictions on plain accesses, whenever it can.
>
> In the example above, for instance, I think it's more interesting to ask
>
> exists 0:r1=1 /\ 1:r1=1
>
> than to concentrate on a and b.
Yes, and of course there's a relationship between the two anomalies.
My proposed patch solves OOTA indirectly: it forbids the marked accesses
to *x and *y from being reordered when r1 == 1, thereby ruling out
exactly the outcome where both threads have r1 == 1.
> OOTA is a very difficult subject.
No doubt.
> It can be approached only by making
> the memory model take all sorts of compiler optimizations into account,
> and doing this for all possible optimizations is not feasible.
I think there are two parts to this. One is the correct definition of
semantic (or compiler-preserved) dependencies. This is a really hard
problem that can indeed only be solved by looking at all the ways the
compiler can optimize things. I'm not trying to solve that (and couldn't
address that issue in a cat file anyway).
The second part is to see which accesses could participate in an OOTA at
all. I think this is a lot simpler.
> (For example, in a presentation to the C++ working group last year, Paul
> and I didn't try to show how to extend the C++ memory model to exclude
> OOTA [other than by fiat, as it does now]. Instead we argued that with
> the existing memory model, no reasonable compiler would ever produce an
> executable that could exhibit OOTA and so the memory model didn't need
> to be changed.)
I think a model like C++ should exclude it exactly by fiat, by
formalizing OOTA-freedom as an axiom, like
acyclic ( (data|ctrl|addr) ; rfe )
This axiom is to some extent unsatisfying because it does not explain
*how* OOTA is avoided; in particular, it does not forbid any specific
kind of behavior that would lead to OOTA, only the complete combination
of them.
So for example, the following would be allowed:

P0:
	r0 = y.load(memory_order_relaxed);
	x.store(r0, memory_order_relaxed);

P1:
	r1 = x; // reads 1
	atomic_thread_fence(memory_order_seq_cst);
	y = 1;

which some people might interpret as the accesses of P0 being executed
out of order (although there is no such concept in C++), suggesting that
if P0's accesses are allowed to execute out of order, then so should
P2's:

P2:
	r2 = x.load(memory_order_relaxed);
	y.store(r2, memory_order_relaxed);
Of course if both P0 and P2 are "executing out of order" at the same
time, one would have OOTA, but this "global behavior" would be forbidden
by the axiom.
But that is already how C++ works: a release access in C++ does not have
any ordering by itself either. It is just the combination of release +
acquire on other threads that somehow establishes synchronization.
LKMM works differently, though, by providing "local" ordering rules
through ppo. We can argue about ppo locally, without knowing the code of
any other threads, let alone whether their accesses have acquire or
release semantics. So we can address OOTA in a "local" manner as well.
LKMM also has the advantage of placing compiler barriers around a lot of
things, which makes reasoning a lot more feasible.
>> In this patch, we choose the rather conservative approach of enforcing
>> only the order of these marked accesses: specifically, when a marked
>> read r is separated from a plain read r' by an smp_rmb() (or r' has an
>> address dependency on r, or r' is r), a write w' depends on r', and w'
>> is either plain and separated from a subsequent marked write w by an
>> smp_wmb() (or w' is w), then r precedes w in ppo.
>
> Is this really valid? In the example above, if there were no other
> references to a or b in the rest of the program, the compiler could
> eliminate them entirely.
In the example above, this is not possible, because the addresses of a
and b have escaped the function, and the objects are not deallocated
before synchronization happens.
Therefore the compiler must assume that a and b are accessed inside the
compiler barriers.
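The distinction can be illustrated with a small user-space sketch (the
function names and the barrier() definition are my own stand-ins; the
GCC/Clang asm idiom approximates the compiler barrier inside the kernel
fences):

```c
#include <assert.h>

/* GCC/Clang-style compiler barrier, standing in for the kernel's
 * barrier() that is contained in smp_rmb()/smp_wmb(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* The address of *p has escaped to the caller (or another thread), so
 * the compiler must assume the code hidden behind the barrier can
 * observe the store: it cannot elide it. */
int escaped(int *p)
{
	*p = 1;
	barrier();
	return *p;
}

/* Here a never escapes the function, so despite the barrier the
 * compiler is free to delete the store and simply return 1. */
int local_only(void)
{
	int a[2] = { 0, 0 };

	a[0] = 1;
	barrier();
	return a[0];
}
```

Both functions behave the same at run time; the difference is only in
what the compiler is allowed to assume about accesses it cannot see.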
> (Whether the result could count as OOTA is
> open to question, but that's not the point.) Is it not possible that a
> compiler might find other ways to defeat your intentions?
The main point (which I already mentioned in the previous e-mail) is the
case where the object is deallocated without synchronization (or never
escapes the function in the first place).
And indeed, any such case renders the added rule unsound. It is in a
sense unrelated to OOTA; cases where the load/store can be elided are
never OOTA.
Of course that is outside the current scope of what herd7 needs to deal
with or can express, because deallocation isn't a thing in herd7.
Nevertheless, trying to specify inside cat when an access is "live" --
relevant enough that the compiler will keep it around -- is hard and
tedious (it's the main source of complication in the patch).
A much better way would be to add a base set Live of "live loads and
stores": the loads and stores that the compiler must consider to be
live. Just like addr, ctrl, etc., we would not have to specify these in
cat, and could instead rely on herd7 to compute them correctly.
If an access interacts with an access of another thread (by reading from
it or being read from it), it must be live.
Then we could formulate the rule as

+let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ; wmb)?

(the latter case being a generalization of the current `addr ; [Plain] ;
wmb` and `rwdep` cases of to-w, assuming we restrict it to Live
accesses; without that restriction it is also unsound:

	int a[2] = {0};
	int r1 = READ_ONCE(*x);
	a[r1] = 0; // the compiler will just remove this
	smp_wmb();
	WRITE_ONCE(*y, 1);
)
The formulation in the patch is based on a complicated, close, but
imperfect approximation of Live.
> In any case, my feeling is that memory models for higher languages
> (i.e., anything above the assembler level) should not try very hard to
> address the question of OOTA. And for LKMM, OOTA involving _plain_
> accesses is doubly out of bounds.
>
> Your proposed change seems to add a significant complication to the
> memory model for a not-very-clear benefit.
Even if we ignore OOTA for the moment, it is not inconceivable that some
code combines plain accesses with dependencies/rmb/wmb, such as in an
RCU scenario.
That's probably why the current addr ; [Plain] ; wmb case of to-w
exists. It's not clear that it covers all cases, though.
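One such pattern is RCU-style publication: plain initializing stores are
ordered before a marked pointer store by smp_wmb(), and the reader's
plain loads are ordered after the marked pointer load by an address
dependency; this is exactly the shape that addr ; [Plain] ; wmb is meant
to cover. A rough user-space sketch (the names are mine, and the kernel
primitives are mapped to C11 equivalents as stand-ins, which differ from
the real macros in detail):

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space stand-ins for the kernel primitives (assumptions of this
 * sketch, not the kernel definitions). */
#define smp_wmb()	atomic_thread_fence(memory_order_release)

struct item {
	int a;	/* plain fields */
	int b;
};

static struct item slot;
static _Atomic(struct item *) published;

/* Writer: plain initializing stores, then smp_wmb(), then the marked
 * pointer store; the wmb orders the plain stores before the publish. */
static void publish(void)
{
	slot.a = 1;	/* plain */
	slot.b = 2;	/* plain */
	smp_wmb();
	atomic_store_explicit(&published, &slot, memory_order_relaxed);
}

/* Reader: marked pointer load, then plain loads through the pointer;
 * the address dependency orders the plain loads after the pointer load. */
static int reader(void)
{
	struct item *p;

	p = atomic_load_explicit(&published, memory_order_relaxed);
	if (!p)
		return -1;	/* nothing published yet */
	return p->a + p->b;	/* plain loads, addr-dependent on p */
}
```

Run sequentially this is trivially correct; the interesting guarantee is
that a concurrent reader that observes the pointer must also observe
a == 1 and b == 2.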
Best wishes,
jonas