linux-kernel - [RFC] tools/memory-model: Rule out OOTA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250106214003.504664-1-jonas.oberhauser@huaweicloud.com>
Date: Mon,  6 Jan 2025 22:40:03 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: paulmck@...nel.org
Cc: stern@...land.harvard.edu,
	parri.andrea@...il.com,
	will@...nel.org,
	peterz@...radead.org,
	boqun.feng@...il.com,
	npiggin@...il.com,
	dhowells@...hat.com,
	j.alglave@....ac.uk,
	luc.maranget@...ia.fr,
	akiyks@...il.com,
	dlustig@...dia.com,
	joel@...lfernandes.org,
	urezki@...il.com,
	quic_neeraju@...cinc.com,
	frederic@...nel.org,
	linux-kernel@...r.kernel.org,
	lkmm@...ts.linux.dev,
	hernan.poncedeleon@...weicloud.com,
	Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Subject: [RFC] tools/memory-model: Rule out OOTA

The current LKMM allows out-of-thin-air (OOTA), as evidenced in the following
example shared on this list a few years ago:

  P0(int *a, int *b, int *x, int *y) {
  	int r1;

  	r1 = READ_ONCE(*x);
  	smp_rmb();
  	if (r1 == 1) {
  		*a = *b;
  	}
  	smp_wmb();
  	WRITE_ONCE(*y, 1);
  }

  P1(int *a, int *b, int *x, int *y) {
  	int r1;

  	r1 = READ_ONCE(*y);
  	smp_rmb();
  	if (r1 == 1) {
  		*b = *a;
  	}
  	smp_wmb();
  	WRITE_ONCE(*x, 1);
  }

  exists b=42

The root cause is an interplay between plain accesses and rw-fences, i.e.,
smp_rmb() and smp_wmb(): while smp_rmb() and smp_wmb() provide sufficient
ordering for plain accesses to rule out data races, they do not in the current
formalization generally actually order the plain accesses, allowing, e.g., the
load and stores to *b to proceed in any order even if P1 reads from P0; and in
particular, the marked accesses around those plain accesses are also not
ordered, which causes this OOTA.

In this patch, we choose the rather conservative approach of forcing only the
order of these marked accesses, specifically, when a marked read r is
separated from a plain read r' by an smp_rmb() (or r' has an address
dependency on r or is r'), on which a write w' depends, and w' is either plain
and seperated by a subsequent write w by an smp_wmb() (or w' is w), then r
precedes w in ppo.

Furthermore, we do not force any order in cases where r or w could be elided
due to a store with the same address either being before the read (which would
allow r to be substituted for the value of the store) or after the write
(which would allow w to be dropped).

Even though this patch is conservative in this sense, it ensures general
OOTA-freedom, more specifically, that any execution with no data race will not
have any cycles of

      (ctrl | addr | data) ; rfe

This definition of OOTA is much weaker than more standard definitions, such as
requiring that there are no cycles of

      ctrl | addr | data | rf

These definitions work well for syntactic dependencies (hardware models) but not
semantic dependencies (language models, like LKMM).

We first discuss why the more standard definition does not work well for
language models like LKMM. For example, consider

    r1 = *a;
    *b = 1;
    if (*a == 1)
      *b = 1;
    *c = *b;

In the execution where r1 == 1, there is a control dependency from
the load of *a to the second store to *b, from which the load to *b reads,
and the store to *c has a data dependency on this load from *b. Nevertheless
there is no semantic dependency from the load of *a to the store to *c; the
compiler could easily replace the last line with *c = 1 and move this line to
the top as follows:

    *c = 1;
    r1 = *a;
    *b = 1;

Since there is no order imposed by this sequence of syntactic dependencies
and reads, syntactic dependencies can not by themselves form an acyclic
relation.

In turn, there are some sequences of syntactic dependencies and reads that do
form semantic dependencies, such as

    r1 = *a;
    *b = 2;
    if (*a == 1)
      *b = 1;
    *c = *b;

Here we would consider that the store to *c has a semantic data dependency on
the read from *a, given that depending on the result of that read, we store
either the value 1 or 2 to *c.

Unfortunately, herd7 is currently limited to syntactic dependencies and can
not distinguish these two programs. As a result, while our patch is intended
to provide ordering for cases resembling the second program (but not the
first), with the dependencies considered by the current version of herd7, we
do not get such an ordering.

There are two more caveats of this patch. The first is that just because
there are no subsequent writes to the location of a write w (until the next
compiler barrier), does not imply that the write w can not be elided, e.g.,
if the location of w does not live until the next barrier (as pointed out
by Alan Stern a while back).

Unfortunately we can not currently express this in herd7's syntax.

In fact, to avoid OOTA, it would be sufficient to provide order only in
cases where w is read-from by another thread. But that is a rather unnatural
formulation.

The last caveat is that while we have done a formal proof that this
patch excludes OOTA (in all data race free executions), we did so with a
different formalization of compiler barrier (formalized in this patch as
w_barrier). I suspect that it may be possible to almost completely switch
over from w_barrier to the normal definition of barrier, +- the fact that
a marked write together with po-loc is also a compiler barrier.

But I currently do not have time to investigate this deeply, and I thought
maybe there are already some comments on the main parts of the patch. The
epsilons and deltas should be resolvable.

Have fun,
  jonas

Signed-off-by: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
---
 tools/memory-model/linux-kernel.cat | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index d7e7bf13c831..180cab56729e 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -71,6 +71,10 @@ let barrier = fencerel(Barrier | Rmb | Wmb | Mb | Sync-rcu | Sync-srcu |
 		Rcu-lock | Rcu-unlock | Srcu-lock | Srcu-unlock) |
 	(po ; [Release]) | ([Acquire] ; po)
 
+let w_barrier = po? ; [F | Marked] ; po?
+let rmb-plain = [R4rmb] ; po ; [Rmb] ; (po \ (po ; [W] ; (po-loc \ w_barrier))) ; [R4rmb & Plain]
+let plain-wmb = [W & Plain] ; (po \ ((po-loc \ w_barrier) ; po ; [W] ; po)) ; [Wmb] ; po ; [W]
+
 (**********************************)
 (* Fundamental coherence ordering *)
 (**********************************)
@@ -90,7 +94,7 @@ empty rmw & (fre ; coe) as atomic
 let dep = addr | data
 let rwdep = (dep | ctrl) ; [W]
 let overwrite = co | fr
-let to-w = rwdep | (overwrite & int) | (addr ; [Plain] ; wmb)
+let to-w = ((addr | rmb-plain)? ; rwdep ; plain-wmb?) | (overwrite & int) | addr ; [Plain] ; wmb
 let to-r = (addr ; [R]) | (dep ; [Marked] ; rfi)
 let ppo = to-r | to-w | (fence & int) | (po-unlock-lock-po & int)
 
-- 
2.34.1