Message-ID: <0b1a9379-000d-4924-a930-dde5daaf1dd8@huaweicloud.com>
Date: Sun, 12 Jan 2025 16:55:07 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/11/2025 10:19 PM, Alan Stern wrote:
> On Sat, Jan 11, 2025 at 01:46:21PM +0100, Jonas Oberhauser wrote:
>>
>> This is under my assumption that if we had, let's say, gcc's "semantic
>> dependencies" or an under-approximation of them (by which I mean allowing
>> fewer things to be dependent than gcc can see), then these cases would be
>> resolved, in the sense that gcc cannot undermine [R & Marked] ; gcc-dep,
>> where gcc-dep is the set of dependencies detected by gcc.
>
> That seems circular. Basically, you're saying that gcc will not break
> any dependencies that gcc classifies as not-breakable!
Maybe my formulation did not express exactly what I meant.
I am thinking of examples like this:

    r1 = READ_ONCE(*x);
    *b = r1;

~~>

    r1 = READ_ONCE(*x);
    if (*b != r1)
        *b = r1;

Here there is clearly a dependency from the load to a store, but gcc
might turn it into an independent load (in the case *b == r1).
Just because gcc admits that there is a dependency does not necessarily
mean that it will not still undermine the ordering "bestowed upon" that
dependency by a memory model in some creative way.
The cases that I could think of all still worked, but only for very
specific architecture-dependent reasons (e.g., x86 has CMOV, but all of
its loads provide acquire ordering; Arm does not have a flag-conditional
str; etc.).
Or perhaps there is no dependency in case *b == r1. I am not sure.
Another thought that pops up here: when I last worked on formalizing
dependencies, I could not define a dependency as being between one load
and one store; a dependency might be between a set of loads and one
store. I would have to look up the exact reason, but I think it was
because sometimes you need to change more than one value to influence
the result, e.g., a && b where both a and b are 0: changing just one of
them will not make a difference.
All of these complications make me wonder whether even a relational
notion of semantic dependency is good enough.
>>>> Alan Stern:
>>>>> Perhaps so. LKMM does include other features which the compiler can
>>>>> defeat if the programmer isn't sufficiently careful.
>>>>
>>>> How many of these are due to herd7's limitations vs. in the cat file?
>>>
>>> Important limitations are present in both.
>>
>> I am genuinely asking. Do we have a list of the limitations?
>> Maybe it would be good to collect it in the "A WARNING" section of
>> explanation.txt if it doesn't exist elsewhere.
>
> There are a few listed already at various spots in explanation.txt --
> search for "undermine". And yes, many or most of these limitations do
> arise from LKMM's failure to recognize when a dependency isn't semantic.
> Maybe some are also related to undefined behavior, which LKMM is not
> aware of.
Thanks. Another point mentioned in that document is that po is a total
order, whereas C has the more relaxed notion of sequenced-before; this
could even affect volatile accesses, e.g. in f(READ_ONCE(*a), g()),
where g() executes an rmb and then READ_ONCE(*b), and it is not clear
whether there should be a ppo from the read of *a to the read of *b in
some executions or not.
>
> There is one other weakness I know of, however -- something totally
> different. It's an instance in which the formal model in the .cat file
> fails to capture the intent of the informal operational model.
>
> As I recall, it goes like this: The operational model asserts that an
> A-cumulative fence (like a release fence) on a CPU ensures ordering for
> all stores that propagate to that CPU before the fence is executed. But
> the .cat code says only that this ordering applies to stores which the
> CPU reads from before the fence is executed. I believe you can make up
> litmus tests where you can prove that a store must have propagated to
> the CPU before an A-cumulative fence occurs, even though the CPU doesn't
> read from that store; for such examples the LKMM may accept executions
> that shouldn't be allowed.
>
> There may even be an instance of this somewhere in the various litmus
> test archives; I don't remember.
Ah, I think you explained this case to me before. It is the one where
prop & int covers some but not all cases, right? (Compared to the cat
PowerPC model, which does not even have prop & int.)
For now I am mostly worried about cases where LKMM promises too much,
rather than too little. The latter case arises naturally as a trade-off
between the complexity of the model, its algorithmic complexity, and
what guarantees real code actually needs from the model.
By the way, I am currently investigating a formulation of LKMM with a
separate "propagation order" per thread, prop-order[t], which relates
events `a` and `b` iff `a` is "observed" to propagate to t before `b`.
Observation can also happen through co and fr, not just rf, which might
be sufficient to cover those cases. I have a hand-written proof sketch
that an order ORD induced by prop-order[t], xb, and some other orders is
acyclic, and a Coq proof that executing an operational model in any
linearization of ORD is permitted (e.g., it does not propagate a store
before the store is executed, or in violation of co) and yields the same
rf as the axiomatic execution.
So if the proof sketch works out, this might indicate that with such a
per-thread propagation order, one can eliminate those cases.
But to make the definition work, I had to make xb use prop-order[t]
instead of prop in some cases, and the definitions of xb and
prop-order[t] are mutually recursive, so it's not a very digestible
definition.
So I do not recommend replacing LKMM with such a model even if it works,
but it is useful for investigating some boundary conditions.
>>> An execution of the corresponding machine code would then be compatible
>>> with an abstract execution of the source code in which both r0 and r1
>>> get set to 42 (OOTA). But it would also be compatible with an abstract
>>> execution in which both r0 and r1 are 0, so it doesn't make sense to say
>>> that the hardware execution is, or might be, an instance of OOTA.
>>
>> Yes, but this does not require the definition you expressed before. This
>> is already not observable according to the definition that there is no
>> read R from an OOTA-cycle store on which some observable side effect
>> semantically depends.
>>
>> What I was trying to say is that your definition is almost a
>> no-true-scotsman (sorry Paul) definition:
>
> (I would use the term "tautology" -- although I don't believe it was a
> tautology in its original context, which referred specifically to cases
> where the compiler had removed all the accesses in an OOTA cycle.)
>
>> every program with an OOTA
>> execution where you could potentially not find an "equivalent" execution
>> without OOTA, is simply labeled a "no-true-unobserved OOTA".
>>
>>>> But it does make the proof of your claim totally trivial. If there is no
>>>> other OOTA-free execution with the same observable behavior, then it is
>>>> proof that the OOTA happened, so the OOTA was observed.
>>>> So by contraposition any non-observed OOTA has an OOTA-free execution with
>>>> the same observable behavior.
>>>>
>>>> The sense in which I would define observed is more along the lines of "there
>>>> is an observable side effect (such as store to volatile location) which has
>>>> a semantic dependency on a load that reads from one of the stores in the
>>>> OOTA cycle".
>
> Agreed. But maybe this indicates that your sense is too weak a
> criterion, that an indirect observation should count just as much as a
> direct one.
I don't follow this conclusion.
I think there are two relevant claims here:
1) compilers do not introduce "observed OOTA"
2) For every execution graph with not-observed OOTA, there is another
execution with the same observable side effects that does not have OOTA.
While it may be much easier to prove 2) with a more relaxed notion of
observed OOTA, 1) sounds much harder. How does the compiler know that
there is no indirect way to observe the OOTA?
E.g., in the LKMM example, ignoring the compiler barriers, it might be
possible for a compiler to deduce that the plain accesses are never used
and delete them, resulting in an OOTA that is observed under the more
relaxed setting, violating claim 1).
Best wishes,
jonas