Message-ID: <0b1a9379-000d-4924-a930-dde5daaf1dd8@huaweicloud.com>
Date: Sun, 12 Jan 2025 16:55:07 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/11/2025 10:19 PM, Alan Stern wrote:
> On Sat, Jan 11, 2025 at 01:46:21PM +0100, Jonas Oberhauser wrote:
>>
>> This is under my assumption that if we had, let's say, gcc's "semantic
>> dependencies" or an under-approximation of them (by which I mean allowing
>> fewer things to be dependent than gcc can see), then these cases would be
>> resolved, in the sense that gcc cannot undermine [R & Marked] ; gcc-dep,
>> where gcc-dep is the set of dependencies detected by gcc.
>
> That seems circular. Basically, you're saying that gcc will not break
> any dependencies that gcc classifies as not-breakable!
Maybe my formulation did not express exactly what I meant.
I am thinking of examples like this:

    r1 = READ_ONCE(*x);
    *b = r1;

~~>

    r1 = READ_ONCE(*x);
    if (*b != r1)
        *b = r1;

Here there is clearly a dependency from the load to a store, but gcc
might turn it into an independent load (in the case *b == r1).
Just because gcc admits that there is a dependency does not necessarily
mean that it will not still undermine the ordering "bestowed upon" that
dependency by a memory model in some creative way.
The cases that I could think of all still worked, but only for very
specific architecture-dependent reasons (e.g., x86 has CMOV, but all of
its loads provide acquire ordering; Arm does not have a flag-conditional
str; etc.).
Or perhaps there is no dependency in case *b == r1. I am not sure.
Another thought that pops up here: when I last worked on formalizing
dependencies, I could not define a dependency as being between one load
and one store; a dependency might be between a set of loads and one
store. I would have to look up the exact reason, but I think it was
because sometimes you need to change more than one value to influence
the result, e.g., a && b where both a and b are 0: changing just one of
them will not make a difference.
All of these complications make me wonder whether even a relational
notion of semantic dependency is good enough.
>>>> Alan Stern:
>>>>> Perhaps so. LKMM does include other features which the compiler can
>>>>> defeat if the programmer isn't sufficiently careful.
>>>>
>>>> How many of these are due to herd7's limitations vs. in the cat file?
>>>
>>> Important limitations are present in both.
>>
>> I am genuinely asking. Do we have a list of the limitations?
>> Maybe it would be good to collect it in the "A WARNING" section of
>> explanation.txt if it doesn't exist elsewhere.
>
> There are a few listed already at various spots in explanation.txt --
> search for "undermine". And yes, many or most of these limitations do
> arise from LKMM's failure to recognize when a dependency isn't semantic.
> Maybe some are also related to undefined behavior, which LKMM is not
> aware of.
Thanks. Another point mentioned in that document is that po is a total
order, whereas C has the more relaxed notion of sequenced-before; this
could even affect volatile accesses, e.g. in f(READ_ONCE(*a), g()),
where g() executes an rmb and then READ_ONCE(*b), and it is not clear
whether there should be a ppo from the read of *a to the read of *b in
some executions or not.
>
> There is one other weakness I know of, however -- something totally
> different. It's an instance in which the formal model in the .cat file
> fails to capture the intent of the informal operational model.
>
> As I recall, it goes like this: The operational model asserts that an
> A-cumulative fence (like a release fence) on a CPU ensures ordering for
> all stores that propagate to that CPU before the fence is executed. But
> the .cat code says only that this ordering applies to stores which the
> CPU reads from before the fence is executed. I believe you can make up
> litmus tests where you can prove that a store must have propagated to
> the CPU before an A-cumulative fence occurs, even though the CPU doesn't
> read from that store; for such examples the LKMM may accept executions
> that shouldn't be allowed.
>
> There may even be an instance of this somewhere in the various litmus
> test archives; I don't remember.
Ah, I think you explained this case to me before. It is the one where
prop & int covers some but not all cases, right? (Compared to the cat
PowerPC model, which does not even have prop & int.)
For now I am mostly worried about cases where LKMM promises too much,
rather than too little. The latter case arises naturally as a trade-off
between the complexity of the model, its algorithmic complexity, and
what guarantees real code actually needs from the model.
By the way, I am currently investigating a formulation of LKMM with a
separate "propagation order" per thread, prop-order[t], which relates
events `a` and `b` iff `a` is "observed" to propagate to t before `b`.
Observation can also happen through co and fr, not just rf, which might
be sufficient to cover those cases. I have a hand-written proof sketch
that an order ORD induced by prop-order[t], xb, and some other orders is
acyclic, and a Coq proof that executing an operational model in any
linearization of ORD is permitted (e.g., it does not propagate a store
before the store is executed, or in violation of co) and yields the same
rf as the axiomatic execution.
So if the proof sketch works out, this might indicate that with such a
per-thread propagation order, one can eliminate those cases.
But to make the definition work, I had to make xb use prop-order[t]
instead of prop in some cases, and the definitions of xb and
prop-order[t] are mutually recursive, so it's not a very digestible
definition.
So I do not recommend replacing LKMM with such a model even if it works,
but it is useful for investigating some boundary conditions.
>>> An execution of the corresponding machine code would then be compatible
>>> with an abstract execution of the source code in which both r0 and r1
>>> get set to 42 (OOTA). But it would also be compatible with an abstract
>>> execution in which both r0 and r1 are 0, so it doesn't make sense to say
>>> that the hardware execution is, or might be, an instance of OOTA.
>>
>> Yes, but this does not require the definition you expressed before. This
>> is already not observable according to the definition that there is no
>> read R from an OOTA-cycle store on which some observable side effect
>> semantically depends.
>>
>> What I was trying to say is that your definition is almost a
>> no-true-scotsman (sorry Paul) definition:
>
> (I would use the term "tautology" -- although I don't believe it was a
> tautology in its original context, which referred specifically to cases
> where the compiler had removed all the accesses in an OOTA cycle.)
>
>> every program with an OOTA
>> execution where you could potentially not find an "equivalent" execution
>> without OOTA, is simply labeled a "no-true-unobserved OOTA".
>>
>>>> But it does make the proof of your claim totally trivial. If there is no
>>>> other OOTA-free execution with the same observable behavior, then it is
>>>> proof that the OOTA happened, so the OOTA was observed.
>>>> So by contraposition any non-observed OOTA has an OOTA-free execution with
>>>> the same observable behavior.
>>>>
>>>> The sense in which I would define observed is more along the lines of "there
>>>> is an observable side effect (such as store to volatile location) which has
>>>> a semantic dependency on a load that reads from one of the stores in the
>>>> OOTA cycle".
>
> Agreed. But maybe this indicates that your sense is too weak a
> criterion, that an indirect observation should count just as much as a
> direct one.
I don't follow this conclusion.
I think there are two relevant claims here:
1) compilers do not introduce "observed OOTA"
2) For every execution graph with not-observed OOTA, there is another
execution with the same observable side effects that does not have OOTA.
While it may be much easier to prove 2) with a more relaxed notion of
observed OOTA, 1) sounds much harder. How does the compiler know that
there is no indirect way to observe the OOTA?
E.g., in the LKMM example, ignoring the compiler barriers, it might be
possible for a compiler to deduce that the plain accesses are never used
and delete them, resulting in an OOTA that is observed under the more
relaxed setting, violating claim 1).
Best wishes,
jonas