[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <838f7a51-c0f5-4a19-8407-6591426fd4a8@huaweicloud.com>
Date: Sat, 11 Jan 2025 13:46:21 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
Am 1/10/2025 um 10:51 PM schrieb Alan Stern:
> On Fri, Jan 10, 2025 at 01:21:32PM +0100, Jonas Oberhauser wrote:
>>
>>
>> Am 1/10/2025 um 4:12 AM schrieb Alan Stern:
>>> On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote:
>>>
>>>>>> So for example, in
>>>>>>
>>>>>> int a = 1;
>>>>>> barrier();
>>>>>> a = 2;
>>>>>> //...
>>>>>>
>>>>>> the compiler does not know how the code inside barrier() accesses memory,
>>>>>> including volatile memory.
>>>>>
>>>>> I would say rather that the compiler does not know that the values
>>>>> stored in memory are the same before and after the barrier().
>>>>>
>>>>> Even the
>>>>> values of local variables whose addresses have not been exported.
>>>>
>>>> No, this is not true. I used to think so too until a short while ago.
>>>>
>>>> But if you look at the output of gcc -o3 you will see that it does happily
>>>> remove `a` in this function.
>>>
>>> Isn't that consistent with what I said?
>>
>>
>> Ok, after careful reading, I think there are two assumptions you have that I
>> think are not true, but my example is only inconsistent with exactly one of
>> them being not true, not with both of them being not true:
>>
>> [...]
>>
>> 2) the barrier also talks about the values of local variables whose
>> addresses have not been exported. I do not believe this is the case.
>
> 2) isn't so clear. If a local variable's address is never computed then
> the compiler might put the variable in a register, in which case the
> barrier would not clobber it. On the other hand, if the variable's
> address is computed somewhere (even if it isn't exported) then the
> variable can't be kept in a register and so it is subject to the
> barrier's effects.
I understood "its address is exported" to mean that enough information
has been exported to legally compute its address.
Btw, I just remembered a discussion about provenance in C & C++ which is
also very related to this, where the compiler moved a (I think
non-atomic) access across a release fence because it "knew" that the
address of the non-atomic was not exported.
I can't find that discussion now.
>> For what gcc guarantees, the manual says: "add memory to the list of
>> clobbered registers. This will cause GCC to not keep memory values cached in
>> registers across the assembler instruction", i.e., it needs to flush the
>> value from the register to actual memory.
>
> Yes, GCC will write back dirty values from registers. But not because
> the cached values will become invalid (in fact, the cached values might
> not even be used after the barrier). Rather, because the compiler is
> required to assume that the assembler code in the barrier might access
> arbitrary memory locations -- even if that code is empty.
Yes :)
>>>> If herd7 would generate edges for semantic dependencies instead of for its
>>>> version of syntactic dependencies, then rwdep is the answer.
>>>
>>> That statement is meaningless (or at least, impossible to implement)
>>> because there is no widely agreed-upon formal definition for semantic
>>> dependency.
>>
>> Yes, which also means that a 100% correct end-to-end solution (herd + cat +
>> ... ?) is currently not implementable.
>>
>> But we can still break the problem into two halves, one which is 100% solved
>> inside the cat file, and one which is the responsibility of herd7 and
>> currently not solved (or 100% satisfactorily solvable).
>
> I believe that Luc and the other people involved with herd7 take the
> opposite point of view: herd7 is intended to do the "easy" analysis
> involving only straightforward code parsing, leaving the "hard"
> conceptual parts to the user-supplied .cat file.
I can understand the attractiveness of that point of view, but there is
no way to define "semantic dependencies" at all or "live access" 100%
accurately in cat, since it requires a lot of syntactic information that
is not present at that level.
But there is in herd7 (at least for some specific definition of
"semantically dependent").
>> The advantage being that if we read the cat file as a mathematical
>> definition, we can at least on paper argue 100% correctly about code for the
>> cases where we either can figure out on paper what the semantic dependencies
>> are, or where we at least just say "with relation to current compilers, we
>> know what the semantically preserved dependencies are", even if herd7 or
>> other tools lags behind in one or both.
>>
>> After all, herd7 is just a (useful) automation tool for reasoning about
>> LKMM, which has its limitations (scalability, a specific definition of
>> dependencies, limited C subset...).
>
> I still think we should not attempt any sort of formalization of
> semantic dependency here.
I 100% agree and apologies if I ever gave that impression.
What I want is to change the interpretation of ctrl,data,addr in LKMM
from saying "it is intended to be a syntactic dependency, which causes
LKMM to be inaccurate" to "it is intended to be a semantic dependency,
but because there is no formal defn. and tooling we *use* syntactic
dependencies, which causes the current implementations to be
inaccurate", without formally defining what a semantic dependency is.
E.g., in "A WARNING", I would change
-------------
The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
not perfect; and under some circumstances it is possible for the
compiler to undermine the memory model.
-------------
into something like
-------------
The current tooling around LKMM does not model semantic dependencies,
and instead uses syntactic dependencies to specify the protections
provided by READ_ONCE(), WRITE_ONCE(), and others. The compiler can
undermine these syntactic dependencies under some circumstances.
As a consequence, the tooling may write checks that LKMM and the
compiler can not cash.
-------------
etc.
This is under my assumption that if we had let's say gcc's "semantic
dependencies" or an under-approximation of it (by that I mean allow less
things to be dependent than gcc can see), that these cases would be
resolved, in the sense that gcc can not undermine [R & Marked] ; gcc-dep
where gcc-dep is the dependencies detected by gcc.
But I would be interested to see any cases where this assumption is not
true.
Note that this still results in a fully formal definition of LKMM,
because just like now, addr,ctrl,and data are simply uninterpreted
relations. We don't need to formalize their meaning at that level.
>>>> Given that we can not define dep inside the cat model, one may as well
>>>> define it as rwdep;rfe with the intended meaning of the dependencies being
>>>> the semantic ones; then it is an inaccuracy of herd7 that it does not
>>>> provide the proper dependencies.
>>>
>>> Perhaps so. LKMM does include other features which the compiler can
>>> defeat if the programmer isn't sufficiently careful.
>>
>> How many of these are due to herd7's limitations vs. in the cat file?
>
> Important limitations are present in both.
I am genuinely asking. Do we have a list of the limitations?
Maybe it would be good to collect it in the "A WARNING" section of
explanation.txt if it doesn't exist elsewhere.
>>>>> In the paper, I speculated that if a physical execution of a program
>>>>> matches an abstract execution containing such a non-observed OOTA cycle,
>>>>> then it also matches another abstract execution in which the cycle
>>>>> doesn't exist. I don't know how to prove this conjecture, though.
>>>>
>>>> Yes, that also makes sense.
>>>>
>>>> Note that this speculation does not hold in the current LKMM though. In the
>>>> Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\
>>>> 1:r1=1 is only possible with an OOTA (even though the values from the OOTA
>>>> are never used anywhere).
>>>
>>> If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that
>>> there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly --
>>> at least, in the sense that I intended. (The situation mentioned in the
>>> paper is better described as an execution where the compiler has elided
>>> all the accesses in the OOTA cycle.)
>>
>> I'm not sure that sense makes a lot of sense to me.
>
> Here's an example illustrating what I had in mind. Imagine that all the
> accesses here are C++-style relaxed atomic (i.e., not volatile and also
> not subject to data races):
>
> P0(int *a, int *b) {
> int r0 = *a;
> *b = r0;
> *b = 2;
> // r0 not used again
> }
>
> P1(int *a, int *b) {
> int r1 = *b;
> *a = r1;
> *a = 2;
> // r1 not used again
> }
>
> The compiler could eliminate the r0 and r1 accesses entirely, leaving
> just:
>
> P0(int *a, int *b) {
> *b = 2;
> }
>
> P1(int *a, int *b) {
> *a = 2;
> }
>
> An execution of the corresponding machine code would then be compatible
> with an abstract execution of the source code in which both r0 and r1
> get set to 42 (OOTA). But it would also be compatible with an abstract
> execution in which both r0 and r1 are 0, so it doesn't make sense to say
> that the hardware execution is, or might be, an instance of OOTA.
Yes, but this does not require the definition you expressed before. This
is already not observable according to the definition that there is no
read R from an OOTA-cycle store, where some observable side effect
semantically depends on R.
What I was trying to say is that your definition is almost a
no-true-scotsman (sorry Paul) definition: every program with an OOTA
execution where you could potentially not find an "equivalent" execution
without OOTA, is simply labeled a "no-true-unobserved OOTA".
>> But it does make the proof of your claim totally trivial. If there is no
>> other OOTA-free execution with the same observable behavior, then it is
>> proof that the OOTA happened, so the OOTA was observed.
>> So by contraposition any non-observed OOTA has an OOTA-free execution with
>> the same observable behavior.
>>
>> The sense in which I would define observed is more along the lines of "there
>> is an observable side effect (such as store to volatile location) which has
>> a semantic dependency on a load that reads from one of the stores in the
>> OOTA cycle".
>
> Yes, I can see that from your proposed definition of Live.
> I'm afraid we've wandered off the point of this email thread, however...
Maybe, but it is still an important and interesting discussion.
I am also open to continuing it on another channel though.
> Getting back to the original point, why don't you rewrite your patch as
> discussed earlier and describe it as an attempt to add ordering for
> important situations involving plain accesses that the LKMM currently
> does not handle? In other words, leave out as far as possible any
> mention of OOTA or semantic dependency.
I will try, but I have some other things to do first, so it may take a
while.
Have fun,
jonas
Powered by blists - more mailing lists