Message-ID: <63a462dd-f182-4868-92b8-f33f2dc266f8@huaweicloud.com>
Date: Thu, 9 Jan 2025 17:44:54 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/9/2025 at 5:17 PM, Alan Stern wrote:
> On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote:
>>
>>
>> On 1/8/2025 at 7:47 PM, Alan Stern wrote:
>>> On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:
>>>>
>>>>
>>>> On 1/7/2025 at 5:09 PM, Alan Stern wrote:
>>>>> Is this really valid? In the example above, if there were no other
>>>>> references to a or b in the rest of the program, the compiler could
>>>>> eliminate them entirely.
>>>>
>>>> In the example above, this is not possible, because the addresses of a/b have
>>>> escaped the function and are not deallocated before synchronization happens.
>>>> Therefore the compiler must assume that a/b are accessed inside the compiler
>>>> barrier.
>>>
>>> I'm not quite sure what you mean by that, but if the compiler has access
>>> to the entire program and can do a global analysis then it can recognize
>>> cases where accesses that may seem to be live aren't really.
>>
>> Even for trivial enough cases where the compiler has access to all the
>> source, compiler barriers should be opaque to the compiler.
>>
>> Since it is opaque,
>>
>> *a = 1;
>> compiler_barrier();
>>
>> might as well be
>>
>> *a = 1;
>> *d = *a; // *d is in device memory
>>
>> and so in my opinion the compiler needs to ensure that the value of *a right
>> before the compiler barrier is 1.
>>
>> Of course, only if the address of *a could be possibly legally known to the
>> opaque code in the compiler barrier.
>
> What do you mean by "opaque code in the compiler barrier"? The
> compiler_barrier() instruction doesn't generate any code at all; it
> merely directs the compiler not to carry any knowledge about values
> stored in memory from one side of the barrier to the other.
What I mean by "opaque" is that the compiler does not analyze the code
inside the compiler barrier, so it must treat it as a black box that
could manipulate memory arbitrarily, with the one limitation that it
cannot guess addresses that have never escaped to it.
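For concreteness, in current kernels barrier() expands to roughly the
following (a minimal sketch, close to what include/linux/compiler.h does):

/* Empty asm with a "memory" clobber: its body is invisible to the
 * compiler, which must assume it may read or write any memory whose
 * address it could legally have obtained. */
#define barrier() __asm__ __volatile__("" : : : "memory")
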
So for example, in
int a = 1;
barrier();
a = 2;
//...
the compiler does not know how the code inside barrier() accesses
memory, including volatile memory.
But it knows that the code inside barrier() cannot access `a`, because
the address of `a` has never escaped before the barrier().
So it can change this to:
barrier();
int a = 2;
// ...
But if we let the address of `a` escape, for example with some external
function foo(int*):
int a;
foo(&a);
a = 1;
barrier();
a = 2;
// ...
Then the compiler has to assume that the code of foo and barrier might
be something like this:
foo(p) { SPECIAL_VARIABLE = p; }
barrier() { TURN_THE_BRAKES_ON = *SPECIAL_VARIABLE; }
and it must make sure that the value of `a` before barrier() is 1.
That is at least my understanding.
In fact, even if `a` is unused after a=2, the compiler can eliminate `a`
entirely only in the former case; in the latter case it still needs to
ensure that the value of `a` before barrier() is 1 (although it can
eliminate a=2).
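To make this concrete, here is a self-contained sketch of the two cases
(keep_local(), leak_local() and foo() are names I made up just for this
illustration; foo() stands for any external function whose code the
compiler cannot see):

#define barrier() __asm__ __volatile__("" : : : "memory")

void foo(int *p);	/* external; might do SPECIAL_VARIABLE = p */

void keep_local(void)
{
	int a = 1;	/* address never escapes, so the compiler may   */
	barrier();	/* move/merge the stores (barrier(); a = 2;) or */
	a = 2;		/* drop `a` entirely if it is otherwise unused  */
}

void leak_local(void)
{
	int a;
	foo(&a);	/* address escapes before the barrier           */
	a = 1;		/* must be in memory here: the opaque asm could */
	barrier();	/* read `a` through the pointer foo() received  */
	a = 2;		/* still dead if `a` is unused afterwards       */
}

Compiling something like this at -O2 should typically show a store of 1
to `a`'s stack slot before the asm in leak_local(), but no store at all
in keep_local().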
>
> Note that it does _not_ necessarily prevent the compiler from carrying
> knowledge that a memory location is unused from one side of the barrier
> to the other.
Yes, or even merging/moving assignments to the memory location across a
barrier(), as in the example above.
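For instance, for the hypothetical keep_local() sketched above, the
compiler may legitimately emit nothing but the barrier itself:

void keep_local(void)
{
	barrier();	/* both stores to `a` gone: the opaque asm cannot
			 * name `a`, and nothing reads `a` afterwards    */
}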
>>> However, I admit this objection doesn't really apply to Linux kernel
>>> programming.
>>>
>>>>> (Whether the result could count as OOTA is
>>>>> open to question, but that's not the point.) Is it not possible that a
>>>>> compiler might find other ways to defeat your intentions?
>>>>
>>>> The main point (which I already mentioned in the previous e-mail) is if the
>>>> object is deallocated without synchronization (or never escapes the function
>>>> in the first place).
>>>>
>>>> And indeed, any such case renders the added rule unsound. It is in a sense
>>>> unrelated to OOTA; cases where the load/store can be elided are never OOTA.
>>>
>>> That is a matter of definition. In our paper, Paul and I described
>>> instances of OOTA in which all the accesses have been optimized away as
>>> "trivial".
>>
>> Yes, by OOTA I mean a rwdep;rfe cycle.
>>
>> In the absence of data races, such a cycle can't be optimized away because
>> it is created with volatile/compiler-barrier-protected accesses.
>
> That wasn't true in the C++ context of the paper Paul and I worked on.
> Of course, C++ is not our current context here.
Yes, you are completely correct. In C++ (or pure C), where data races
are prevented by compiler/language built-ins rather than by compiler
barriers/volatiles, all my assumptions break.
That is because the compiler absolutely knows that an atomic_store(&x)
does not access any memory location other than x, so it can do a lot
more "harmful" optimizations.
That's why I said such a language model should just exclude global OOTA
by fiat.
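For reference, the textbook shape of such a cycle in C11 terms (a
minimal sketch; x, y, r1, r2 are names of my own choosing):

#include <stdatomic.h>

static atomic_int x, y;		/* both initially 0 */
static int r1, r2;

void thread1(void)		/* run concurrently with thread2() */
{
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_store_explicit(&y, r1, memory_order_relaxed);	/* dep x -> y */
}

void thread2(void)
{
	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, r2, memory_order_relaxed);	/* dep y -> x */
}

Here the compiler knows the atomics touch only x and y, yet the outcome
r1 == r2 == 42 would be exactly such a rwdep;rfe cycle; if I remember
the wording correctly, the standard only says implementations "should"
not produce such out-of-thin-air values, which is what I mean by
excluding it by fiat.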
I have to read your paper again (I think I read it a few months ago) to
understand whether the trivial OOTA would make even that vague axiom
unsound. My intuition says that if the OOTA is never observed by
influencing a side effect, then forbidding OOTA makes no difference to
the set of "observable behaviors" of a C++ program even if there is a
trivial OOTA; and if the OOTA has visible side effects, then it is
acceptable for the compiler not to do the "optimization" that turns it
into a trivial OOTA and to choose some other optimization instead, so we
might as well forbid the compiler from doing it.
> What I was trying to get at above is that compiler-barrier protection
> does not necessarily guarantee that non-volatile accesses can't be
> optimized away. (However, it's probably safe for us to make such an
> assumption here.)
Yes, I agree.
>> It seems the ideal solution is to let Live be defined by the tools, which
>> should keep up with or exceed the analysis done by state-of-art compilers.
>
> I don't think it works that way in practice. :-)
Yeah... maybe not :(
Best wishes,
jonas