Message-ID: <63a462dd-f182-4868-92b8-f33f2dc266f8@huaweicloud.com>
Date: Thu, 9 Jan 2025 17:44:54 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On 1/9/2025 at 5:17 PM, Alan Stern wrote:
> On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote:
>>
>>
>> On 1/8/2025 at 7:47 PM, Alan Stern wrote:
>>> On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:
>>>>
>>>>
>>>> On 1/7/2025 at 5:09 PM, Alan Stern wrote:
>>>>> Is this really valid? In the example above, if there were no other
>>>>> references to a or b in the rest of the program, the compiler could
>>>>> eliminate them entirely.
>>>>
>>>> In the example above, this is not possible, because the addresses of a/b have
>>>> escaped the function and are not deallocated before synchronization happens.
>>>> Therefore the compiler must assume that a/b are accessed inside the compiler
>>>> barrier.
>>>
>>> I'm not quite sure what you mean by that, but if the compiler has access
>>> to the entire program and can do a global analysis then it can recognize
>>> cases where accesses that may seem to be live aren't really.
>>
>> Even for trivial enough cases where the compiler has access to all the
>> source, compiler barriers should be opaque to the compiler.
>>
>> Since it is opaque,
>>
>> *a = 1;
>> compiler_barrier();
>>
>> might as well be
>>
>> *a = 1;
>> *d = *a; // *d is in device memory
>>
>> and so in my opinion the compiler needs to ensure that the value of *a right
>> before the compiler barrier is 1.
>>
>> Of course, only if the address of *a could be possibly legally known to the
>> opaque code in the compiler barrier.
>
> What do you mean by "opaque code in the compiler barrier"? The
> compiler_barrier() instruction doesn't generate any code at all; it
> merely directs the compiler not to carry any knowledge about values
> stored in memory from one side of the barrier to the other.
What I mean by "opaque" is that the compiler does not analyze the code
inside the compiler barrier, so it must treat it as a black box that
could manipulate memory arbitrarily, with the one limitation that it
cannot guess addresses that have never escaped to it.
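For concreteness, in current kernels barrier() expands to roughly the
following (a minimal sketch, close to what include/linux/compiler.h does):

/* Empty asm with a "memory" clobber: its body is invisible to the
 * compiler, which must assume it may read or write any memory whose
 * address it could legally have obtained. */
#define barrier() __asm__ __volatile__("" : : : "memory")
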
So for example, in
int a = 1;
barrier();
a = 2;
//...
the compiler does not know how the code inside barrier() accesses
memory, including volatile memory.
But it knows that the code inside barrier() cannot access `a`, because
the address of `a` has never escaped before the barrier().
So it can change this to:
barrier();
int a = 2;
// ...
But if we let the address of `a` escape, for example with some external
function foo(int*):
int a;
foo(&a);
a = 1;
barrier();
a = 2;
// ...
Then the compiler has to assume that the code of foo and barrier might
be something like this:
foo(p) { SPECIAL_VARIABLE = p; }
barrier() { TURN_THE_BRAKES_ON = *SPECIAL_VARIABLE; }
and it must make sure that the value of `a` before barrier() is 1.
That is at least my understanding.
In fact, even if `a` is unused after a=2, the compiler can eliminate `a`
entirely only in the former case; in the latter case it still needs to
ensure that the value of `a` before barrier() is 1 (although it can
eliminate a=2).
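To make this concrete, here is a self-contained sketch of the two cases
(keep_local(), leak_local() and foo() are names I made up just for this
illustration; foo() stands for any external function whose code the
compiler cannot see):

#define barrier() __asm__ __volatile__("" : : : "memory")

void foo(int *p);	/* external; might do SPECIAL_VARIABLE = p */

void keep_local(void)
{
	int a = 1;	/* address never escapes, so the compiler may   */
	barrier();	/* move/merge the stores (barrier(); a = 2;) or */
	a = 2;		/* drop `a` entirely if it is otherwise unused  */
}

void leak_local(void)
{
	int a;
	foo(&a);	/* address escapes before the barrier           */
	a = 1;		/* must be in memory here: the opaque asm could */
	barrier();	/* read `a` through the pointer foo() received  */
	a = 2;		/* still dead if `a` is unused afterwards       */
}

Compiling something like this at -O2 should typically show a store of 1
to `a`'s stack slot before the asm in leak_local(), but no store at all
in keep_local().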
>
> Note that it does _not_ necessarily prevent the compiler from carrying
> knowledge that a memory location is unused from one side of the barrier
> to the other.
Yes, or even merging/moving assignments to the memory location across a
barrier(), as in the example above.
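For instance, for the hypothetical keep_local() sketched above, the
compiler may legitimately emit nothing but the barrier itself:

void keep_local(void)
{
	barrier();	/* both stores to `a` gone: the opaque asm cannot
			 * name `a`, and nothing reads `a` afterwards    */
}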
>>> However, I admit this objection doesn't really apply to Linux kernel
>>> programming.
>>>
>>>>> (Whether the result could count as OOTA is
>>>>> open to question, but that's not the point.) Is it not possible that a
>>>>> compiler might find other ways to defeat your intentions?
>>>>
>>>> The main point (which I already mentioned in the previous e-mail) is if the
>>>> object is deallocated without synchronization (or never escapes the function
>>>> in the first place).
>>>>
>>>> And indeed, any such case renders the added rule unsound. It is in a sense
>>>> unrelated to OOTA; cases where the load/store can be elided are never OOTA.
>>>
>>> That is a matter of definition. In our paper, Paul and I described
>>> instances of OOTA in which all the accesses have been optimized away as
>>> "trivial".
>>
>> Yes, by OOTA I mean a rwdep;rfe cycle.
>>
>> In the absence of data races, such a cycle can't be optimized away because
>> it is created with volatile/compiler-barrier-protected accesses.
>
> That wasn't true in the C++ context of the paper Paul and I worked on.
> Of course, C++ is not our current context here.
Yes, you are completely correct. In C++ (or pure C), where data races
are prevented by compiler/language built-ins rather than by compiler
barriers/volatiles, all my assumptions break.
That is because the compiler absolutely knows that an atomic_store(&x)
does not access any memory location other than x, so it can do a lot
more "harmful" optimizations.
That's why I said such a language model should just exclude global OOTA
by fiat.
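For reference, the textbook shape of such a cycle in C11 terms (a
minimal sketch; x, y, r1, r2 are names of my own choosing):

#include <stdatomic.h>

static atomic_int x, y;		/* both initially 0 */
static int r1, r2;

void thread1(void)		/* run concurrently with thread2() */
{
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_store_explicit(&y, r1, memory_order_relaxed);	/* dep x -> y */
}

void thread2(void)
{
	r2 = atomic_load_explicit(&y, memory_order_relaxed);
	atomic_store_explicit(&x, r2, memory_order_relaxed);	/* dep y -> x */
}

Here the compiler knows the atomics touch only x and y, yet the outcome
r1 == r2 == 42 would be exactly such a rwdep;rfe cycle; if I remember
the wording correctly, the standard only says implementations "should"
not produce such out-of-thin-air values, which is what I mean by
excluding it by fiat.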
I have to read your paper again (I think I read it a few months ago) to
understand whether the trivial OOTA would make even that vague axiom
unsound. My intuition says that if the OOTA is never observed by
influencing a side effect, then forbidding OOTA makes no difference to
the set of "observable behaviors" of a C++ program even if there is a
trivial OOTA; and if the OOTA has visible side effects, then it is
acceptable for the compiler not to do the "optimization" that turns it
into a trivial OOTA and to choose some other optimization instead, so we
might as well forbid the compiler from doing it.
> What I was trying to get at above is that compiler-barrier protection
> does not necessarily guarantee that non-volatile accesses can't be
> optimized away. (However, it's probably safe for us to make such an
> assumption here.)
Yes, I agree.
>> It seems the ideal solution is to let Live be defined by the tools, which
>> should keep up with or exceed the analysis done by state-of-art compilers.
>
> I don't think it works that way in practice. :-)
Yeah... maybe not :(
Best wishes,
jonas