Message-ID: <d82053fb-2f8c-4196-85e9-8d5a8f25c391@rowland.harvard.edu>
Date: Thu, 9 Jan 2025 14:27:39 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On Thu, Jan 09, 2025 at 05:44:54PM +0100, Jonas Oberhauser wrote:
>
>
> Am 1/9/2025 um 5:17 PM schrieb Alan Stern:
> > What do you mean by "opaque code in the compiler barrier"? The
> > compiler_barrier() instruction doesn't generate any code at all; it
> > merely directs the compiler not to carry any knowledge about values
> > stored in memory from one side of the barrier to the other.
>
> What I mean by "opaque" is that the compiler does not analyze the code
> inside the compiler barrier, so it must treat it as a black box that could
> manipulate memory arbitrarily within the limitation that it can not guess
> the address of memory.
Okay, I see what you're getting at. The way you express it is a little
confusing, because in fact there is NO code inside the compiler barrier
(although the compiler doesn't know that) -- the barrier() macro expands
to an empty assembler instruction, along with an annotation telling the
compiler that this instruction may affect the contents of memory in
unknown and unpredictable ways.
> So for example, in
>
> int a = 1;
> barrier();
> a = 2;
> //...
>
> the compiler does not know how the code inside barrier() accesses memory,
> including volatile memory.
I would say rather that the compiler does not know that the values
stored in memory are the same before and after the barrier() -- not
even the values of local variables whose addresses have not been
exported.
> But it knows that it can not access `a`, because the address of `a` has
> never escaped before the barrier().
I don't think this is right.  barrier() is (or can be) a macro, not a
function call with its own scope.  As such, it has -- in principle --
the ability to export the address of a.
Question: Can the compiler assume that no other threads access a between
the two stores, on the grounds that this would be a data race? I'd
guess that it can't make that assumption, but it would be nice to know
for sure.
> So it can change this to:
>
> barrier();
> int a = 2;
> // ...
>
> But if we let the address of `a` escape, for example with some external
> function foo(int*):
>
> int a;
> foo(&a);
> a = 1;
> barrier();
> a = 2;
> // ...
>
> Then the compiler has to assume that the code of foo and barrier might be
> something like this:
>
> foo(p) { SPECIAL_VARIABLE = p; }
> barrier() { TURN_THE_BRAKES_ON = *SPECIAL_VARIABLE; }
I think you're giving the compiler too much credit. The one thing the
compiler is allowed to assume is that the code, as written, does not
contain a data race or other undefined behavior.
> and it must make sure that the value of `a` before barrier() is 1.
>
> That is at least my understanding.
This is the sort of question that memory-barriers.txt should answer.
It's closely related to the question I mentioned above.
> In fact, even if `a` is unused after a=2, the compiler can only eliminate
> `a` in the former case, but in the latter case, still needs to ensure that
> the value of `a` before barrier() is 1 (but it can eliminate a=2).
And what if a were a global shared variable instead of a local one? The
compiler is still allowed to do weird optimizations on it, since the
accesses aren't volatile. The barrier() merely prevents the compiler
from using its knowledge that a is supposed to contain 1 before the
barrier to influence its decisions about how to optimize the code
following the barrier.
> > That wasn't true in the C++ context of the paper Paul and I worked on.
> > Of course, C++ is not our current context here.
>
> Yes, you are completely correct. In C++ (or pure C), where data races are
> prevented by compiler/language-builtins rather than with
> compiler-barriers/volatiles, all my assumptions break.
>
> That is because the compiler absolutely knows that an atomic_store(&x) does
> not access any memory location other than x, so it can do a lot more
> "harmful" optimizations.
>
> That's why I said such a language model should just exclude global OOTA by
> fiat.
One problem with doing this is that there is no widely agreed-upon
formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer
because rwdep does not encapsulate the notion of semantic dependency.
> I have to read your paper again (I think I read it a few months ago) to
> understand if the trivial OOTA would make even that vague axiom unsound
> (my intuition says that if the OOTA is never observed by influencing the
> side-effect, then forbidding OOTA makes no difference to the set of
> "observable behaviors" of a C++ program even there is a trivial OOTA, and if
> the OOTA has visible side-effects, then it is acceptable for the compiler
> not to do the "optimization" that turns it into a trivial OOTA and choose
> some other optimization instead, so we can as well forbid the compiler from
> doing it).
If an instance of OOTA is never observed, does it exist?
In the paper, I speculated that if a physical execution of a program
matches an abstract execution containing such a non-observed OOTA cycle,
then it also matches another abstract execution in which the cycle
doesn't exist. I don't know how to prove this conjecture, though.
Alan