Message-ID: <c2ae9bca-8526-425e-b9b5-135004ad59ad@rowland.harvard.edu>
Date: Fri, 10 Jan 2025 16:51:52 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
On Fri, Jan 10, 2025 at 01:21:32PM +0100, Jonas Oberhauser wrote:
>
>
> Am 1/10/2025 um 4:12 AM schrieb Alan Stern:
> > On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote:
> >
> > > > > So for example, in
> > > > >
> > > > > int a = 1;
> > > > > barrier();
> > > > > a = 2;
> > > > > //...
> > > > >
> > > > > the compiler does not know how the code inside barrier() accesses memory,
> > > > > including volatile memory.
> > > >
> > > > I would say rather that the compiler does not know that the values
> > > > stored in memory are the same before and after the barrier().
> > > >
> > > > Even the
> > > > values of local variables whose addresses have not been exported.
> > >
> > > No, this is not true. I used to think so too until a short while ago.
> > >
> > > But if you look at the output of gcc -O3 you will see that it does happily
> > > remove `a` in this function.
> >
> > Isn't that consistent with what I said?
>
>
> Ok, after careful reading, I think there are two assumptions you are making
> that I believe are not true. My example, however, only rules out the case
> where both of them are true; it does not tell us which one is false:
>
> 1) the barrier only tells the compiler that it may *change* the value of
> memory locations. I believe it also tells the compiler that it may *read*
> the value of memory locations.
> 2) the barrier also talks about the values of local variables whose
> addresses have not been exported. I do not believe this is the case.
I checked the GCC manual. You are right about 1); the compiler is
required to guarantee that the contents of memory before the barrier are
fully up-to-date (no dirty values remaining in registers or
temporaries).
2) isn't so clear. If a local variable's address is never computed then
the compiler might put the variable in a register, in which case the
barrier would not clobber it. On the other hand, if the variable's
address is computed somewhere (even if it isn't exported) then the
variable can't be kept in a register and so it is subject to the
barrier's effects.
The manual says:
Using the ‘"memory"’ clobber effectively forms a read/write
memory barrier for the compiler.
Not a word about what happens if a variable has the "register" storage
class, for example, and so might never be stored in memory at all. But
this leaves the programmer with _no_ way to specify a memory barrier for
such variables!
Of course, the fact that these variables cannot be exposed to outside
code does mitigate the problem...
> For what gcc guarantees, the manual says: "add memory to the list of
> clobbered registers. This will cause GCC to not keep memory values cached in
> registers across the assembler instruction", i.e., it needs to flush the
> value from the register to actual memory.
Yes, GCC will write back dirty values from registers. But not because
the cached values will become invalid (in fact, the cached values might
not even be used after the barrier). Rather, because the compiler is
required to assume that the assembler code in the barrier might access
arbitrary memory locations -- even if that code is empty.
> > > > Question: Can the compiler assume that no other threads access a between
> > > > the two stores, on the grounds that this would be a data race? I'd
> > > > guess that it can't make that assumption, but it would be nice to know
> > > > for sure.
> > >
> > > It can not make the assumption if &a has escaped. In that case, barrier()
> > > could be so:
> > >
> > > barrier(){
> > > store_release(OTHER_THREAD_PLEASE_MODIFY,&a);
> > >
> > > while (! load_acquire(OTHER_THREAD_IS_DONE));
> > > }
> > >
> > > with another thread doing
> > >
> > > while (! load_acquire(OTHER_THREAD_PLEASE_MODIFY)) yield();
> > > (*OTHER_THREAD_PLEASE_MODIFY)++;
> > > store_release(OTHER_THREAD_IS_DONE, 1);
Okay, yes, the compiler can't know whether the assembler code will do
this. But as far as I know, there is no specification about whether
inline assembler can synchronize with code in another thread (in the
sense used by the C++ memory model) and therefore create a
happens-before link. Language specifications tend to ignore issues
like inline assembler.
Does this give the compiler license to believe no such link can exist
and therefore accesses to these non-atomic variables by another thread
concurrent with the barrier would be data races?
In the end maybe this doesn't matter.
> > > If herd7 would generate edges for semantic dependencies instead of for its
> > > version of syntactic dependencies, then rwdep is the answer.
> >
> > That statement is meaningless (or at least, impossible to implement)
> > because there is no widely agreed-upon formal definition for semantic
> > dependency.
>
> Yes, which also means that a 100% correct end-to-end solution (herd + cat +
> ... ?) is currently not implementable.
>
> But we can still break the problem into two halves: one which is 100% solved
> inside the cat file, and one which is the responsibility of herd7 and
> currently not solved (and perhaps not 100% satisfactorily solvable).
I believe that Luc and the other people involved with herd7 take the
opposite point of view: herd7 is intended to do the "easy" analysis
involving only straightforward code parsing, leaving the "hard"
conceptual parts to the user-supplied .cat file.
> The advantage being that if we read the cat file as a mathematical
> definition, we can at least on paper argue 100% correctly about code for the
> cases where we either can figure out on paper what the semantic dependencies
> are, or where we at least just say "with relation to current compilers, we
> know what the semantically preserved dependencies are", even if herd7 or
> other tools lags behind in one or both.
>
> After all, herd7 is just a (useful) automation tool for reasoning about
> LKMM, which has its limitations (scalability, a specific definition of
> dependencies, limited C subset...).
I still think we should not attempt any sort of formalization of
semantic dependency here.
> > > Given that we can not define dep inside the cat model, one may as well
> > > define it as rwdep;rfe with the intended meaning of the dependencies being
> > > the semantic ones; then it is an inaccuracy of herd7 that it does not
> > > provide the proper dependencies.
> >
> > Perhaps so. LKMM does include other features which the compiler can
> > defeat if the programmer isn't sufficiently careful.
>
> How many of these are due to herd7's limitations vs. in the cat file?
Important limitations are present in both.
> > > > In the paper, I speculated that if a physical execution of a program
> > > > matches an abstract execution containing such a non-observed OOTA cycle,
> > > > then it also matches another abstract execution in which the cycle
> > > > doesn't exist. I don't know how to prove this conjecture, though.
> > >
> > > Yes, that also makes sense.
> > >
> > > Note that this speculation does not hold in the current LKMM, though. In
> > > the litmus test I shared in the opening e-mail, the outcome 0:r1=1 /\
> > > 1:r1=1 is only possible with an OOTA (even though the values from the
> > > OOTA are never used anywhere).
> >
> > If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that
> > there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly --
> > at least, in the sense that I intended. (The situation mentioned in the
> > paper is better described as an execution where the compiler has elided
> > all the accesses in the OOTA cycle.)
>
> I'm not sure that sense makes a lot of sense to me.
Here's an example illustrating what I had in mind. Imagine that all the
accesses here are C++-style relaxed atomic (i.e., not volatile and also
not subject to data races):
P0(int *a, int *b) {
	int r0 = *a;
	*b = r0;
	*b = 2;
	// r0 not used again
}

P1(int *a, int *b) {
	int r1 = *b;
	*a = r1;
	*a = 2;
	// r1 not used again
}
The compiler could eliminate the r0 and r1 accesses entirely, leaving
just:
P0(int *a, int *b) {
	*b = 2;
}

P1(int *a, int *b) {
	*a = 2;
}
An execution of the corresponding machine code would then be compatible
with an abstract execution of the source code in which both r0 and r1
get set to 42 (OOTA). But it would also be compatible with an abstract
execution in which both r0 and r1 are 0, so it doesn't make sense to say
that the hardware execution is, or might be, an instance of OOTA.
> But it does make the proof of your claim totally trivial. If there is no
> other OOTA-free execution with the same observable behavior, then it is
> proof that the OOTA happened, so the OOTA was observed.
> So by contraposition any non-observed OOTA has an OOTA-free execution with
> the same observable behavior.
>
> The sense in which I would define observed is more along the lines of "there
> is an observable side effect (such as store to volatile location) which has
> a semantic dependency on a load that reads from one of the stores in the
> OOTA cycle".
Yes, I can see that from your proposed definition of Live.
I'm afraid we've wandered off the point of this email thread, however...
Getting back to the original point, why don't you rewrite your patch as
discussed earlier and describe it as an attempt to add ordering for
important situations involving plain accesses that the LKMM currently
does not handle? In other words, leave out as far as possible any
mention of OOTA or semantic dependency.
Alan