linux-kernel - Re: [RFC] tools/memory-model: Rule out OOTA

Message-ID: <83b64af7-03b7-4dd8-900b-c57a65479ba8@rowland.harvard.edu>
Date: Thu, 9 Jan 2025 11:17:09 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
	peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
	dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
	akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
	urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
	linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
	hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA

On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote:
> 
> 
> On 1/8/2025 at 7:47 PM, Alan Stern wrote:
> > On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:
> > > 
> > > 
> > > On 1/7/2025 at 5:09 PM, Alan Stern wrote:
> > > > Is this really valid?  In the example above, if there were no other
> > > > references to a or b in the rest of the program, the compiler could
> > > > eliminate them entirely.
> > > 
> > > In the example above, this is not possible, because the address of a/b have
> > > escaped the function and are not deallocated before synchronization happens.
> > > Therefore the compiler must assume that a/b are accessed inside the compiler
> > > barrier.
> > 
> > I'm not quite sure what you mean by that, but if the compiler has access
> > to the entire program and can do a global analysis then it can recognize
> > cases where accesses that may seem to be live aren't really.
> 
> Even for trivial enough cases where the compiler has access to all the
> source, compiler barriers should be opaque to the compiler.
> 
> Since it is opaque,
> 
>   *a = 1;
>   compiler_barrier();
> 
> might as well be
> 
>   *a = 1;
>   *d = *a; // *d is in device memory
> 
> and so in my opinion the compiler needs to ensure that the value of *a right
> before the compiler barrier is 1.
> 
> Of course, only if the address of *a could be possibly legally known to the
> opaque code in the compiler barrier.

What do you mean by "opaque code in the compiler barrier"?  The 
compiler_barrier() instruction doesn't generate any code at all; it 
merely directs the compiler not to carry any knowledge about values 
stored in memory from one side of the barrier to the other.

Note that it does _not_ necessarily prevent the compiler from carrying 
knowledge that a memory location is unused from one side of the barrier 
to the other.
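
For concreteness, here is a minimal sketch of what such a barrier typically
looks like with GCC or Clang.  The macro name compiler_barrier() follows the
usage in this thread and the helper store_then_reload() is hypothetical; the
kernel's own macro is barrier(), which expands to the same empty asm with a
"memory" clobber.

```c
/*
 * Illustrative name taken from this thread; the Linux kernel's
 * equivalent is barrier().  The asm body is empty, so no
 * instructions are emitted.  The "memory" clobber tells the
 * compiler that memory may have been read or written here, so it
 * must not carry cached values of memory across this point.
 */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical helper, used only to show the barrier in context. */
static int store_then_reload(int *a)
{
	*a = 1;
	compiler_barrier();
	/* The value of *a must be re-read from memory after the barrier. */
	return *a;
}
```

The sketch only shows that values are not carried across the barrier.  It
says nothing about whether the compiler may still treat a location as unused,
which is the point in question above.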

> > However, I admit this objection doesn't really apply to Linux kernel
> > programming.
> > 
> > > >   (Whether the result could count as OOTA is
> > > > open to question, but that's not the point.)  Is it not possible that a
> > > > compiler might find other ways to defeat your intentions?
> > > 
> > > The main point (which I already mentioned in the previous e-mail) is if the
> > > object is deallocated without synchronization (or never escapes the function
> > > in the first place).
> > > 
> > > And indeed, any such case renders the added rule unsound. It is in a sense
> > > unrelated to OOTA; cases where the load/store can be elided are never OOTA.
> > 
> > That is a matter of definition.  In our paper, Paul and I described
> > instances of OOTA in which all the accesses have been optimized away as
> > "trivial".
> 
> Yes, by OOTA I mean a rwdep;rfe cycle.
> 
> In the absence of data races, such a cycle can't be optimized away because
> it is created with volatile/compiler-barrier-protected accesses.

That wasn't true in the C++ context of the paper Paul and I worked on.  
Of course, C++ is not our current context here.

What I was trying to get at above is that compiler-barrier protection 
does not necessarily guarantee that non-volatile accesses can't be 
optimized away.  (However, it's probably safe for us to make such an 
assumption here.)

> It would look something like this:
> 
> Live = R & rng(po \ po ; [W] ; (po-loc \ w_barrier)) | W & dom(po \ ((po-loc
> \ w_barrier) ; [W] ; po))
> 
> let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live] ;
> wmb)?
> 
> 
> > In any case, it seems that any approximation we can make to Live will be
> > subject to various sorts of errors.
> 
> Probably (this is certainly true for trying to approximate dependencies, for
> example), but what I know for certain is that the approximations of Live
> inside cat get more ugly the more precise they become. In the above
> definition of Live I have not included that the address must escape, nor
> that it must not be freed.
> 
> A non-local definition that suffices for OOTA would be so:
> 
> Live = R & rng(rfe) & dom(rwdep ; rfe) | W & dom(rfe)

I could live with this (although I would prefer to have more parentheses 
-- IMO it's mistake-prone to rely on the relative precedence of | and &).

Especially if the to-w definition above were rewritten in a way that 
would be a little easier to parse and understand.

> It seems the ideal solution is to let Live be defined by the tools, which
> should keep up with or exceed the analysis done by state-of-art compilers.

I don't think it works that way in practice.  :-)

Alan