[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <06854f97-a671-426e-8c73-78214da8fbf5@huaweicloud.com>
Date: Wed, 8 Jan 2025 20:22:07 +0100
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Alan Stern <stern@...land.harvard.edu>
Cc: paulmck@...nel.org, parri.andrea@...il.com, will@...nel.org,
peterz@...radead.org, boqun.feng@...il.com, npiggin@...il.com,
dhowells@...hat.com, j.alglave@....ac.uk, luc.maranget@...ia.fr,
akiyks@...il.com, dlustig@...dia.com, joel@...lfernandes.org,
urezki@...il.com, quic_neeraju@...cinc.com, frederic@...nel.org,
linux-kernel@...r.kernel.org, lkmm@...ts.linux.dev,
hernan.poncedeleon@...weicloud.com
Subject: Re: [RFC] tools/memory-model: Rule out OOTA
Am 1/8/2025 um 7:47 PM schrieb Alan Stern:
> On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:
>>
>>
>> Am 1/7/2025 um 5:09 PM schrieb Alan Stern:
>>> Is this really valid? In the example above, if there were no other
>>> references to a or b in the rest of the program, the compiler could
>>> eliminate them entirely.
>>
>> In the example above, this is not possible, because the address of a/b have
>> escaped the function and are not deallocated before synchronization happens.
>> Therefore the compiler must assume that a/b are accessed inside the compiler
>> barrier.
>
> I'm not quite sure what you mean by that, but if the compiler has access
> to the entire program and can do a global analysis then it can recognize
> cases where accesses that may seem to be live aren't really.
Even for trivial enough cases where the compiler has access to all the
source, compiler barriers should be opaque to the compiler.
Since it is opaque,
*a = 1;
compiler_barrier();
might as well be
*a = 1;
*d = *a; // *d is in device memory
and so in my opinion the compiler needs to ensure that the value of *a
right before the compiler barrier is 1.
Of course, only if the address of *a could be possibly legally known to
the opaque code in the compiler barrier.
>
> However, I admit this objection doesn't really apply to Linux kernel
> programming.
>
>>> (Whether the result could count as OOTA is
>>> open to question, but that's not the point.) Is it not possible that a
>>> compiler might find other ways to defeat your intentions?
>>
>> The main point (which I already mentioned in the previous e-mail) is if the
>> object is deallocated without synchronization (or never escapes the function
>> in the first place).
>>
>> And indeed, any such case renders the added rule unsound. It is in a sense
>> unrelated to OOTA; cases where the load/store can be elided are never OOTA.
>
> That is a matter of definition. In our paper, Paul and I described
> instances of OOTA in which all the accesses have been optimized away as
> "trivial".
Yes, by OOTA I mean a rwdep;rfe cycle.
In the absence of data races, such a cycle can't be optimized away
because it is created with volatile/compiler-barrier-protected accesses.
>> If an access interacts with an access of another thread (by reading from it
>> or being read from it), it must be live.
>
> This is the sort of approximation I'm a little uncomfortable with. It
> would be better to say that a store which is read from by a live load
> must be live. I don't see why a load which reads from a live store has
> to be live.
You are right, and I was careless.
All we need is that a store that is read externally by a live load is
live, and that a load that reads from an external store and has its
value semantically depended-on by a live store is live.
>> The formulation in the patch is just based on a complicated and close but
>> imperfect approximation of Live.
>
> Maybe you can reformulate the patch to make this more explicit.
It would look something like this:
Live = R & rng(po \ po ; [W] ; (po-loc \ w_barrier)) | W & dom(po \
((po-loc \ w_barrier) ; [W] ; po))
let to-w = (overwrite & int) | (addr | rmb ; [Live])? ; rwdep ; ([Live]
; wmb)?
> In any case, it seems that any approximation we can make to Live will be
> subject to various sorts of errors.
Probably (this is certainly true for trying to approximate dependencies,
for example), but what I know for certain is that the approximations of
Live inside cat get more ugly the more precise they become. In the above
definition of Live I have not included that the address must escape, nor
that it must not be freed.
A non-local definition that suffices for OOTA would be so:
Live = R & rng(rfe) & dom(rwdep ; rfe) | W & dom(rfe)
It seems the ideal solution is to let Live be defined by the tools,
which should keep up with or exceed the analysis done by state-of-art
compilers.
Best wishes,
jonas
Powered by blists - more mailing lists