linux-kernel - Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dd87369d-825f-e932-dd86-8e46e92d1a7e@huaweicloud.com>
Date:   Mon, 27 Feb 2023 15:03:16 +0100
From:   Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     Boqun Feng <boqun.feng@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Jonas Oberhauser <jonas.oberhauser@...wei.com>,
        parri.andrea@...il.com, will@...nel.org, peterz@...radead.org,
        npiggin@...il.com, dhowells@...hat.com, j.alglave@....ac.uk,
        luc.maranget@...ia.fr, akiyks@...il.com, dlustig@...dia.com,
        joel@...lfernandes.org, urezki@...il.com, quic_neeraju@...cinc.com,
        frederic@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] tools/memory-model: Make ppo a subrelation of po



On 2/26/2023 5:51 PM, Alan Stern wrote:
> On Sun, Feb 26, 2023 at 12:17:31PM +0100, Jonas Oberhauser wrote:
>>
>> On 2/26/2023 4:30 AM, Alan Stern wrote:
>>> On Sat, Feb 25, 2023 at 07:09:05PM -0800, Boqun Feng wrote:
>>>> On Sat, Feb 25, 2023 at 09:29:51PM -0500, Alan Stern wrote:
>>>>> On Sat, Feb 25, 2023 at 05:01:10PM -0800, Paul E. McKenney wrote:
>>>>>> A few other oddities:
>>>>>>
>>>>>> litmus/auto/C-LB-Lww+R-OC.litmus
>>>>>>
>>>>>> 	Both versions flag a data race, which I am not seeing.	It appears
>>>>>> 	to me that P1's store to u0 cannot happen unless P0's store
>>>>>> 	has completed.  So what am I missing here?
>>>>> The LKMM doesn't believe that a control or data dependency orders a
>>>>> plain write after a marked read.  Hence in this test it thinks that P1's
>>>>> store to u0 can happen before the load of x1.  I don't remember why we
>>>>> did it this way -- probably we just wanted to minimize the restrictions
>>>>> on when plain accesses can execute.  (I do remember the reason for
>>>>> making address dependencies induce order; it was so RCU would work.)
>>>>>
>>>> Because plain store can be optimzed as an "store only if not equal"?
>>>> As the following sentenses in the explanations.txt:
>>>>
>>>> 	The need to distinguish between r- and w-bounding raises yet another
>>>> 	issue.  When the source code contains a plain store, the compiler is
>>>> 	allowed to put plain loads of the same location into the object code.
>>>> 	For example, given the source code:
>>>>
>>>> 		x = 1;
>>>>
>>>> 	the compiler is theoretically allowed to generate object code that
>>>> 	looks like:
>>>>
>>>> 		if (x != 1)
>>>> 			x = 1;
>>>>
>>>> 	thereby adding a load (and possibly replacing the store entirely).
>>>> 	For this reason, whenever the LKMM requires a plain store to be
>>>> 	w-pre-bounded or w-post-bounded by a marked access, it also requires
>>>> 	the store to be r-pre-bounded or r-post-bounded, so as to handle cases
>>>> 	where the compiler adds a load.
>>> Good guess; maybe that was the reason.  [...]
>>> So perhaps the original reason is not valid now
>>> that the memory model explicitly includes tests for stores being
>>> r-pre/post-bounded.
>>>
>>> Alan
>> I agree, I think you could relax that condition.
> Here's a related question to think about.  Suppose a compiler does make
> this change, adding a load-and-test in front of a store.  Can that load
> cause a data race?
>
> Normally I'd say no, because compilers aren't allowed to create data
> races where one didn't already exist.

I don't think that's completely true. At least N1570 says

      Transformations that introduce a speculative read of a potentially shared memory location may
      not preserve the semantics of the program as defined in this standard, since they potentially introduce a data
      race. However, they are typically valid in the context of an optimizing compiler that targets a specific
      machine with well-defined semantics for data races.

In fact I vaguely recall certain optimizations that do introduce data races, like load hoisting.
The issue is that some optimizations are only valid in the absence of data races, and therefore these optimizations and the optimizations that introduce data races can't be used on the same portion of code at the same time.

https://cs.nju.edu.cn/hongjin/teaching/concurrency/03_C11MM_ViktorVafeiadis.pdf

I think in generall it's not about whether the compiler introduces races or not, but whether those races create new behaviors that you don't see in LKMM.


>    But that restriction is part of
> the C/C++ standard, and what we consider to be a data race differs from
> what the standard considers.
>
> So what's the answer?  Is the compiler allowed to translate:
>
> 	r1 = READ_ONCE(*x);
> 	if (r1)
> 		*y = 1;
>
> into something resembling:
>
> 	r1 = READ_ONCE(*x);
> 	rtemp = *y;
> 	if (r1) {
> 		if (rtemp != 1)
> 			*y = 1;
> 	}
>
> (Note that whether the load to rtemp occurs inside the "if (r1)"
> conditional or not makes no difference; either way the CPU can execute
> it before testing the condition.  Even before reading the value of *x.)
>
> _If_ we assume that these manufactured loads can never cause a data race
> then it should be safe to remove the r-pre/post-bounded tests for plain
> writes.

Note that I don't want to remove the r-pre/post-bounded tests.
What I agreed to is that the restriction to only addr for plain writes 
is an overly conservative "r-pre/post-bounded-style" test which is made 
redundant by the existence of the actual r-pre/post-bounded test.

>
> But what if rtemp reads from a plain write that was torn, and the
> intermediate value it observes happens to be 1, even though neither the
> initial nor the final value of *y was 1?

Of course, and you don't even need a torn write to cause problems 
without requiring r-bounding.
For example, if the other side does *y=1; *y=2; mb(); *x=&y;, then 
r1==&x seems to imply that y always end up as ==1 from a 
compiler-oblivious perspective, but because of the data race it could be 
==2 instead.


>> Note there's also rw-xbstar (used with fr) which doesn't check for
>> r-pre-bounded, but it should be ok. That's because only reads would be
>> unordered, as a result the read (in the if (x != ..) x=..) should provide
>> the correct value. The store would be issued as necessary, and the issued
>> store would still be ordered correctly w.r.t the read.
> That isn't the reason I left r-pre-bounded out from rw-xbstar.  If the
> write gets changed to a read there's no need for rw-xbstar to check
> r-pre-bounded, because then rw-race would be comparing a read with
> another read (instead of with a write) and so there would be no
> possibility of a race in any case.

That is the first part of my explanation (only reads would be unordered) 
but I don't think it's sufficient in general.
Imagine a hypothetical memory model with a release fence that upgrades 
the next memory operation only (and only stores) to release (e.g., to 
save some opcode design space you do weird_release_fence;str x1 x2 
instead of stlr x1 x2).
Then in the message passing pattern

T1 {
    u = a;
    release(&x, 1);
}

T2 {
   t = READ_ONCE(&x);
   weird_release_fence;
   a = 1;
}

where T2 is changed by the compiler to

T2 {
   t = READ_ONCE(&x);
   weird_release_fence();
   if (a!=1) a = 1;
}

In the specific executions where t==1, there wouldn't be a data race 
when just considering the equivalent of rw-xbstar, but u==1 and t==1 
would be possible (even though it might not seem so from the first example).

Of course in LKMM there's no such fence, but I think to make the 
argument complete you still need to go through every case that provides 
the w-pre-bounding and make sure it still provides the necessary order 
in the compiler-generated version. (or you can try a more complicated 
argument of the form "there would be another execution of the same 
program that would have a data race", which works at least for this 
example, not sure in general)

Have fun,
jonas