linux-kernel - Re: [PATCH] documentation: Fix two-CPU control-dependency example

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d2d26fb2-dd01-9532-0cfc-3c90bc024cb7@gmail.com>
Date:   Fri, 21 Jul 2017 06:12:56 +0900
From:   Akira Yokosawa <akiyks@...il.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Boqun Feng <boqun.feng@...il.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, Akira Yokosawa <akiyks@...il.com>
Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example

On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
>> On 2017/07/20 14:47, Paul E. McKenney wrote:
>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:
>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
>>>>>>>> From: Akira Yokosawa <akiyks@...il.com>
>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900
>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
>>>>>>>>
>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
>>>>>>>> no-transitivity example"), the operator in "if" statement of
>>>>>>>> the two-CPU example was modified from ">=" to ">".
>>>>>>>> Now the example misses the point because there is no party
>>>>>>>> who will modify "x" nor "y". So each CPU performs only the
>>>>>>>> READ_ONCE().
>>>>>>>>
>>>>>>>> The point of this example is to use control dependency for ordering,
>>>>>>>> and the WRITE_ONCE() should always be executed.
>>>>>>>>
>>>>>>>> So it was correct prior to the above mentioned commit. Partial
>>>>>>>> revert of the commit (with context adjustments regarding other
>>>>>>>> changes thereafter) restores the point.
>>>>>>>>
>>>>>>>> Note that the three-CPU example demonstrating the lack of
>>>>>>>> transitivity stands regardless of this partial revert.
>>>>>>>>
>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@...il.com>
>>>>>>>
>>>>>>> Hello, Akira,
>>>>>>>
>>>>>>> You are quite right that many compilers would generate straightforward
>>>>>>> code for the code fragment below, and in that case, the assertion could
>>>>>>> never trigger due to either TSO or control dependencies, depending on
>>>>>>> the architecture this was running on.
>>>>>>>
>>>>>>> However, if the compiler was too smart for our good, it could figure
>>>>>>> out that "x" and "y" only take on the values zero and one, so that
>>>>>>> the "if" would always be taken.  At that point, the compiler could
>>>>>>> simply ignore the "if" with the result shown below.
>>>>>>>
>>>>>>>> ---
>>>>>>>>  Documentation/memory-barriers.txt | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>>>>>>>> index c4ddfcd..c1ebe99 100644
>>>>>>>> --- a/Documentation/memory-barriers.txt
>>>>>>>> +++ b/Documentation/memory-barriers.txt
>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of
>>>>>>>>  	CPU 0                     CPU 1
>>>>>>>>  	=======================   =======================
>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>> -	if (r1 > 0)               if (r2 > 0)
>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)
>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>>
>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
>>>>>>>
>>>>>>> Original program:
>>>>>>>
>>>>>>>  	CPU 0                     CPU 1
>>>>>>>  	=======================   =======================
>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)
>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>
>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
>>>>>>>
>>>>>>> Enthusiastically optimized program:
>>>>>>>
>>>>>>>  	CPU 0                     CPU 1
>>>>>>>  	=======================   =======================
>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>>
>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
>>>>>>>
>>>>>>> This optimized version could trigger the assertion.
>>>>>>>
>>>>>>> So we do need to stick with the ">" comparison.
>>>>>>
>>>>>> Well but,
>>>>>>
>>>>>> Current example:
>>>>>>
>>>>>>  	CPU 0                     CPU 1
>>>>>>  	=======================   =======================
>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>> 	if (r1 > 0)               if (r2 > 0)
>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
>>>>>>
>>>>>> 	assert(!(r1 == 1 && r2 == 1));
>>>>>>
>>>>>> Such a clever compiler might be able to prove that "x" and "y"
>>>>>> are never modified and end up in the following:
>>>>>>
>>>>
>>>> Hi Akira,
>>>>
>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)
>>>>
>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the
>>>> compiler has to generate code that actually does a load, so the values
>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 
>>>> optimize the if()s out because the volatile semantics of READ_ONCE().
>>>> Otherwise, I think we have a lot more to worry about than this case.
>>>>
>>>>>>  	CPU 0                     CPU 1
>>>>>>  	=======================   =======================
>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
>>>>>>
>>>>>> 	assert(!(r1 == 1 && r2 == 1));
>>>>>>
>>>>>> This means it is impossible to describe this example in C,
>>>>>> doesn't it?
>>>>>
>>>>> That is a matter of some debate, but it has gotten increasingly more
>>>>> difficult to get C to do one's bidding over the decades.
>>>>>
>>>>>> What am I missing here?
>>>>>
>>>>> The compiler has to work harder in your example case, so it is probably
>>>>> just a matter of time.  In the original with ">=", all the compiler needed
>>>>> to do was look at all the assignments and the initial value.  In the
>>>>> original, it also had to do reasoning based on control dependencies
>>>>> (which are not yet part of the C standard).
>>>>>
>>>>> But yes, the degree to which a compiler can optimize atomics is a matter
>>>>> of some debate.  Here is a proposal to let the developer choose:
>>>>>
>>>>
>>>> Hi Paul,
>>>>
>>>> I know the compiler could optimize atomics in very interesting ways, but
>>>> this case is about volatile, so I guess our case is still fine? ;-)
>>>
>>> Hello, Boqun,
>>>
>>> When I asked that question, the answer I got was "the compiler must
>>> emit the load instruction, but is under no obligation to actually use the
>>> value loaded".
>>
>> So, it sounds like the following description found in memory-barriers.txt
>> just above the example of lack of transitivity:
>>
>>> However, stores are not speculated.  This means that ordering -is- provided
>>> for load-store control dependencies, as in the following example:
>>>
>>> 	q = READ_ONCE(a);
>>> 	if (q) {
>>> 		WRITE_ONCE(b, 1);
>>> 	}
>>>
>>> Control dependencies pair normally with other types of barriers.
>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()
>>> are optional! Without the READ_ONCE(), the compiler might combine the
>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),
>>> the compiler might combine the store to 'b' with other stores to 'b'.
>>> Either can result in highly counterintuitive effects on ordering.
>>>
>>> Worse yet, if the compiler is able to prove (say) that the value of
>>> variable 'a' is always non-zero, it would be well within its rights
>>> to optimize the original example by eliminating the "if" statement
>>> as follows:
>>>
>>> 	q = a;
>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
>>>
>>> So don't leave out the READ_ONCE().
>>
>> does not stand if the answer needs to be taken seriously.
>> The READ_ONCE() is not good enough to prevent the "if" statement
>> from being eliminated.
>>
>> Hmm... If we do need to care about such an extreme optimization,
>> we should not rely on any control dependency in C source code
>> at least until the compiler gets educated about control dependency.
>>
>> Is there some reasonable compromise?
> 
> Right now, what we have are the volatile loads such as from READ_ONCE()
> and WRITE_ONCE().  My defense of them has been that they need to properly
> handle MMIO.  But not all compiler writers agree with me, so we should
> program at least a bit defensively.  Though we should probably also
> be writing tests to verify that the compiler is doing the right thing,
> and those litmus tests should probably include your ">=" litmus test.
> But we should not encourage developers to rely on it quite yet.
> 
> My current position is that compiler writers are currently unlikely to
> make their compilers do deep reasoning around the memory model, and that
> they are currently unlikely to do cross-translation-unit assigned-value
> analysis (though LTO might be changing that).  But given a static atomic
> that is visible only within a translation unit, I would expect them
> to do value-range analysis.  Hence my reluctance to present the ">="
> variant as a pattern to follow.

So if we declare "x" and "y" are not static (not file local),
can the ">=" variant be exempted from the possible optimization?

For example:

	extern int x, y;

	CPU 0                     CPU 1
	=======================   =======================
	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
	if (r1 >= 0)              if (r2 >= 0)
	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

	assert(!(r1 == 1 && r2 == 1));

This looks safe to me.

Thoughts?

            Thanks, Akira

> 
> 							Thanx, Paul
> 
> 
>>            Thanks, Akira
>>
>>>
>>> I don't happen to like that answer, by the way.  ;-)
>>>
>>> 							Thanx, Paul
>>>
>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
>>>>>
>>>>
>>>> Great material to wake up mind in the morning! Thanks.
>>>>
>>>> Regards,
>>>> Boqun
>>>>
>>>>> What are your thoughts on this?
>>>>>
>>>>> 							Thanx, Paul
>>>>>
>>>>>>             Thanks, Akira
>>>>>>
>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,
>>>>>>> so please do feel free to continue doing that!
>>>>>>>
>>>>>>> 							Thanx, Paul
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
>