linux-kernel - Re: [PATCH] documentation: Fix two-CPU control-dependency example

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20170720230714.GA3730@linux.vnet.ibm.com>
Date:   Thu, 20 Jul 2017 16:07:14 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Akira Yokosawa <akiyks@...il.com>
Cc:     Boqun Feng <boqun.feng@...il.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example

On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> > On Fri, Jul 21, 2017 at 06:12:56AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 09:11:52AM -0700, Paul E. McKenney wrote:
> >>> On Thu, Jul 20, 2017 at 09:55:31PM +0900, Akira Yokosawa wrote:
> >>>> On 2017/07/20 14:47, Paul E. McKenney wrote:
> >>>>> On Thu, Jul 20, 2017 at 09:31:41AM +0800, Boqun Feng wrote:
> >>>>>> On Wed, Jul 19, 2017 at 02:56:02PM -0700, Paul E. McKenney wrote:
> >>>>>>> On Thu, Jul 20, 2017 at 06:33:26AM +0900, Akira Yokosawa wrote:
> >>>>>>>> On 2017/07/20 2:43, Paul E. McKenney wrote:
> >>>>>>>>> On Mon, Jul 17, 2017 at 05:24:42PM +0900, Akira Yokosawa wrote:
> >>>>>>>>>> >From b798b9b631e237d285aa8699da00bfb8ced33bea Mon Sep 17 00:00:00 2001
> >>>>>>>>>> From: Akira Yokosawa <akiyks@...il.com>
> >>>>>>>>>> Date: Mon, 17 Jul 2017 16:25:33 +0900
> >>>>>>>>>> Subject: [PATCH] documentation: Fix two-CPU control-dependency example
> >>>>>>>>>>
> >>>>>>>>>> In commit 5646f7acc95f ("memory-barriers: Fix control-ordering
> >>>>>>>>>> no-transitivity example"), the operator in "if" statement of
> >>>>>>>>>> the two-CPU example was modified from ">=" to ">".
> >>>>>>>>>> Now the example misses the point because there is no party
> >>>>>>>>>> who will modify "x" nor "y". So each CPU performs only the
> >>>>>>>>>> READ_ONCE().
> >>>>>>>>>>
> >>>>>>>>>> The point of this example is to use control dependency for ordering,
> >>>>>>>>>> and the WRITE_ONCE() should always be executed.
> >>>>>>>>>>
> >>>>>>>>>> So it was correct prior to the above mentioned commit. Partial
> >>>>>>>>>> revert of the commit (with context adjustments regarding other
> >>>>>>>>>> changes thereafter) restores the point.
> >>>>>>>>>>
> >>>>>>>>>> Note that the three-CPU example demonstrating the lack of
> >>>>>>>>>> transitivity stands regardless of this partial revert.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Akira Yokosawa <akiyks@...il.com>
> >>>>>>>>>
> >>>>>>>>> Hello, Akira,
> >>>>>>>>>
> >>>>>>>>> You are quite right that many compilers would generate straightforward
> >>>>>>>>> code for the code fragment below, and in that case, the assertion could
> >>>>>>>>> never trigger due to either TSO or control dependencies, depending on
> >>>>>>>>> the architecture this was running on.
> >>>>>>>>>
> >>>>>>>>> However, if the compiler was too smart for our good, it could figure
> >>>>>>>>> out that "x" and "y" only take on the values zero and one, so that
> >>>>>>>>> the "if" would always be taken.  At that point, the compiler could
> >>>>>>>>> simply ignore the "if" with the result shown below.
> >>>>>>>>>
> >>>>>>>>>> ---
> >>>>>>>>>>  Documentation/memory-barriers.txt | 2 +-
> >>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> >>>>>>>>>> index c4ddfcd..c1ebe99 100644
> >>>>>>>>>> --- a/Documentation/memory-barriers.txt
> >>>>>>>>>> +++ b/Documentation/memory-barriers.txt
> >>>>>>>>>> @@ -851,7 +851,7 @@ demonstrated by two related examples, with the initial values of
> >>>>>>>>>>  	CPU 0                     CPU 1
> >>>>>>>>>>  	=======================   =======================
> >>>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>>> -	if (r1 > 0)               if (r2 > 0)
> >>>>>>>>>> +	if (r1 >= 0)              if (r2 >= 0)
> >>>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>>>
> >>>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>>
> >>>>>>>>> Original program:
> >>>>>>>>>
> >>>>>>>>>  	CPU 0                     CPU 1
> >>>>>>>>>  	=======================   =======================
> >>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>> 	if (r1 >= 0)              if (r2 >= 0)
> >>>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>>
> >>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>>
> >>>>>>>>> Enthusiastically optimized program:
> >>>>>>>>>
> >>>>>>>>>  	CPU 0                     CPU 1
> >>>>>>>>>  	=======================   =======================
> >>>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>>  	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>>
> >>>>>>>>>  	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>>
> >>>>>>>>> This optimized version could trigger the assertion.
> >>>>>>>>>
> >>>>>>>>> So we do need to stick with the ">" comparison.
> >>>>>>>>
> >>>>>>>> Well but,
> >>>>>>>>
> >>>>>>>> Current example:
> >>>>>>>>
> >>>>>>>>  	CPU 0                     CPU 1
> >>>>>>>>  	=======================   =======================
> >>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>> 	if (r1 > 0)               if (r2 > 0)
> >>>>>>>>  	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>>>>>>>
> >>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>
> >>>>>>>> Such a clever compiler might be able to prove that "x" and "y"
> >>>>>>>> are never modified and end up in the following:
> >>>>>>>>
> >>>>>>
> >>>>>> Hi Akira,
> >>>>>>
> >>>>>> I wouldn't call that compiler a clever one, it's a broken one ;-)
> >>>>>>
> >>>>>> So here is the thing: READ_ONCE() is a *volatile* load, which means the
> >>>>>> compiler has to generate code that actually does a load, so the values
> >>>>>> of r1 and r2 depend on the loads, therefore, a sane compiler will not 
> >>>>>> optimize the if()s out because the volatile semantics of READ_ONCE().
> >>>>>> Otherwise, I think we have a lot more to worry about than this case.
> >>>>>>
> >>>>>>>>  	CPU 0                     CPU 1
> >>>>>>>>  	=======================   =======================
> >>>>>>>>  	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >>>>>>>>
> >>>>>>>> 	assert(!(r1 == 1 && r2 == 1));
> >>>>>>>>
> >>>>>>>> This means it is impossible to describe this example in C,
> >>>>>>>> doesn't it?
> >>>>>>>
> >>>>>>> That is a matter of some debate, but it has gotten increasingly more
> >>>>>>> difficult to get C to do one's bidding over the decades.
> >>>>>>>
> >>>>>>>> What am I missing here?
> >>>>>>>
> >>>>>>> The compiler has to work harder in your example case, so it is probably
> >>>>>>> just a matter of time.  In the original with ">=", all the compiler needed
> >>>>>>> to do was look at all the assignments and the initial value.  In the
> >>>>>>> original, it also had to do reasoning based on control dependencies
> >>>>>>> (which are not yet part of the C standard).
> >>>>>>>
> >>>>>>> But yes, the degree to which a compiler can optimize atomics is a matter
> >>>>>>> of some debate.  Here is a proposal to let the developer choose:
> >>>>>>>
> >>>>>>
> >>>>>> Hi Paul,
> >>>>>>
> >>>>>> I know the compiler could optimize atomics in very interesting ways, but
> >>>>>> this case is about volatile, so I guess our case is still fine? ;-)
> >>>>>
> >>>>> Hello, Boqun,
> >>>>>
> >>>>> When I asked that question, the answer I got was "the compiler must
> >>>>> emit the load instruction, but is under no obligation to actually use the
> >>>>> value loaded".
> >>>>
> >>>> So, it sounds like the following description found in memory-barriers.txt
> >>>> just above the example of lack of transitivity:
> >>>>
> >>>>> However, stores are not speculated.  This means that ordering -is- provided
> >>>>> for load-store control dependencies, as in the following example:
> >>>>>
> >>>>> 	q = READ_ONCE(a);
> >>>>> 	if (q) {
> >>>>> 		WRITE_ONCE(b, 1);
> >>>>> 	}
> >>>>>
> >>>>> Control dependencies pair normally with other types of barriers.
> >>>>> That said, please note that neither READ_ONCE() nor WRITE_ONCE()
> >>>>> are optional! Without the READ_ONCE(), the compiler might combine the
> >>>>> load from 'a' with other loads from 'a'.  Without the WRITE_ONCE(),
> >>>>> the compiler might combine the store to 'b' with other stores to 'b'.
> >>>>> Either can result in highly counterintuitive effects on ordering.
> >>>>>
> >>>>> Worse yet, if the compiler is able to prove (say) that the value of
> >>>>> variable 'a' is always non-zero, it would be well within its rights
> >>>>> to optimize the original example by eliminating the "if" statement
> >>>>> as follows:
> >>>>>
> >>>>> 	q = a;
> >>>>> 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
> >>>>>
> >>>>> So don't leave out the READ_ONCE().
> >>>>
> >>>> does not stand if the answer needs to be taken seriously.
> >>>> The READ_ONCE() is not good enough to prevent the "if" statement
> >>>> from being eliminated.
> >>>>
> >>>> Hmm... If we do need to care about such an extreme optimization,
> >>>> we should not rely on any control dependency in C source code
> >>>> at least until the compiler gets educated about control dependency.
> >>>>
> >>>> Is there some reasonable compromise?
> >>>
> >>> Right now, what we have are the volatile loads such as from READ_ONCE()
> >>> and WRITE_ONCE().  My defense of them has been that they need to properly
> >>> handle MMIO.  But not all compiler writers agree with me, so we should
> >>> program at least a bit defensively.  Though we should probably also
> >>> be writing tests to verify that the compiler is doing the right thing,
> >>> and those litmus tests should probably include your ">=" litmus test.
> >>> But we should not encourage developers to rely on it quite yet.
> >>>
> >>> My current position is that compiler writers are currently unlikely to
> >>> make their compilers do deep reasoning around the memory model, and that
> >>> they are currently unlikely to do cross-translation-unit assigned-value
> >>> analysis (though LTO might be changing that).  But given a static atomic
> >>> that is visible only within a translation unit, I would expect them
> >>> to do value-range analysis.  Hence my reluctance to present the ">="
> >>> variant as a pattern to follow.
> >>
> >> So if we declare "x" and "y" are not static (not file local),
> >> can the ">=" variant be exempted from the possible optimization?
> >>
> >> For example:
> >>
> >> 	extern int x, y;
> >>
> >> 	CPU 0                     CPU 1
> >> 	=======================   =======================
> >> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> >> 	if (r1 >= 0)              if (r2 >= 0)
> >> 	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> >>
> >> 	assert(!(r1 == 1 && r2 == 1));
> >>
> >> This looks safe to me.
> > 
> > For the compilers I know about at the present time, yes.
> 
> So if I respin the patch with the extern, would you still feel reluctant?

Yes, because I am not seeing how this change helps.  What is this telling
the reader that the original did not, and how does it help the reader
generate good concurrent code?

							Thanx, Paul

>              Regards, Akira
> 
> > 
> > The underlying problem is that the standard says almost nothing about what
> > volatile does.  I usually argue that it was intended to be used for MMIO,
> > so any optimization that would prevent a device driver from working must
> > be prohibited on volatiles.  A lot of people really don't like volatile,
> > and would like to eliminate it, and these people also aren't very happy
> > about any attempt to better define volatile.
> > 
> > Keeps things entertaining.  ;-)
> > 
> > 							Thanx, Paul
> > 
> >> Thoughts?
> >>
> >>             Thanks, Akira
> >>
> >>>
> >>> 							Thanx, Paul
> >>>
> >>>
> >>>>            Thanks, Akira
> >>>>
> >>>>>
> >>>>> I don't happen to like that answer, by the way.  ;-)
> >>>>>
> >>>>> 							Thanx, Paul
> >>>>>
> >>>>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0062r1.html
> >>>>>>>
> >>>>>>
> >>>>>> Great material to wake up mind in the morning! Thanks.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Boqun
> >>>>>>
> >>>>>>> What are your thoughts on this?
> >>>>>>>
> >>>>>>> 							Thanx, Paul
> >>>>>>>
> >>>>>>>>             Thanks, Akira
> >>>>>>>>
> >>>>>>>>> That said, I very much welcome critical reviews of memory-barriers.txt,
> >>>>>>>>> so please do feel free to continue doing that!
> >>>>>>>>>
> >>>>>>>>> 							Thanx, Paul
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> > 
> > 
>