linux-kernel - Re: [PATCH] documentation: Fix two-CPU control-dependency example

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20170723044300.GI3730@linux.vnet.ibm.com>
Date:   Sat, 22 Jul 2017 21:43:00 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Akira Yokosawa <akiyks@...il.com>
Cc:     Boqun Feng <boqun.feng@...il.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example

On Sat, Jul 22, 2017 at 08:38:57AM +0900, Akira Yokosawa wrote:
> On 2017/07/20 16:07:14 -0700, Paul E. McKenney wrote:
> > On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
> >> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
> [...]
> >>> For the compilers I know about at the present time, yes.
> >>
> >> So if I respin the patch with the extern, would you still feel reluctant?
> > 
> > Yes, because I am not seeing how this change helps.  What is this telling
> > the reader that the original did not, and how does it help the reader
> > generate good concurrent code?
> 
> Well, what bothers me in the ">" version of two-CPU example is the
> explanation in memory-barriers.txt that follows:
> 
> > These two examples are the LB and WWC litmus tests from this paper:
> > http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
> > site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
> 
> I'm wondering if calling the ">" version as an "LB" litmus test is correct.
> Because it always results in "r1 == 0 && r2 == 0", 100%.

As it should, because nothing can become non-zero unless something was
already non-zero.  It is possible to create similarly single-outcome
tests with address dependencies.

But yes, converting to ">=" makes the stores unconditional, and thus
allows more outcomes.  Perhaps we need both?  Another approach is to
write a second value in an "else" statement, keeping the original ">".

I agree that some of these examples can be a bit hard on one's intuition,
but the plain fact is that when it comes to memory models, one's intuition
is not always one's friend.

> An LB litmus test with full memory barriers would be:
> 
> 	CPU 0                     CPU 1
> 	=======================   =======================
> 	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
> 	smp_mb();                 smp_mb();
> 	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
> 
> 	assert(!(r1 == 1 && r2 == 1));
> 
> and this will result in either of
> 
>         r1 == 0 && r2 == 0
>         r1 == 0 && r2 == 1
>         r1 == 1 && r2 == 0
> 
> but never "r1 == 1 && r2 == 1".

Agreed, because unlike the control-dependency example, the WRITE_ONCE()s
happen unconditionally.

> The difference in the behavior distracts me in reading this part
> of memory-barriers.txt.

Then it likely needs more explanation.

> Your priority seemed to be in reducing the chance of the "if" statement
> to be optimized away.  So I suggested to use "extern" as a compromise.

If the various tools accept the "extern", this might not be a bad thing
to do.

But what this really means is that I need to take another tilt at
the "volatile" windmill in the committee.

> Another way would be to express the ">=" version in a pseudo-asm form.
> 
> 	CPU 0                     CPU 1
> 	=======================   =======================
> 	r1 = LOAD x               r2 = LOAD y
> 	if (r1 >= 0)              if (r2 >= 0)
> 	  STORE y = 1               STORE x = 1
> 
> 	assert(!(r1 == 1 && r2 == 1));
> 
> This should eliminate any concern of compiler optimization.
> In this final part of CONTROL DEPENDENCIES section, separating the
> problem of optimization and transitivity would clarify the point
> (at least for me).

The problem is that people really do use C-language control dependencies
in the Linux kernel, so we need to describe them.  Maybe someday it
will be necessary to convert them to asm, but I am hoping that we can
avoid that.

> Thoughts?

My hope is that the memory model can help here, but that will in any
case take time.

							Thanx, Paul

>             Regards, Akira
> 
> > 
> > 							Thanx, Paul
> > 
> >>              Regards, Akira
> >>
> >>>
> >>> The underlying problem is that the standard says almost nothing about what
> >>> volatile does.  I usually argue that it was intended to be used for MMIO,
> >>> so any optimization that would prevent a device driver from working must
> >>> be prohibited on volatiles.  A lot of people really don't like volatile,
> >>> and would like to eliminate it, and these people also aren't very happy
> >>> about any attempt to better define volatile.
> >>>
> >>> Keeps things entertaining.  ;-)
> >>>
> >>> 							Thanx, Paul
> >>>
>