linux-kernel - Re: [PATCH] documentation: Fix two-CPU control-dependency example

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <6ce659de-6c92-dbd8-e1dd-90f3e85521c0@gmail.com>
Date:   Sat, 22 Jul 2017 08:38:57 +0900
From:   Akira Yokosawa <akiyks@...il.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Boqun Feng <boqun.feng@...il.com>, linux-doc@...r.kernel.org,
        linux-kernel@...r.kernel.org, Akira Yokosawa <akiyks@...il.com>
Subject: Re: [PATCH] documentation: Fix two-CPU control-dependency example

On 2017/07/20 16:07:14 -0700, Paul E. McKenney wrote:
> On Fri, Jul 21, 2017 at 07:52:03AM +0900, Akira Yokosawa wrote:
>> On 2017/07/20 14:42:34 -0700, Paul E. McKenney wrote:
[...]
>>> For the compilers I know about at the present time, yes.
>>
>> So if I respin the patch with the extern, would you still feel reluctant?
> 
> Yes, because I am not seeing how this change helps.  What is this telling
> the reader that the original did not, and how does it help the reader
> generate good concurrent code?

Well, what bothers me in the ">" version of two-CPU example is the
explanation in memory-barriers.txt that follows:

> These two examples are the LB and WWC litmus tests from this paper:
> http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
> site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.

I'm wondering if calling the ">" version as an "LB" litmus test is correct.
Because it always results in "r1 == 0 && r2 == 0", 100%.

An LB litmus test with full memory barriers would be:

	CPU 0                     CPU 1
	=======================   =======================
	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
	smp_mb();                 smp_mb();
	WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);

	assert(!(r1 == 1 && r2 == 1));

and this will result in either of

        r1 == 0 && r2 == 0
        r1 == 0 && r2 == 1
        r1 == 1 && r2 == 0

but never "r1 == 1 && r2 == 1".

The difference in the behavior distracts me in reading this part
of memory-barriers.txt.

Your priority seemed to be in reducing the chance of the "if" statement
to be optimized away.  So I suggested to use "extern" as a compromise.

Another way would be to express the ">=" version in a pseudo-asm form.

	CPU 0                     CPU 1
	=======================   =======================
	r1 = LOAD x               r2 = LOAD y
	if (r1 >= 0)              if (r2 >= 0)
	  STORE y = 1               STORE x = 1

	assert(!(r1 == 1 && r2 == 1));

This should eliminate any concern of compiler optimization.
In this final part of CONTROL DEPENDENCIES section, separating the
problem of optimization and transitivity would clarify the point
(at least for me).

Thoughts?

            Regards, Akira

> 
> 							Thanx, Paul
> 
>>              Regards, Akira
>>
>>>
>>> The underlying problem is that the standard says almost nothing about what
>>> volatile does.  I usually argue that it was intended to be used for MMIO,
>>> so any optimization that would prevent a device driver from working must
>>> be prohibited on volatiles.  A lot of people really don't like volatile,
>>> and would like to eliminate it, and these people also aren't very happy
>>> about any attempt to better define volatile.
>>>
>>> Keeps things entertaining.  ;-)
>>>
>>> 							Thanx, Paul
>>>