lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <26923846-94c1-f08d-76aa-271f27bc83a3@nvidia.com>
Date:   Thu, 5 Jul 2018 08:35:30 -0700
From:   Daniel Lustig <dlustig@...dia.com>
To:     Will Deacon <will.deacon@....com>,
        Andrea Parri <andrea.parri@...rulasolutions.com>
CC:     Alan Stern <stern@...land.harvard.edu>,
        LKMM Maintainers -- Akira Yokosawa <akiyks@...il.com>,
        Boqun Feng <boqun.feng@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        Nicholas Piggin <npiggin@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] tools/memory-model: Add write ordering by
 release-acquire and by locks

On 7/5/2018 8:16 AM, Daniel Lustig wrote:
> On 7/5/2018 7:44 AM, Will Deacon wrote:
>> Andrea,
>>
>> On Thu, Jul 05, 2018 at 04:00:29PM +0200, Andrea Parri wrote:
>>> On Wed, Jul 04, 2018 at 01:11:04PM +0100, Will Deacon wrote:
>>>> On Tue, Jul 03, 2018 at 01:28:17PM -0400, Alan Stern wrote:
>>>>> There's also read-write ordering, in the form of the LB pattern:
>>>>>
>>>>> P0(int *x, int *y, int *z)
>>>>> {
>>>>> 	r0 = READ_ONCE(*x);
>>>>> 	smp_store_release(z, 1);
>>>>> 	r1 = smp_load_acquire(z);
>>>>> 	WRITE_ONCE(*y, 1);
>>>>> }
>>>>>
>>>>> P1(int *x, int *y)
>>>>> {
>>>>> 	r2 = READ_ONCE(*y);
>>>>> 	smp_mp();
>>>>> 	WRITE_ONCE(*x, 1);
>>>>> }
>>>>>
>>>>> exists (0:r0=1 /\ 1:r2=1)
>>>>
>>>> The access types are irrelevant to the acquire/release primitives, so yes
>>>> that's also allowed.
>>>>
>>>>> Would this be allowed if smp_load_acquire() was implemented with LDAPR?
>>>>> If the answer is yes then we will have to remove the rfi-rel-acq and
>>>>> rel-rf-acq-po relations from the memory model entirely.
>>>>
>>>> I don't understand what you mean by "rfi-rel-acq-po", and I assume you mean
>>>> rel-rfi-acq-po for the other? Sounds like I'm confused here.
>>>
>>> [Your reply about 1/2 suggests that you've figured this out now, IAC ...]
>>
>> Yeah, the naming threw me because it's not the same order as the composition
>> in the actual definition (why not?). I typoed an extra 'po' suffix. Well
>> done for spotting it.
>>
>>> "rfi-rel-acq" (as Alan wrote, and as I wrote before my question above...)
>>> is defined and currently used in linux-kernel.cat (and, FWIW, it has been
>>> so since when we upstreamed LKMM).
>>>
>>> My point is that, as exemplified by the two tests I reported above, this
>>> relation already prevents you from implementing your acquire with LDAPR;
>>> so my/our question was not "can you run herd7 for me?" but rather "do you
>>> think that changes are needed to the .cat file?".
>>
>> I mean, you can run herd on the armv8 memory model and see the answer to the
>> litmus test. So we have two options for the LKMM .cat file (the Arm one is
>> baked in silicon):
>>
>>   1. Say that architectures with RCpc acquire/release instructions must
>>      instead use RCsc instructions (if they have them) or full fences
>>
>>   2. Change the LKMM .cat file to allow RCpc acquire/release (note that I'd
>>      still like RCsc lock/unlock semantics, and I think that's actually the
>>      real case that matters here, but the currently difficulty in
>>      distinguishing the two in the .cat file has led to this broader
>>      ordering being enforced as a side-effect)
>>
>> I prefer (2), because I think it's a safe and sensible relaxation to make
>> and accomodates what I consider to be a likely extension to weakly ordered
>> architectures that we might want to support efficiently. So yes, I think
>> changes are needed to the LKMM .cat file, but please don't view that as a
>> criticism. We change stuff all the time as long as it doesn't break userspace.
>>
>>> This question goes back _at least_ to:
>>>
>>>   http://lkml.kernel.org/r/1519301990-11766-1-git-send-email-parri.andrea@gmail.com
>>>
>>> (see, in part., the "IMPORTANT" note at the bottom of the commit message)
>>> and that discussion later resulted in:
>>>
>>>   0123f4d76ca63b ("riscv/spinlock: Strengthen implementations with fences")
>>>   5ce6c1f3535fa8 ("riscv/atomic: Strengthen implementations with fences")
>>>
>>> (the latest _draft_ of the RISC-V specification, as pointed out by Daniel,
>>>
>>>   https://github.com/riscv/riscv-isa-manual, Appendix A.5
>>>
>>>  includes our "Linux mapping", although the currently-recommended mapping
>>>  differs and involves a "fence.tso [+ any acquire, including RCpc .aq]").
>>
>> [I think the discussion at hand is broader than RISC-V, and I looped in
>>  Daniel already]
> 
> Sorry for the delay, it was Independence Day here in the US.
> 
> I'm aligned with Will on all this so far.  As mentioned above, this issue
> definitely comes up on RISC-V, but for reasons even beyond RCpc vs. RCsc.
> Why?  On RISC-V, we have RCsc atomics, but no native load-acquire or
> store-release opcodes.  You might expect you can use "ld; fence r,rw"
> and "fence rw,w; sd", respectively, but those don't interact with the
> RCsc .aq/.rl mappings.  For example, suppose we did this:
> 
> (a)
> amoswap.w.rl x0, x0, 0(a1)   // st-rel using amoswap with dummy dest
> ...
> lw a0, 0(a1)                 // ld-acq using fence-based mapping
> fence r,rw
> (b)
> 
> Nothing here orders (a) before (b) in general.  A mapping always using

And here, I should have said that (a) and (b) are stores.  But then
flip it around, so that the release uses fences and the acquire uses
RCsc, and then there's a similar lack of read->read ordering between
(a) and (b).  That's what I really meant to say here.  Depending on
which particular mapping you use in each case, you may or may not get
R->R or W->W ordering.

> fences would cover what's being asked for (I believe), and using only
> RCsc operations (like ARM does) would also work if we had native
> load-acquire and store-release operations, but this mixed mapping
> doesn't.  That's why the spec currently recommends the stronger
> fence.tso (~= multi-copy atomic Power lwsync) instead.
> 
> ...but it's very non-obvious.  Is there really a hurry to rush all
> this in?
> 
>>> My understanding is that your answer to this question is "Yes", but I am
>>> not sure about the "How/Which changes?"; of course, an answer his question
>>> _in the form_ of PATCHes would be appreciated! (but please also consider
>>> that I'll be offline for most of the time until next Monday.)
>>
>> C'mon, I'm reviewing patches here. The onus shouldn't be on the reviewer to
>> come back with the correct version of the patch. I'm also not terribly
>> worried if LKMM says the wrong thing whilst we work out what the right
>> thing should be, but I /would/ be worried if we started adding full fences to
>> an architecture that has acquire/release instructions just because they're
>> RCpc. If it turns out that no other arch maintainers care, then fine,
>> because frankly this doesn't affect me, but so far it's basically been
>> silence and I'd really like some more input before we close the door on
>> them
> 
> I care!  Thanks for speaking up Will!
> 
> Dan
> 
>>
>> Will
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ