linux-kernel - Re: [PATCH 1/4] spinlock: Document memory barrier rules

Message-ID: <3f7c39e5-4c46-0641-d29e-36c9439ad6dc@colorfullife.com>
Date:   Thu, 1 Sep 2016 13:04:26 +0200
From:   Manfred Spraul <manfred@...orfullife.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Will Deacon <will.deacon@....com>, benh@...nel.crashing.org,
        paulmck@...ux.vnet.ibm.com, Ingo Molnar <mingo@...e.hu>,
        Boqun Feng <boqun.feng@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, 1vier1@....de,
        Davidlohr Bueso <dave@...olabs.net>
Subject: Re: [PATCH 1/4] spinlock: Document memory barrier rules

Hi,

On 09/01/2016 10:44 AM, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 08:32:18PM +0200, Manfred Spraul wrote:
>> On 08/31/2016 06:40 PM, Will Deacon wrote:
>>> The litmus test then looks a bit like:
>>>
>>> CPUm:
>>>
>>> LOCK(x)
>>> smp_mb();
>>> RyAcq=0
>>>
>>>
>>> CPUn:
>>>
>>> Wy=1
>>> smp_mb();
>>> UNLOCK_WAIT(x)
>> Correct.
>>> which I think can be simplified to:
>>>
>>>
>>> LOCK(x)
>> I thought that a barrier is required here, because Ry=0 could be ordered
>> before the store of the lock.
>>> Ry=0
>> RyAcq instead of Ry would be required due to the unlock at the end of the
>> critical section
>> CpuN: <...>
>>            WyRelease=0
>> which is irrelevant for the litmus test.
>>> Wy=1
>>> smp_mb(); // Note that this is implied by spin_unlock_wait on PPC and arm64
>>> LOCK(x)   // spin_unlock_wait behaves like lock; unlock
>>> UNLOCK(x)
>>> [I've removed a bunch of barriers here, that I don't think are necessary
>>>   for the guarantees you're after]
>>>
>>> and the question is "Can both CPUs proceed?".
>>>
>>> Looking at the above, then I don't think that they can. Whilst CPUm can
>>> indeed speculate the Ry=0 before successfully taking the lock, if CPUn
>>> observes CPUm's read, then it must also observe the lock being held wrt
>>> the spin_lock API. That is because a successful LOCK operation by CPUn
>>> would force CPUm to replay its LL/SC loop and therefore discard its
>>> speculation of y.
>>>
>>> What am I missing? The code snippet seems to have too many barriers to me!
>> spin_unlock_wait() is not necessarily lock()+unlock().
>> It can be a simple Rx, or now RxAcq.
> Can be, normally, yes. But on power and arm64, the only architectures on
> which the ACQUIRE is 'funny', they do the 'pointless' ll/sc cycle in
> spin_unlock_wait() to 'fix' things.
>
> So for both power and arm64, you can in fact model spin_unlock_wait()
> as LOCK+UNLOCK.
Is this the consensus?

If I understand it right, the rules are:
1. spin_unlock_wait() must behave like spin_lock();spin_unlock();
2. spin_is_locked() must behave like
   spin_trylock() ? (spin_unlock(), FALSE) : TRUE
3. the ACQUIRE during spin_lock applies to the lock load, not to the store.
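
The three rules can be illustrated with a toy spinlock built on C11
atomics. This is a minimal sketch, not kernel code: the toy_spin_* names
and toy_spinlock_t are hypothetical, and a C11 acquire exchange only
approximates the LL/SC sequences used on arm64 and PowerPC. Note that the
sketch's spin_is_locked() returns true only when the trylock fails, since
it reports whether the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock, for illustration only (not arch_spinlock_t). */
typedef struct {
	atomic_int locked;	/* 0 = free, 1 = held */
} toy_spinlock_t;

/*
 * Rule 3: the ACQUIRE belongs to the load half of the exchange. A later
 * load may still be satisfied before the store of 1 is visible to other
 * CPUs, which is why the litmus test needs smp_mb() after LOCK(x).
 */
static void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		;	/* spin until the exchange observes 0 */
}

static bool toy_spin_trylock(toy_spinlock_t *l)
{
	return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	/* Sanity check: unlocking a lock that is not held is a bug. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Rule 1: spin_unlock_wait() behaves like spin_lock(); spin_unlock(); */
static void toy_spin_unlock_wait(toy_spinlock_t *l)
{
	toy_spin_lock(l);
	toy_spin_unlock(l);
}

/*
 * Rule 2: spin_is_locked() behaves like a spin_trylock() that is undone
 * immediately on success. It reports "held" only when the trylock fails.
 */
static bool toy_spin_is_locked(toy_spinlock_t *l)
{
	if (toy_spin_trylock(l)) {
		toy_spin_unlock(l);
		return false;
	}
	return true;
}
```

Modelling spin_unlock_wait() as a full lock/unlock pair is the
conservative choice: it inherits every ordering guarantee of LOCK and
UNLOCK, at the cost of writing to the lock word.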

sem.c and nf_conntrack.c need only rule 1 now, but I would document the 
rest as well, ok?
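
For rule 1, the litmus test from earlier in the thread can be written as a
small C11/pthreads program, with spin_unlock_wait() modelled as
LOCK(x); UNLOCK(x). This is a sketch under assumptions, not the sem.c or
nf_conntrack.c code: the names cpu_m, cpu_n, cs_data and run_litmus are
hypothetical, and atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb(). The forbidden outcome is "both CPUs proceed": CPUm reads y == 0
and enters its critical section, while CPUn returns from
spin_unlock_wait() without seeing CPUm's critical-section write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int locked; } toy_spinlock_t;	/* hypothetical */

static void toy_spin_lock(toy_spinlock_t *l)
{
	while (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
		;
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Rule 1: spin_unlock_wait() modelled as LOCK(x); UNLOCK(x). */
static void toy_spin_unlock_wait(toy_spinlock_t *l)
{
	toy_spin_lock(l);
	toy_spin_unlock(l);
}

static toy_spinlock_t x;
static atomic_int y;		/* set by CPUn before waiting on x */
static atomic_int cs_data;	/* written inside CPUm's critical section */
static int r0, r1;		/* CPUm's read of y, CPUn's read of cs_data */

static void *cpu_m(void *arg)
{
	(void)arg;
	toy_spin_lock(&x);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	if (r0 == 0)	/* CPUm proceeds into its critical section */
		atomic_store_explicit(&cs_data, 1, memory_order_relaxed);
	toy_spin_unlock(&x);
	return NULL;
}

static void *cpu_n(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	toy_spin_unlock_wait(&x);
	/* CPUn proceeds, assuming CPUm is not inside its critical section. */
	r1 = atomic_load_explicit(&cs_data, memory_order_relaxed);
	return NULL;
}

/* Counts runs where both CPUs proceeded (r0 == 0 && r1 == 0); -1 on error. */
static int run_litmus(int iterations)
{
	int bad = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t m, n;

		atomic_store(&y, 0);
		atomic_store(&cs_data, 0);
		if (pthread_create(&m, NULL, cpu_m, NULL) != 0)
			return -1;
		if (pthread_create(&n, NULL, cpu_n, NULL) != 0) {
			pthread_join(m, NULL);
			return -1;
		}
		pthread_join(m, NULL);
		pthread_join(n, NULL);
		if (r0 == 0 && r1 == 0)
			bad++;
	}
	return bad;
}
```

A run that reports zero only shows the outcome was not observed on that
machine. The reason it cannot happen is that the two acquisitions of x are
totally ordered: either CPUn's UNLOCK(x) precedes CPUm's LOCK(x), so CPUm
reads y == 1, or CPUm's UNLOCK(x) precedes CPUn's LOCK(x), so CPUn reads
cs_data == 1. Build with -pthread.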

I'll update the patches.

--
     Manfred