linux-kernel - Re: [RFC][PATCH 0/3] locking/qspinlock: Improve determinism for x86

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5c492b85-365f-e64d-2230-f479903591d8@redhat.com>
Date:   Wed, 26 Sep 2018 19:21:49 -0400
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     will.deacon@....com, mingo@...nel.org,
        linux-kernel@...r.kernel.org, andrea.parri@...rulasolutions.com,
        tglx@...utronix.de
Subject: Re: [RFC][PATCH 0/3] locking/qspinlock: Improve determinism for x86

On 09/26/2018 01:51 PM, Peter Zijlstra wrote:
> On Wed, Sep 26, 2018 at 12:20:09PM -0400, Waiman Long wrote:
>> On 09/26/2018 07:01 AM, Peter Zijlstra wrote:
>>> Back when Will did his qspinlock determinism patches, we were left with one
>>> cmpxchg loop on x86 due to the use of atomic_fetch_or(). Will proposed a nifty
>>> trick:
>>>
>>>   http://lkml.kernel.org/r/20180409145409.GA9661@arm.com
>>>
>>> But at the time we didn't pursue it. This series implements that and argues for
>>> its correctness. In particular it places an smp_mb__after_atomic() in
>>> between the two operations, which forces the load to come after the
>>> store (which is free on x86 anyway).
>>>
>>> In particular this ordering ensures a concurrent unlock cannot trigger
>>> the uncontended handoff. Also it ensures that if the xchg() happens
>>> after a (successful) trylock, we must observe that LOCKED bit.
>> When you said "concurrent unlock cannot trigger the uncontended
>> handoff", are you saying the current code has this uncontended handoff
>> problem or just when comparing to doing a load first followed by xchg()?
> In particular I'm talking about this:
>
> locked:
> 	/*
> 	 * claim the lock:
> 	 *
> 	 * n,0,0 -> 0,0,1 : lock, uncontended
> 	 * *,0,0 -> *,0,1 : lock, contended
> 	 *
> 	 * If the queue head is the only one in the queue (lock value == tail)
> 	 * and nobody is pending, clear the tail code and grab the lock.
> 	 * Otherwise, we only need to grab the lock.
> 	 */
> 	/*
> 	 * In the PV case we might already have _Q_LOCKED_VAL set, because
> 	 * of lock stealing; therefore we must also allow:
> 	 *
> 	 * n,0,1 -> 0,0,1
> 	 *
> 	 * Note: at this point: (val & _Q_PENDING_MASK) == 0, because of the
> 	 *       above wait condition, therefore any concurrent setting of
> 	 *       PENDING will make the uncontended transition fail.
> 	 */
> 	if ((val & _Q_TAIL_MASK) == tail) {
> 		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
> 			goto release; /* No contention */
> 	}
>
>
> Here queue head acquires the lock and we take the "uncontended" path. If
> we set PENDING prior to this, the cmpxchg above will fail, and the later
> load will observe the tail.

Thanks for the clarification. I was wondering if you are talking about a
latent bug in the code. So this is not the case here.

Cheers,
Longman