Message-ID: <570D262B.3000005@hpe.com>
Date:	Tue, 12 Apr 2016 12:45:31 -0400
From:	Waiman Long <waiman.long@....com>
To:	Davidlohr Bueso <dave@...olabs.net>
CC:	Peter Zijlstra <peterz@...radead.org>,
	<linux-kernel@...r.kernel.org>, <will.deacon@....com>,
	<mingo@...hat.com>, <paulmck@...ux.vnet.ibm.com>,
	<boqun.feng@...il.com>, <torvalds@...ux-foundation.org>
Subject: Re: [RFC][PATCH 2/3] locking/qrwlock: Use smp_cond_load_acquire()

On 04/12/2016 12:58 AM, Davidlohr Bueso wrote:
> On Mon, 04 Apr 2016, Peter Zijlstra wrote:
>
>> Use smp_cond_load_acquire() to make better use of the hardware
>> assisted 'spin' wait on arm64.
>>
>> Arguably the second hunk is the more horrid abuse possible, but
>> avoids having to use cmpwait (see next patch) directly. Also, this
>> makes 'clever' (ab)use of the cond+rmb acquire to omit the acquire
>> from cmpxchg().
>>
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
>> ---
>> kernel/locking/qrwlock.c |   18 ++++--------------
>> 1 file changed, 4 insertions(+), 14 deletions(-)
>>
>> --- a/kernel/locking/qrwlock.c
>> +++ b/kernel/locking/qrwlock.c
>> @@ -53,10 +53,7 @@ struct __qrwlock {
>> static __always_inline void
>> rspin_until_writer_unlock(struct qrwlock *lock, u32 cnts)
>> {
>> -    while ((cnts & _QW_WMASK) == _QW_LOCKED) {
>> -        cpu_relax_lowlatency();
>> -        cnts = atomic_read_acquire(&lock->cnts);
>> -    }
>> +    smp_cond_load_acquire(&lock->cnts.counter, (VAL & _QW_WMASK) != _QW_LOCKED);
>> }
>>
>> /**
>> @@ -109,8 +106,6 @@ EXPORT_SYMBOL(queued_read_lock_slowpath)
>>  */
>> void queued_write_lock_slowpath(struct qrwlock *lock)
>> {
>> -    u32 cnts;
>> -
>>     /* Put the writer into the wait queue */
>>     arch_spin_lock(&lock->wait_lock);
>>
>> @@ -134,15 +129,10 @@ void queued_write_lock_slowpath(struct q
>>     }
>>
>>     /* When no more readers, set the locked flag */
>> -    for (;;) {
>> -        cnts = atomic_read(&lock->cnts);
>> -        if ((cnts == _QW_WAITING) &&
>> -            (atomic_cmpxchg_acquire(&lock->cnts, _QW_WAITING,
>> -                        _QW_LOCKED) == _QW_WAITING))
>> -            break;
>> +    smp_cond_load_acquire(&lock->cnts.counter,
>> +        (VAL == _QW_WAITING) &&
>> +        atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING, _QW_LOCKED) == _QW_WAITING);
>>
>> -        cpu_relax_lowlatency();
>
> You would need some variant for cpu_relax_lowlatency otherwise you'll be hurting s390, no?
> fwiw back when I was looking at this, I recall thinking about possibly introducing
> smp_cond_acquire_lowlatency but never got around to it.
>
> Thanks,
> Davidlohr
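
For reference, if I am reading the series right, the generic fallback
for smp_cond_load_acquire() is roughly the following (a sketch from
memory, not the exact asm-generic text). Note the plain cpu_relax() in
the wait loop; that is exactly the spot the s390 concern is about:

/*
 * Rough sketch of the generic smp_cond_load_acquire() fallback:
 * spin on READ_ONCE() until the condition over the loaded value VAL
 * holds, then turn the control dependency + rmb into ACQUIRE ordering.
 * That ordering trick is also why the cmpxchg in the second hunk can
 * be relaxed.
 */
#define smp_cond_load_acquire(ptr, cond_expr) ({		\
	typeof(ptr) __PTR = (ptr);				\
	typeof(*ptr) VAL;					\
	for (;;) {						\
		VAL = READ_ONCE(*__PTR);			\
		if (cond_expr)					\
			break;					\
		cpu_relax();	/* no lowlatency hook here */	\
	}							\
	smp_acquire__after_ctrl_dep();				\
	VAL;							\
})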

The qrwlock is currently only used on the x86 architecture. We can
come back and revisit this issue when other architectures that need
the lowlatency variant start using qrwlock.
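
If that happens, one possible shape for the variant you mention is a
sibling macro that only swaps the relax primitive. The name and the
code below are purely a hypothetical sketch, nothing like it exists in
the tree today:

/*
 * Hypothetical sketch only: identical to the generic
 * smp_cond_load_acquire() fallback except that the wait loop uses
 * cpu_relax_lowlatency(), which is what s390 would want.
 */
#define smp_cond_load_acquire_lowlatency(ptr, cond_expr) ({	\
	typeof(ptr) __PTR = (ptr);				\
	typeof(*ptr) VAL;					\
	for (;;) {						\
		VAL = READ_ONCE(*__PTR);			\
		if (cond_expr)					\
			break;					\
		cpu_relax_lowlatency();				\
	}							\
	smp_acquire__after_ctrl_dep();				\
	VAL;							\
})

The qrwlock callers would then stay the same apart from the name, e.g.

	smp_cond_load_acquire_lowlatency(&lock->cnts.counter,
					 (VAL & _QW_WMASK) != _QW_LOCKED);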

Cheers,
Longman
