Date:   Tue, 3 Jan 2017 17:23:08 -0500
From:   Waiman Long <longman@...hat.com>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        Pan Xinhui <xinhui@...ux.vnet.ibm.com>
Subject: Re: [PATCH v2] locking/pvqspinlock: Relax cmpxchg's to improve
 performance on some archs

On 12/26/2016 12:50 AM, Boqun Feng wrote:
> Hi Waiman,
>
> On Sun, Dec 25, 2016 at 03:26:01PM -0500, Waiman Long wrote:
>> A number of cmpxchg calls in qspinlock_paravirt.h were replaced by more
>> relaxed versions to improve performance on architectures that use LL/SC.
>>
>> All the locking related cmpxchg's are replaced with the _acquire
>> variants:
>>  - pv_queued_spin_steal_lock()
>>  - trylock_clear_pending()
>>
>> The cmpxchg's related to hashing are replaced by either the _release
>> or the _relaxed variants. See the inline comment for details.
>>
>> Signed-off-by: Waiman Long <longman@...hat.com>
>>
>>  v1->v2:
>>   - Add comments in changelog and code for the rationale of the change.
>>
>> ---
>>  kernel/locking/qspinlock_paravirt.h | 50 ++++++++++++++++++++++++-------------
>>  1 file changed, 33 insertions(+), 17 deletions(-)
>>
>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>> index e3b5520..c31d1ab 100644
>> --- a/kernel/locking/qspinlock_paravirt.h
>> +++ b/kernel/locking/qspinlock_paravirt.h
>> @@ -72,7 +72,7 @@ static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
>>  	struct __qspinlock *l = (void *)lock;
>>  
>>  	if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
>> -	    (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
>> +	    (cmpxchg_acquire(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
>>  		qstat_inc(qstat_pv_lock_stealing, true);
>>  		return true;
>>  	}
>> @@ -101,16 +101,16 @@ static __always_inline void clear_pending(struct qspinlock *lock)
>>  
>>  /*
>>   * The pending bit check in pv_queued_spin_steal_lock() isn't a memory
>> - * barrier. Therefore, an atomic cmpxchg() is used to acquire the lock
>> - * just to be sure that it will get it.
>> + * barrier. Therefore, an atomic cmpxchg_acquire() is used to acquire the
>> + * lock to provide the proper memory barrier.
>>   */
>>  static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>>  {
>>  	struct __qspinlock *l = (void *)lock;
>>  
>>  	return !READ_ONCE(l->locked) &&
>> -	       (cmpxchg(&l->locked_pending, _Q_PENDING_VAL, _Q_LOCKED_VAL)
>> -			== _Q_PENDING_VAL);
>> +	       (cmpxchg_acquire(&l->locked_pending, _Q_PENDING_VAL,
>> +				_Q_LOCKED_VAL) == _Q_PENDING_VAL);
>>  }
>>  #else /* _Q_PENDING_BITS == 8 */
>>  static __always_inline void set_pending(struct qspinlock *lock)
>> @@ -138,7 +138,7 @@ static __always_inline int trylock_clear_pending(struct qspinlock *lock)
>>  		 */
>>  		old = val;
>>  		new = (val & ~_Q_PENDING_MASK) | _Q_LOCKED_VAL;
>> -		val = atomic_cmpxchg(&lock->val, old, new);
>> +		val = atomic_cmpxchg_acquire(&lock->val, old, new);
>>  
>>  		if (val == old)
>>  			return 1;
>> @@ -209,9 +209,15 @@ static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node)
>>  	struct pv_hash_entry *he;
>>  	int hopcnt = 0;
>>  
>> +	/*
>> +	 * Synchronizing with the node state variable will control who does
>> +	 * the hashing - the lock holder or lock waiter. The control
>> +	 * dependency will ensure that the node value is written after the lock
>> +	 * value, so we don't need any other ordering guarantee.
>> +	 */
> By this comment, you mean that
> 	
> 	cmpxchg_relaxed(&he->lock, NULL, lock);
> 	  r1 = ll he->lock;
> 	  <compare part>
> 	  sc he->lock, lock // succeeded
>
> 	if (r1)
> 		WRITE_ONCE(he->node, node);
>
>
> the sc and WRITE_ONCE() cannot be reordered because of the control
> dependency? I don't think this is true. Yes, the sc must execute before
> the WRITE_ONCE(), but the memory/cache effects may be reordered. IOW,
> the following may happen:
>
>
> 	CPU 0			CPU 1
> 	===================	=======================
> 	{x = 0, y = 0}		if (!cmpxchg_relaxed(&y, 0, 1))
> 					WRITE_ONCE(x, 1);
> 	r1 = READ_ONCE(x);
>
> 	smp_rmb();
>
> 	r2 = READ_ONCE(y);
>
> The following result is possible:
>
> 	y = 1 && r1 = 1 && r2 = 0
>
> Or am I missing your point here? ;-)
>
> Regards,
> Boqun
>
You are probably right. I know the code is somewhat risky; that is why I
was waiting for an expert like you to check whether the assumption really
holds. Now it seems that it may not, so I will revise the patch to take
that change out.
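
For reference, the scenario you describe could be written as a herd7-style
litmus test roughly as follows. This is only a sketch, assuming herd7 with a
Linux-kernel memory model that understands cmpxchg_relaxed(); the x/y names
are just the illustrative ones from your example:

C relaxed-cmpxchg-ctrl-dep

(*
 * Can P0 observe the WRITE_ONCE() to x while still reading the old
 * value of y, despite the smp_rmb()? The control dependency on P1 only
 * orders the read side of the cmpxchg against the WRITE_ONCE(), not the
 * store side.
 *)

{}

P0(int *x, int *y)
{
	int r1;
	int r2;

	r1 = READ_ONCE(*x);
	smp_rmb();
	r2 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
	int r0;

	r0 = cmpxchg_relaxed(y, 0, 1);
	if (r0 == 0)
		WRITE_ONCE(*x, 1);
}

exists (y=1 /\ 0:r1=1 /\ 0:r2=0)

If the tool reports the "exists" clause as satisfiable, that would match the
reordering you describe.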

Cheers,
Longman
