Message-ID: <55B66ACB.6010702@hp.com>
Date:	Mon, 27 Jul 2015 13:30:51 -0400
From:	Waiman Long <waiman.long@...com>
To:	Davidlohr Bueso <dave@...olabs.net>
CC:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org, Scott J Norton <scott.norton@...com>,
	Douglas Hatch <doug.hatch@...com>
Subject: Re: [PATCH v3 2/7] locking/pvqspinlock: Add pending bit support

On 07/26/2015 08:56 PM, Davidlohr Bueso wrote:
> On Wed, 2015-07-22 at 16:12 -0400, Waiman Long wrote:
>> Like the native qspinlock, using the pending bit when it is lightly
>> loaded to acquire the lock is faster than going through the PV queuing
>> process which is even slower than the native queuing process. It also
>> avoids loading two additional cachelines (the MCS and PV nodes).
>>
>> This patch adds the pending bit support for PV qspinlock. The pending
>> bit code has a smaller spin threshold (1<<10). It will default back
>> to the queuing method if it cannot acquire the lock within a certain
>> time limit.
> Can we infer that this new spin threshold is the metric to detect these
> "light loads"? If so, I cannot help but wonder if there is some more
> straightforward/ad-hoc way of detecting this, ie some pv_<>  function.
> That would also save a lot of time as it would not be time based.
> Although it might be a more costly call altogether, I dunno.

I used the term "light load" to refer to the condition that at most 2
competing threads are trying to acquire the lock. In that case, the
pending code is used. Once there are 3 or more competing threads, it
switches back to the regular queuing code. This is the same mechanism
used in the native code; the only difference is the addition of a loop
counter to make sure that a thread won't spend too much time spinning.
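
To make that concrete, here is a rough userspace model of the decision
(C11 atomics rather than the kernel helpers; the wait for an in-flight
pending->locked hand-over is omitted, and all names and constants are
stand-ins that only mirror the qspinlock layout: bits 0-7 locked byte,
bit 8 pending, higher bits the queue tail):

/*
 * Rough userspace model of the pending-bit path, NOT the kernel code.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define _Q_LOCKED_VAL		(1U << 0)
#define _Q_PENDING_VAL		(1U << 8)
#define _Q_LOCKED_MASK		0xffU
#define PENDING_SPIN_THRESHOLD	(1 << 10)

struct model_lock { atomic_uint val; };

/*
 * Returns true if the lock was taken via the pending bit, false if the
 * caller should fall back to queuing (a third contender showed up, or
 * the bounded spin expired).
 */
static bool model_pending_lock(struct model_lock *lock)
{
	int loop = PENDING_SPIN_THRESHOLD;
	unsigned int val, new;

	/* trylock || pending */
	for (;;) {
		val = atomic_load_explicit(&lock->val, memory_order_relaxed);

		/* pending or tail already set: 3+ contenders, go queue */
		if (val & ~_Q_LOCKED_MASK)
			return false;

		/* free: take it; held: claim the single pending slot */
		new = (val == 0) ? _Q_LOCKED_VAL : (val | _Q_PENDING_VAL);

		if (atomic_compare_exchange_weak_explicit(&lock->val, &val, new,
				memory_order_acquire, memory_order_relaxed))
			break;

		if (loop-- <= 0)
			return false;
	}

	if (new == _Q_LOCKED_VAL)
		return true;			/* got the lock directly */

	/* we are pending: wait (bounded) for the owner to go away */
	while ((atomic_load_explicit(&lock->val, memory_order_acquire) &
		_Q_LOCKED_MASK) && loop-- > 0)
		;				/* cpu_relax() in the kernel */

	if (!(atomic_load_explicit(&lock->val, memory_order_acquire) &
	      _Q_LOCKED_MASK)) {
		/* clear_pending_set_locked(): -pending +locked in one RMW */
		atomic_fetch_add_explicit(&lock->val,
					  _Q_LOCKED_VAL - _Q_PENDING_VAL,
					  memory_order_acquire);
		return true;
	}

	/* spin threshold exhausted: free the pending slot, then queue */
	atomic_fetch_and_explicit(&lock->val, ~_Q_PENDING_VAL,
				  memory_order_relaxed);
	return false;
}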

> Some comments about this 'loop' threshold.
>
>> +static int pv_pending_lock(struct qspinlock *lock, u32 val)
>> +{
>> +	int loop = PENDING_SPIN_THRESHOLD;
>> +	u32 new, old;
>> +
>> +	/*
>> +	 * wait for in-progress pending->locked hand-overs
>> +	 */
>> +	if (val == _Q_PENDING_VAL) {
>> +		while (((val = atomic_read(&lock->val)) == _Q_PENDING_VAL) &&
>> +			loop--)
>> +			cpu_relax();
>> +	}
>> +
>> +	/*
>> +	 * trylock || pending
>> +	 */
>> +	for (;;) {
>> +		if (val & ~_Q_LOCKED_MASK)
>> +			goto queue;
>> +		new = _Q_LOCKED_VAL;
>> +		if (val == new)
>> +			new |= _Q_PENDING_VAL;
>> +		old = atomic_cmpxchg(&lock->val, val, new);
>> +		if (old == val)
>> +			break;
>> +		if (loop-- <= 0)
>> +			goto queue;
>> +	}
> So I'm not clear about the semantics of what occurs (or should occur) when the
> threshold is exhausted. In the trylock/pending loop above, you
> immediately return 0, indicating we want to queue. Ok, but below:

This is in the lock slowpath, so it can't return a lock failure. When the
spin threshold is exhausted, falling back to the queuing code is the only
option.
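
To be explicit about the contract (a hypothetical sketch of the caller
side, not the actual patch hunk): a 0 return from the pending path only
redirects the slowpath into the queuing code, it never surfaces as a
failure to the caller.

#include <stdbool.h>

/* hypothetical stand-ins for the real lock type and the two paths */
struct model_lock;
bool model_pending_lock(struct model_lock *lock);	/* pending-bit path */
void model_queue_lock(struct model_lock *lock);		/* queuing path, always succeeds */

/* sketch of the slowpath shape: it only ever returns with the lock held */
static void model_lock_slowpath(struct model_lock *lock)
{
	if (model_pending_lock(lock))
		return;			/* acquired via the pending bit */

	model_queue_lock(lock);		/* fall back and queue until acquired */
}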

>> +
>> +	if (new == _Q_LOCKED_VAL)
>> +		goto gotlock;
>> +	/*
>> +	 * We are pending, wait for the owner to go away.
>> +	 */
>> +	while (((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_MASK)
>> +		&& (loop-- > 0))
>> +		cpu_relax();
>> +
>> +	if (!(val & _Q_LOCKED_MASK)) {
>> +		clear_pending_set_locked(lock);
>> +		goto gotlock;
>> +	}
>> +	/*
>> +	 * Clear the pending bit and fall back to queuing
>> +	 */
>> +	clear_pending(lock);
> ... you call clear_pending before returning. Is this intentional? Smells
> fishy.

Yes, it is intentional. The pending bit acts as a 1-slot waiting queue, so
if the vCPU needs to fall back to regular queuing, it has to clear the bit
first to free that slot for other lockers.
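
For illustration, giving the slot back is just one atomic clear of that
bit. A userspace sketch (stand-in names, not the kernel code) of roughly
what clear_pending() boils down to:

#include <stdatomic.h>

#define _Q_PENDING_VAL	(1U << 8)

static void model_clear_pending(atomic_uint *lockval)
{
	/* release the pending slot; locked byte and queue tail untouched */
	atomic_fetch_and_explicit(lockval, ~_Q_PENDING_VAL,
				  memory_order_relaxed);
}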

>
> And basically afaict all this chunk of code does is spin until loop is
> exhausted, and breakout when we got the lock. Ie, something like this is
> a lot cleaner:
>
>                  while (loop--) {
>                  	/*
>                           * We are pending, wait for the owner to go away.
>                           */
>                  	val = smp_load_acquire(&lock->val.counter);
>                  	if (!(val & _Q_LOCKED_MASK)) {
>                  		clear_pending_set_locked(lock);
>                  		goto gotlock;
>                  	}
>
>                  	cpu_relax();		
>                  }
>
>                  /*
>                   * Clear the pending bit and fall back to queuing
>                   */
>                  clear_pending(lock);
>

Yes, we could change the loop to that. I was just following the same logic
as in the native code.
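
For comparison, a userspace model (stand-in names, not the kernel code)
of the two loop shapes; the only real difference is the bound:

#include <stdatomic.h>
#include <stdbool.h>

#define _Q_LOCKED_MASK	0xffU

/* native flavour: wait until the lock holder goes away */
static void model_native_pending_wait(atomic_uint *lockval)
{
	while (atomic_load_explicit(lockval, memory_order_acquire) &
	       _Q_LOCKED_MASK)
		;				/* cpu_relax() in the kernel */
}

/* PV flavour: same loop plus a counter; false means "give up and queue" */
static bool model_pv_pending_wait(atomic_uint *lockval, int loop)
{
	while ((atomic_load_explicit(lockval, memory_order_acquire) &
		_Q_LOCKED_MASK) && loop-- > 0)
		;				/* cpu_relax() in the kernel */

	return !(atomic_load_explicit(lockval, memory_order_acquire) &
		 _Q_LOCKED_MASK);
}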

Cheers,
Longman

