Message-ID: <552BE608.7000901@hp.com>
Date: Mon, 13 Apr 2015 11:51:36 -0400
From: Waiman Long <waiman.long@...com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
"H. Peter Anvin" <hpa@...or.com>, linux-arch@...r.kernel.org,
x86@...nel.org, linux-kernel@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
xen-devel@...ts.xenproject.org, kvm@...r.kernel.org,
Paolo Bonzini <paolo.bonzini@...il.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Rik van Riel <riel@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>,
David Vrabel <david.vrabel@...rix.com>,
Oleg Nesterov <oleg@...hat.com>,
Daniel J Blueman <daniel@...ascale.com>,
Scott J Norton <scott.norton@...com>,
Douglas Hatch <doug.hatch@...com>
Subject: Re: [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support
for the qspinlock
On 04/13/2015 11:08 AM, Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 05:41:44PM -0400, Waiman Long wrote:
>
>>>> +static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
>>>> +{
>>>> +        struct __qspinlock *l = (void *)lock;
>>>> +        struct qspinlock **lp = NULL;
>>>> +        struct pv_node *pn = (struct pv_node *)node;
>>>> +        int slow_set = false;
>>>> +        int loop;
>>>> +
>>>> +        for (;;) {
>>>> +                for (loop = SPIN_THRESHOLD; loop; loop--) {
>>>> +                        if (!READ_ONCE(l->locked))
>>>> +                                return;
>>>> +
>>>> +                        cpu_relax();
>>>> +                }
>>>> +
>>>> +                WRITE_ONCE(pn->state, vcpu_halted);
>>>> +                if (!lp)
>>>> +                        lp = pv_hash(lock, pn);
>>>> +                /*
>>>> +                 * lp must be set before setting _Q_SLOW_VAL
>>>> +                 *
>>>> +                 * [S] lp = lock                [RmW] l = l->locked = 0
>>>> +                 *     MB                             MB
>>>> +                 * [S] l->locked = _Q_SLOW_VAL  [L]   lp
>>>> +                 *
>>>> +                 * Matches the cmpxchg() in pv_queue_spin_unlock().
>>>> +                 */
>>>> +                if (!slow_set &&
>>>> +                    !cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
>>>> +                        /*
>>>> +                         * The lock is free and _Q_SLOW_VAL has never been
>>>> +                         * set. Need to clear the hash bucket before getting
>>>> +                         * the lock.
>>>> +                         */
>>>> +                        WRITE_ONCE(*lp, NULL);
>>>> +                        return;
>>>> +                } else if (slow_set && !READ_ONCE(l->locked))
>>>> +                        return;
>>>> +                slow_set = true;
>>> I'm somewhat puzzled by the slow_set thing; what is wrong with the thing
>>> I had, namely:
>>>
>>> if (!lp) {
>>>         lp = pv_hash(lock, pn);
>>>
>>>         /*
>>>          * comment
>>>          */
>>>         lv = cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL);
>>>         if (lv != _Q_LOCKED_VAL) {
>>>                 /* we're woken, unhash and return */
>>>                 WRITE_ONCE(*lp, NULL);
>>>                 return;
>>>         }
>>> }
>>>> +
>>>> +                pv_wait(&l->locked, _Q_SLOW_VAL);
>>> If we get a spurious wakeup (due to device interrupts or random kick)
>>> we'll loop around but ->locked will remain _Q_SLOW_VAL.
>> The purpose of the slow_set flag is not about the lock value. It is to make
>> sure that pv_hash_find() will always find a match. Consider the following
>> scenario:
>>
>> cpu1                    cpu2                    cpu3
>> ----                    ----                    ----
>> pv_wait
>> spurious wakeup
>> loop l->locked
>>
>>                         read _Q_SLOW_VAL
>>                         pv_hash_find()
>>                         unlock
>>
>>                                                 pv_hash() <- same entry
>>
>> cmpxchg(&l->locked)
>> clear hash (?)
>>
>> Here, the entry for cpu3 will be removed, leading to a panic when
>> pv_hash_find() cannot find the entry. So the hash entry can only be
>> removed if the other cpu has no chance to see _Q_SLOW_VAL.
> Still confused. Afaict that cannot happen with my code. We only do the
> cmpxchg(, _Q_SLOW_VAL) _once_.
>
> Only on the first time around that loop will we hash the lock and set
> the slow flag. And cpu3 cannot come in on the same entry because we've
> not yet released the lock when we find and unhash.
>
>
Maybe I was not clear in my illustration. More than one lock can hash
to the same bucket. So cpu3 is accessing a different lock which hashes
to the same entry as the lock used by cpu1 and cpu2. A rough sketch of
that collision is below.
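
Just to make the collision concrete, here is a much simplified and
hypothetical picture of a hash entry (the names below are made up for
illustration only; the real pv_hash()/pv_hash_find() code in the patch
differs in detail):

        /*
         * Hypothetical bucket layout, for illustration only.
         */
        struct pv_hash_entry {
                struct qspinlock *lock;  /* key: lock being waited on */
                struct pv_node   *node;  /* value: the vCPU to kick   */
        };

        /*
         * Two different locks, lockA (cpu1/cpu2) and lockB (cpu3), can
         * map to the same entry. If cpu1 clears that entry with
         * WRITE_ONCE(entry->lock, NULL) after cpu3 has already stored
         * lockB there, the eventual unlocker of lockB will not find a
         * matching entry and so has nobody to pv_kick().
         */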
Anyway, I have removed the slow_set flag by unrolling the retry loop so
that after pv_wait(), the code falls into a 2nd loop instead of going
back to the top. As a result, cmpxchg() and pv_hash() can only be
called once. Something along the lines of the sketch below is what I
have in mind.
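
This is only an untested sketch to show the structure (vcpu state
handling, memory barriers and other details are simplified here and may
differ in the actual patch):

static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
{
        struct pv_node *pn = (struct pv_node *)node;
        struct __qspinlock *l = (void *)lock;
        struct qspinlock **lp;
        int loop;

        /* 1st loop: plain spinning, nothing hashed yet */
        for (loop = SPIN_THRESHOLD; loop; loop--) {
                if (!READ_ONCE(l->locked))
                        return;
                cpu_relax();
        }

        /* hash the lock and set _Q_SLOW_VAL exactly once */
        WRITE_ONCE(pn->state, vcpu_halted);
        lp = pv_hash(lock, pn);
        if (!cmpxchg(&l->locked, _Q_LOCKED_VAL, _Q_SLOW_VAL)) {
                /*
                 * The lock was freed before _Q_SLOW_VAL could be set;
                 * clear the hash entry and go get the lock.
                 */
                WRITE_ONCE(*lp, NULL);
                return;
        }

        for (;;) {
                pv_wait(&l->locked, _Q_SLOW_VAL);

                /*
                 * 2nd loop: after a (possibly spurious) wakeup, only
                 * spin on the lock word. _Q_SLOW_VAL stays set and the
                 * hash entry is left for the unlocker to find and
                 * clear, so neither cmpxchg() nor pv_hash() is ever
                 * repeated.
                 */
                for (loop = SPIN_THRESHOLD; loop; loop--) {
                        if (!READ_ONCE(l->locked))
                                return;
                        cpu_relax();
                }
        }
}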
Cheers,
Longman