linux-kernel - Re: [RESEND PATCH v3] locking/pvqspinlock: Add lock holder CPU argument to pv

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <57227791.2020908@hpe.com>
Date:	Thu, 28 Apr 2016 16:50:25 -0400
From:	Waiman Long <waiman.long@....com>
To:	Boqun Feng <boqun.feng@...il.com>
CC:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
	Pan Xinhui <xinhui@...ux.vnet.ibm.com>,
	Scott J Norton <scott.norton@....com>,
	Douglas Hatch <doug.hatch@....com>
Subject: Re: [RESEND PATCH v3] locking/pvqspinlock: Add lock holder CPU argument
 to pv_wait()

On 04/28/2016 09:32 AM, Boqun Feng wrote:
> Hi Waiman,
>
> On Thu, Apr 21, 2016 at 11:02:05AM -0400, Waiman Long wrote:
>> Pan Xinhui was asking for a lock holder cpu argument in pv_wait()
>> to help the porting of pvqspinlock to PPC. The new argument will can
>> help hypervisor expediate the execution of the critical section by
>> the lock holder, if its vCPU isn't running, so that it can release
>> the lock sooner.
>>
>> The pv_wait() function is now extended to have a third argument
>> that holds the vCPU number of the lock holder, if this is
>> known. Architecture specific hypervisor code can then make use of
>> that information to make better decision of which vCPU should be
>> running next.
>>
>> This patch also adds a new field in the pv_node structure to hold
>> the vCPU number of the previous queue entry. For the queue head,
>> the prev_cpu entry is likely to be the that of the lock holder,
>> though lock stealing may render this information inaccurate.
>>
>> In pv_wait_head_or_lock(), pv_wait() will now be called with that
>> vCPU number as it is likely to be the lock holder.
>>
>> In pv_wait_node(), the newly added pv_lookup_hash() function will
>> be called to look up the queue head and pass in the lock holder vCPU
>> number stored there, if found.
>>
>> Suggested-by:  Pan Xinhui<xinhui@...ux.vnet.ibm.com>
>> Signed-off-by: Waiman Long<Waiman.Long@....com>
>> ---
>>
>>   v2->v3:
>>    - Rephrase the changelog and some of the comments to make the
>>      intention of this patch more clear.
>>    - Make pv_lookup_hash() architecture dependent to eliminate its cost
>>      if an architecture doesn't need it. Also make the number of
>>      cachelines searched in pv_lookup_hash() to between 1-4.
>>    - [RESEND] Fix typo error.
>>
>>   v1->v2:
>>    - In pv_wait_node(), lookup the hash table for the queue head and pass
>>      the lock holder cpu number to pv_wait().
>>
> [snip]
>
>> @@ -282,7 +328,8 @@ static void pv_init_node(struct mcs_spinlock *node)
>>    * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
>>    * behalf.
>>    */
>> -static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>> +static void pv_wait_node(struct qspinlock *lock, struct mcs_spinlock *node,
>> +			 struct mcs_spinlock *prev)
>>   {
>>   	struct pv_node *pn = (struct pv_node *)node;
>>   	struct pv_node *pp = (struct pv_node *)prev;
>> @@ -290,6 +337,8 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>>   	int loop;
>>   	bool wait_early;
>>
>> +	pn->prev_cpu = pp->cpu;	/* Save vCPU # of previous queue entry */
>> +
>>   	/* waitcnt processing will be compiled out if !QUEUED_LOCK_STAT */
>>   	for (;; waitcnt++) {
>>   		for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
>> @@ -314,10 +363,23 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
>>   		smp_store_mb(pn->state, vcpu_halted);
>>
>>   		if (!READ_ONCE(node->locked)) {
>> +			struct pv_node *ph;
>> +
>>   			qstat_inc(qstat_pv_wait_node, true);
>>   			qstat_inc(qstat_pv_wait_again, waitcnt);
>>   			qstat_inc(qstat_pv_wait_early, wait_early);
>> -			pv_wait(&pn->state, vcpu_halted);
>> +
>> +			/*
>> +			 * If the current queue head is in the hash table,
>> +			 * the prev_cpu field of its pv_node may contain the
>> +			 * vCPU # of the lock holder. However, lock stealing
>> +			 * may make that information inaccurate. Anyway, we
>> +			 * look up the hash table to try to get the lock
>> +			 * holder vCPU number.
>> +			 */
>> +			ph = pv_lookup_hash(lock);
> I am thinking, could we save the hash lookup here if wait_early == true?
> Because in that case, the previous node is likely to get the lock soon,
> it makes sense we yield to that node rather than the holder, so may we
> can do something like:
>
> 			if (wait_early)
> 				pv_wait(&pn->state, vcpu_halted, pn->prev_cpu);
> 			else {
> 				ph = pv_lookup_hash(lock);
> 				pv_wait(&pn->state, vcpu_halted,
> 					ph ? ph->prev_cpu : -1);
> 			}
>
> Thoughts?
>
> Regards,
> Boqun
>

One reason why I don't want to just wake up the previous node vCPU is 
because I am concerned that a vCPU may go into a suspend-wakeup loop. So 
I will wait until we get experimental evidence that this really help 
performance before we commit to this kind of change.

Cheers,
Longman