Message-ID: <5789424D.7020908@hpe.com>
Date: Fri, 15 Jul 2016 16:06:37 -0400
From: Waiman Long <waiman.long@....com>
To: Peter Zijlstra <peterz@...radead.org>
CC: Pan Xinhui <xinhui@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
Boqun Feng <boqun.feng@...il.com>,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>
Subject: Re: [PATCH v2 2/5] locking/pvqspinlock: Fix missed PV wakeup problem
On 07/15/2016 06:07 AM, Peter Zijlstra wrote:
> On Fri, Jul 15, 2016 at 05:39:46PM +0800, Pan Xinhui wrote:
>>> I'm thinking you're trying to say this:
>>>
>>>
>>>   CPU0                    CPU1                    CPU2
>>>
>>> __pv_queued_spin_unlock_slowpath()
>>>   ...
>>>   smp_store_release(&l->locked, 0);
>>>
>>>                         __pv_queued_spin_lock_slowpath()
>>>                           ...
>>>                           pv_queued_spin_steal_lock()
>>>                             cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0
>>>
>>>
>>>                                                 pv_wait_head_or_lock()
>>>
>>> pv_kick(node->cpu); --------------------------> pv_wait(&l->locked, _Q_SLOW_VAL);
>>>
>>>                         __pv_queued_spin_unlock()
>>>                           cmpxchg(&l->locked, _Q_LOCKED_VAL, 0) == _Q_LOCKED_VAL
>>>
>>>                                                 for () {
>>>                                                   trylock_clear_pending();
>>>                                                   cpu_relax();
>>>                                                 }
>>>
>>>                                                 pv_wait(&l->locked, _Q_SLOW_VAL);
>>>
>>> Which is indeed 'bad', but not fatal; note that the later pv_wait() will
>>> not in fact go to sleep, since l->locked will _not_ be _Q_SLOW_VAL.
>> The problem is that "this later pv_wait will do nothing as l->locked
>> is not _Q_SLOW_VAL", so it is not paravirt friendly. We will go into
>> the trylock loop again and again until the lock is unlocked.
> Agreed, which is 'bad'. But the patch spoke about a missing wakeup,
> which is worse, as that would completely inhibit progress.
Sorry, it is my mistake. There is no missing pv_wait().
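Right, pv_wait(ptr, val) is only supposed to put the vCPU to sleep if *ptr
still equals val at the time of the call, something along these lines (a
simplified sketch of the contract only, with a hypothetical helper name,
not any particular hypervisor back end):

        static void pv_wait_sketch(u8 *ptr, u8 val)
        {
                if (READ_ONCE(*ptr) != val)
                        return;         /* value has changed - don't sleep */
                halt();                 /* sleep until pv_kick() or a spurious wakeup */
        }

So once the lock has been stolen, l->locked is _Q_LOCKED_VAL rather than
_Q_SLOW_VAL, and the kicked queue head falls straight through pv_wait()
and keeps spinning in the trylock loop.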
>> So if we are kicked by the unlock_slowpath, and the lock is stolen by
>> someone else, we need to hash its node again and set l->locked to
>> _Q_SLOW_VAL, then enter pv_wait.
> Right, let me go think about this a bit.
Yes, the purpose of this patch is to do exactly that. It lets the queue
head vCPU sleep until the lock holder releases the lock and wakes the
queue head vCPU up.
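When the kicked queue head fails to get the lock within the spinning
threshold, it has to redo the pv_hash() and put _Q_SLOW_VAL back into the
lock byte before going back to sleep. Roughly like this inside the
pv_wait_head_or_lock() loop (just a sketch of the idea, using the variable
names of qspinlock_paravirt.h from memory, not the actual patch):

        for (;; waitcnt++) {
                /* ... spin SPIN_THRESHOLD times with trylock_clear_pending() ... */

                /*
                 * Sketch only: if the lock byte is no longer _Q_SLOW_VAL
                 * (we were kicked and unhashed, then the lock was stolen),
                 * hash the node again and restore _Q_SLOW_VAL so that the
                 * next pv_wait() really puts us to sleep until the new
                 * lock holder kicks us from the unlock slowpath.
                 */
                if (READ_ONCE(l->locked) != _Q_SLOW_VAL) {
                        lp = pv_hash(lock, pn);
                        if (xchg(&l->locked, _Q_SLOW_VAL) == 0) {
                                /* Lock just became free: take it and unhash. */
                                WRITE_ONCE(l->locked, _Q_LOCKED_VAL);
                                WRITE_ONCE(*lp, NULL);
                                goto gotlock;
                        }
                }

                pv_wait(&l->locked, _Q_SLOW_VAL);
        }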
>
>> But I am worried about lock stealing. Could the node in the queue
>> starve for a long time? I notice the latency of pv_wait on an
>> over-committed guest can be bigger than 300us. I have not seen such a
>> starvation case, but I think it is possible.
> I share that worry, which is why we limit the steal attempt to one.
> But yes, theoretically it's possible to starve things AFAICT.
>
> We've not come up with a sensible way to completely avoid starvation.
If you guys are worried about the lock constantly getting stolen between
the pv_kick() of the queue head vCPU and the time it is ready to take the
lock, we can keep the pending bit set across pv_wait() if it is the 2nd or
later time that pv_wait() is called. That will ensure that no lock stealing
can happen and cap the maximum wait time to about 2x (spin + pv_wait). I
will add that patch to my patch series.
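Something along these lines in pv_wait_head_or_lock() (again just a sketch
of the idea, not the eventual patch):

        for (;; waitcnt++) {
                /*
                 * Sketch only: the pending bit blocks lock stealing.  Let
                 * it be cleared before the first pv_wait() so that one
                 * steal attempt is still allowed, but keep it set across
                 * the 2nd and later pv_wait() calls so that the kicked
                 * queue head is guaranteed to get the lock next.
                 */
                set_pending(lock);
                for (loop = SPIN_THRESHOLD; loop; loop--) {
                        if (trylock_clear_pending(lock))
                                goto gotlock;
                        cpu_relax();
                }
                if (!waitcnt)
                        clear_pending(lock);

                pv_wait(&l->locked, _Q_SLOW_VAL);
        }

That would sit on top of the re-hashing/_Q_SLOW_VAL handling above so that
the 2nd and later pv_wait() calls actually put the vCPU to sleep.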
Cheers,
Longman