linux-kernel - Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

Message-ID: <56A230C3.4070801@hpe.com>
Date:	Fri, 22 Jan 2016 08:38:11 -0500
From:	Waiman Long <waiman.long@....com>
To:	Davidlohr Bueso <dave@...olabs.net>
CC:	Ding Tianhong <dingtianhong@...wei.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"Paul E. McKenney" <paulmck@...ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Will Deacon <Will.Deacon@....com>,
	Jason Low <jason.low2@...com>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Waiman Long <Waiman.Long@...com>
Subject: Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Waiman Long wrote:
>
>> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
>
>>> I got the vmcore and found that ifconfig had already been in the
>>> wait_list of the rtnl_lock for 120 seconds, while my process could
>>> take and release the rtnl_lock normally several times per second.
>>> This means my process kept jumping the queue and ifconfig could never
>>> get the rtnl. I checked the mutex lock slow path and found that a task
>>> may spin on the owner regardless of whether the wait list is empty,
>>> which causes the task in the wait list to always be cut in line. So I
>>> added a test for the wait list in mutex_can_spin_on_owner() to avoid
>>> this problem.
>
> So this has, more or less, always been known, at least in theory, until
> now. It's the cost of spinning without going through the wait-queue,
> unlike other locks.
>
>>> [...]
>
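To make the starvation pattern described above concrete, here is a crude
single-threaded model in C. It is not kernel code: struct toy_mutex,
struct toy_task, the nr_waiters counter (standing in for
!list_empty(&lock->wait_list)) and all toy_* functions are hypothetical,
and the real mutex_can_spin_on_owner() also checks need_resched() and
reads the owner under RCU, both omitted here. The model assumes that on
every release a new spinner is already running and beats the sleeping
waiter, which must first be woken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and struct mutex. */
struct toy_task {
    bool on_cpu;                /* is this task currently running? */
};

struct toy_mutex {
    struct toy_task *owner;     /* NULL when unlocked */
    int nr_waiters;             /* models !list_empty(&lock->wait_list) */
};

/* Simplified pre-patch heuristic: spin if there is no owner or the
 * owner is running; the wait list is ignored. */
static bool toy_can_spin_old(const struct toy_mutex *lock)
{
    assert(lock != NULL);
    return lock->owner == NULL || lock->owner->on_cpu;
}

/* Variant with the proposed extra test: never spin while a task is
 * queued, so newcomers join the wait list instead of overtaking it. */
static bool toy_can_spin_new(const struct toy_mutex *lock)
{
    if (lock->nr_waiters > 0)
        return false;
    return toy_can_spin_old(lock);
}

/* Run `rounds` lock releases with one task already queued. Returns the
 * round in which the queued waiter gets the lock, or -1 if it is
 * overtaken in every round. */
static int toy_simulate(bool check_wait_list, int rounds)
{
    struct toy_task owner = { .on_cpu = true };
    struct toy_mutex lock = { .owner = &owner, .nr_waiters = 1 };

    for (int r = 0; r < rounds; r++) {
        bool spin = check_wait_list ? toy_can_spin_new(&lock)
                                    : toy_can_spin_old(&lock);
        if (spin)
            continue;       /* a spinner grabs the lock; the new, running
                               owner leaves the waiter still queued */
        lock.nr_waiters--;  /* no spinner: the woken waiter gets it */
        return r;
    }
    return -1;
}
```

With these assumptions toy_simulate(false, 1000) returns -1 (the waiter
is overtaken in all 1000 rounds) and toy_simulate(true, 1000) returns 0.
The price of the extra test is that newcomers stop spinning whenever
anyone is queued, which likely gives up much of the benefit of optimistic
spinning under contention.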
>> From: Waiman Long <Waiman.Long@....com>
>> Date: Thu, 21 Jan 2016 17:53:14 -0500
>> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list
>>
>> Ding Tianhong reported a live-lock situation where a constant stream
>> of incoming optimistic spinners blocked a task in the wait list from
>> getting the mutex.
>>
>> This patch attempts to fix this live-lock condition by enabling a
>> woken task in the wait list to enter the optimistic spinning loop
>> itself, with precedence over the ones in the OSQ. This should prevent
>> the live-lock condition from happening.
>
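A similarly crude model of the proposed fix, again in C and not taken
from the patch: the woken first waiter, if it loses the race for the
lock, keeps spinning instead of going back to sleep, and the OSQ spinner
yields to it. The waiter_spinning flag and the toy_* names are
hypothetical; how the real patch signals this precedence to the OSQ is
not modeled. As before, an OSQ spinner is assumed to be ready to take
the lock the moment it is released.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_winner { TOY_NOBODY, TOY_WAITER, TOY_OSQ_SPINNER };

/* Hypothetical per-lock state visible to both kinds of spinner. */
struct toy_spin_state {
    bool waiter_spinning;       /* woken first waiter is spinning */
    bool osq_spinning;          /* an OSQ spinner is polling the lock */
};

/* Who takes the lock when it is released: a spinning waiter has
 * precedence, so the OSQ spinner backs off while it is set. */
static enum toy_winner toy_on_release(const struct toy_spin_state *s)
{
    if (s->waiter_spinning)
        return TOY_WAITER;
    if (s->osq_spinning)
        return TOY_OSQ_SPINNER;
    return TOY_NOBODY;
}

/* Run `rounds` releases with a steady stream of OSQ spinners. Returns
 * the round in which the waiter acquires the lock, or -1 if never. */
static int toy_simulate_waiter_spin(bool waiter_may_spin, int rounds)
{
    struct toy_spin_state s = { .waiter_spinning = false,
                                .osq_spinning = true };

    assert(rounds >= 0);
    for (int r = 0; r < rounds; r++) {
        if (toy_on_release(&s) == TOY_WAITER)
            return r;
        /* The OSQ spinner won this round. The waiter, woken by the
         * unlock, finds the lock taken: it either keeps spinning (new
         * scheme) or goes back to sleep (old scheme). */
        if (waiter_may_spin)
            s.waiter_spinning = true;
    }
    return -1;
}
```

Under these assumptions the waiter gets the lock on the release right
after its first wakeup (toy_simulate_waiter_spin(true, n) returns 1),
whereas without waiter spinning it is starved and the function returns -1.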
> And one of the reasons why we never bothered 'fixing' things was the
> additional branching in the slowpath (and the lack of a real issue,
> although this one is so damn pathological). I fear that your approach
> is one of those scenarios where the code ends up bloated, albeit most
> of it is actually duplicated and can be refactored *sigh*. So now we'd
> spin, then sleep, then try spinning, then sleep again... phew. Not to
> mention the performance implications, i.e. losing the benefits of the
> OSQ over waiter spinning in scenarios that would otherwise have more
> OSQ spinners than waiter spinners, or in setups where it is actually
> best to block instead of spinning.

The patch that I sent out is just a proof of concept to make sure that
it can fix that particular case. I do plan to refactor it if I decide to
go ahead with an official one. Unlike the OSQ, there can be no more than
one waiter spinner, as the wakeup function is directed only at the first
task in the wait list, and the spinning won't happen until that task is
first woken up. In the worst-case scenario, only 2 spinners are spinning
on the lock and the owner field: one from the OSQ and one from the wait
list. That shouldn't add too much cacheline contention traffic to the
system.
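The worst-case bound above can be illustrated with a small counting
model in C. It assumes, as with the kernel's MCS-style OSQ, that only
the task holding the OSQ polls the lock and owner field while tasks
still queued in the OSQ spin on their own per-CPU node, and that only
the first waiter is ever woken. The enum and toy_* functions are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What a task is busy-waiting on, if anything. */
enum toy_spin_target { TOY_SLEEPING, TOY_OWN_OSQ_NODE, TOY_LOCK_WORD };

/* OSQ position 0 holds the OSQ and polls the lock/owner; everyone
 * behind it spins on its own node, i.e. a different cacheline. */
static enum toy_spin_target toy_osq_target(int pos)
{
    return pos == 0 ? TOY_LOCK_WORD : TOY_OWN_OSQ_NODE;
}

/* Only the first waiter is woken, and it may spin only after that
 * wakeup; all other waiters stay asleep. */
static enum toy_spin_target toy_waiter_target(int pos, bool first_woken)
{
    return (pos == 0 && first_woken) ? TOY_LOCK_WORD : TOY_SLEEPING;
}

/* Count the tasks polling the mutex's own cacheline. */
static int toy_lock_word_spinners(int nr_osq, int nr_waiters,
                                  bool first_woken)
{
    int n = 0;

    for (int i = 0; i < nr_osq; i++)
        n += toy_osq_target(i) == TOY_LOCK_WORD;
    for (int i = 0; i < nr_waiters; i++)
        n += toy_waiter_target(i, first_woken) == TOY_LOCK_WORD;
    assert(n <= 2);     /* worst case: one OSQ spinner + one waiter */
    return n;
}
```

However many tasks are queued, the count stays at 2 or fewer; the other
busy-waiters spin on their own OSQ nodes rather than on the mutex.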

Cheers,
Longman