linux-kernel - Re: [RFC][PATCH] Fix a race between rwsem and the scheduler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <eb570bdc-5938-1c7c-4c2e-ac9b50ebbe2d@gmail.com>
Date:   Wed, 31 Aug 2016 20:17:50 +1000
From:   Balbir Singh <bsingharora@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Nicholas Piggin <nicholas.piggin@...il.com>,
        Alexey Kardashevskiy <aik@...abs.ru>
Subject: Re: [RFC][PATCH] Fix a race between rwsem and the scheduler



On 31/08/16 17:28, Peter Zijlstra wrote:
> On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
>> On 30/08/16 22:19, Peter Zijlstra wrote:
>>> On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
>>>>
>>>>
>>>> The origin of the issue I've seen seems to be related to
>>>> rwsem spin lock stealing. Basically I see the system deadlock'd in the
>>>> following state
>>>
>>> As Nick says (good to see you're back Nick!), this is unrelated to
>>> rwsems.
>>>
>>> This is true for pretty much every blocking wait loop out there, they
>>> all do:
>>>
>>> 	for (;;) {
>>> 		current->state = UNINTERRUPTIBLE;
>>> 		smp_mb();
>>> 		if (cond)
>>> 			break;
>>> 		schedule();
>>> 	}
>>> 	current->state = RUNNING;
>>>
>>> Which, if the wakeup is spurious, is just the pattern you need.
>>
>> Yes True! My bad Alexey had seen the same basic pattern, I should have been clearer
>> in my commit log. Should I resend the patch?
> 
> Yes please.
> 

Done, just now. I've tried to generalize the issue, but I've kept the
example

>>> There isn't an MB there. The best I can do is UNLOCK+LOCK, which, thanks
>>> to PPC, is _not_ MB. It is however sufficient for this case.
>>>
>>
>> The MB comes from the __switch_to() in schedule(). Ben mentioned it in a 
>> different thread.
> 
> Right, although even without that, there is sufficient ordering, as the
> rq unlock from the wakeup, coupled with the rq lock from the schedule
> already form a load-store barrier.
> 
>>> Now, this has been present for a fair while, I suspect ever since we
>>> reworked the wakeup path to not use rq->lock twice. Curious you only now
>>> hit it.
>>>
>>
>> Yes, I just hit it a a week or two back and I needed to collect data to
>> explain why p->on_rq got to 0. Hitting it requires extreme stress -- for me
>> I needed a system with large threads and less memory running stress-ng.
>> Reproducing the problem takes an unpredictable amount of time.
> 
> What hardware do you see this on, is it shiny new Power8 chips which
> have never before seen deep queues or something. Or is it 'regular' old
> Power7 like stuff?
> 

I don't think the issue is processor specific, it is probabilistic, but
I've not tested on Power7. I am seeing it on a Power8 system


Balbir Singh