Date:   Wed, 31 Aug 2016 09:28:35 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Nicholas Piggin <nicholas.piggin@...il.com>,
        Alexey Kardashevskiy <aik@...abs.ru>
Subject: Re: [RFC][PATCH] Fix a race between rwsem and the scheduler

On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
> On 30/08/16 22:19, Peter Zijlstra wrote:
> > On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
> >>
> >>
> >> The origin of the issue I've seen seems to be related to
> >> rwsem spin lock stealing. Basically I see the system deadlock'd in the
> >> following state
> > 
> > As Nick says (good to see you're back Nick!), this is unrelated to
> > rwsems.
> > 
> > This is true for pretty much every blocking wait loop out there; they
> > all do:
> > 
> > 	for (;;) {
> > 		current->state = TASK_UNINTERRUPTIBLE;
> > 		smp_mb();
> > 		if (cond)
> > 			break;
> > 		schedule();
> > 	}
> > 	current->state = TASK_RUNNING;
> > 
> > Which, if the wakeup is spurious, is just the pattern you need.
> 
> Yes, true! My bad; Alexey had seen the same basic pattern, and I should
> have been clearer in my commit log. Should I resend the patch?

Yes please.
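
For reference, the matching waker side of the wait loop quoted above
looks roughly like this (a minimal sketch; 'cond' and 'waiter_task' are
hypothetical placeholders, not symbols from the patch under discussion):

	/* Hypothetical waker pairing with the blocking wait loop above. */
	static bool cond;
	static struct task_struct *waiter_task;

	static void waker(void)
	{
		cond = true;	/* publish the condition the waiter tests */
		/*
		 * wake_up_process() implies a full memory barrier before
		 * it inspects the task state, so the waiter either sees
		 * cond == true and breaks out of the loop, or is put back
		 * on the runqueue. If both happen, the wakeup is spurious
		 * but harmless, which is exactly what the loop tolerates.
		 */
		wake_up_process(waiter_task);
	}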

> > There isn't an MB there. The best I can do is UNLOCK+LOCK, which, thanks
> > to PPC, is _not_ MB. It is however sufficient for this case.
> > 
> 
> The MB comes from the __switch_to() in schedule(). Ben mentioned it in a 
> different thread.

Right, although even without that there is sufficient ordering, as the
rq unlock from the wakeup, coupled with the rq lock from schedule(),
already forms a load-store barrier.
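
Spelled out, the argument is roughly the following (a simplified sketch
of the claim above, not the actual code paths):

	/*
	 * LOAD   X		(before the rq unlock in the wakeup)
	 * UNLOCK rq->lock	(RELEASE)
	 * LOCK   rq->lock	(ACQUIRE, in schedule())
	 * STORE  Y		(after the rq lock)
	 *
	 * The RELEASE keeps the LOAD of X above the UNLOCK, and the
	 * ACQUIRE keeps the STORE to Y below the LOCK, so the LOAD
	 * cannot pass the STORE.  What UNLOCK+LOCK does not give on
	 * PPC is the store-load ordering of a full MB, which is why
	 * it is sufficient here without being an MB.
	 */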

> > Now, this has been present for a fair while, I suspect ever since we
> > reworked the wakeup path to not use rq->lock twice. Curious you only now
> > hit it.
> > 
> 
> Yes, I just hit it a week or two back, and I needed to collect data to
> explain why p->on_rq got to 0. Hitting it requires extreme stress; for me
> it took a system with a large number of threads and limited memory running
> stress-ng. Reproducing the problem takes an unpredictable amount of time.
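
For reference, a hypothetical stress-ng invocation along these lines
(the exact parameters are not given in the thread; the stressor counts
and memory size would need tuning to oversubscribe the machine and keep
memory tight):

	stress-ng --cpu 128 --vm 32 --vm-bytes 90% --timeout 24h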

What hardware do you see this on? Is it shiny new Power8 chips which
have never before seen deep queues or something, or is it 'regular' old
Power7-like stuff?
