Message-ID: <20160831072835.GB10138@twins.programming.kicks-ass.net>
Date:   Wed, 31 Aug 2016 09:28:35 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Oleg Nesterov <oleg@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Nicholas Piggin <nicholas.piggin@...il.com>,
        Alexey Kardashevskiy <aik@...abs.ru>
Subject: Re: [RFC][PATCH] Fix a race between rwsem and the scheduler

On Wed, Aug 31, 2016 at 01:41:33PM +1000, Balbir Singh wrote:
> On 30/08/16 22:19, Peter Zijlstra wrote:
> > On Tue, Aug 30, 2016 at 06:49:37PM +1000, Balbir Singh wrote:
> >>
> >>
> >> The origin of the issue I've seen seems to be related to rwsem
> >> spinlock stealing. Basically, I see the system deadlocked in the
> >> following state
> > 
> > As Nick says (good to see you're back, Nick!), this is unrelated to
> > rwsems.
> > 
> > This is true for pretty much every blocking wait loop out there; they
> > all do:
> > 
> > 	for (;;) {
> > 		current->state = UNINTERRUPTIBLE;
> > 		smp_mb();
> > 		if (cond)
> > 			break;
> > 		schedule();
> > 	}
> > 	current->state = RUNNING;
> > 
> > Which, if the wakeup is spurious, is just the pattern you need.
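
(As an aside, the idiomatic spelling of that wait loop uses
set_current_state(), which already implies the smp_mb() written out
above; a minimal sketch:)

	for (;;) {
		/* set_current_state() implies a full memory barrier */
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (cond)
			break;
		schedule();
	}
	/* no barrier needed once the condition has been observed */
	__set_current_state(TASK_RUNNING);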
> 
> Yes, true! My bad; Alexey had seen the same basic pattern, and I should
> have been clearer in my commit log. Should I resend the patch?

Yes please.

> > There isn't an MB there. The best I can do is UNLOCK+LOCK, which, thanks
> > to PPC, is _not_ MB. It is however sufficient for this case.
> > 
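
(To spell out what the UNLOCK+LOCK pair does and does not buy; a rough
sketch, with rq->lock only standing in for whichever lock is involved:)

	/*
	 * RELEASE (unlock) followed by ACQUIRE (lock) orders:
	 *   load->load, load->store and store->store
	 * across the pair, but not store->load; upgrading it to a
	 * full MB is what smp_mb__after_unlock_lock() exists for.
	 */
	raw_spin_unlock(&rq->lock);	/* RELEASE */
	raw_spin_lock(&rq->lock);	/* ACQUIRE */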
> 
> The MB comes from the __switch_to() in schedule(). Ben mentioned it in a 
> different thread.

Right, although even without that, there is sufficient ordering, as the
rq unlock from the wakeup, coupled with the rq lock from the schedule
already form a load-store barrier.
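
Roughly (simplified; the field names are just for illustration):

	/*
	 *  wakeup (CPU0)                 schedule() (CPU1)
	 *
	 *  LOAD p->state
	 *  ...
	 *  raw_spin_unlock(&rq->lock);   raw_spin_lock(&rq->lock);
	 *  [RELEASE]                     [ACQUIRE]
	 *                                STORE p->on_rq
	 *
	 * The RELEASE on the wakeup side pairs with the ACQUIRE on the
	 * schedule side (same rq->lock), so the wakeup's earlier load is
	 * ordered before schedule's later store: a load->store barrier,
	 * no full smp_mb() required.
	 */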

> > Now, this has been present for a fair while; I suspect ever since we
> > reworked the wakeup path to not use rq->lock twice. Curious you only
> > now hit it.
> > 
> 
> Yes, I just hit it a week or two back, and I needed to collect data to
> explain why p->on_rq got to 0. Hitting it requires extreme stress -- for
> me it took a system with a large number of threads and relatively little
> memory, running stress-ng. Reproducing the problem takes an unpredictable
> amount of time.

What hardware do you see this on? Is it shiny new Power8 chips which have
never before seen deep queues, or is it 'regular' old Power7-like stuff?
