Message-ID: <20141219035859.GA20022@redhat.com>
Date: Thu, 18 Dec 2014 22:58:59 -0500
From: Dave Jones <davej@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Chris Mason <clm@...com>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dâniel Fraga <fragabr@...il.com>,
Sasha Levin <sasha.levin@...cle.com>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Suresh Siddha <sbsiddha@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4
On Thu, Dec 18, 2014 at 07:49:41PM -0800, Linus Torvalds wrote:
> And when spinlocks start getting contention, *nested* spinlocks
> really really hurt. And you've got all the spinlock debugging on etc,
> don't you?
Yeah, though remember that for some reason this seems to have gotten worse
in more recent builds. I've been running kitchen-sink debug kernels
for my trinity runs for the last three years, and it's only in the
last few months that this has become enough of a problem that I'm
not seeing the more interesting bugs. (Or perhaps we're just getting
better at fixing them in -next now, so my runs are lasting longer..)
> Also, you do have this:
>
> sched: RT throttling activated
>
> so there's something going on with RT scheduling too.
I see that fairly often. I've never dug into exactly what causes it, but
it seems to be triggerable just by some long-running CPU hogs.
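For reference, the throttling behaviour is controlled by two sysctl knobs, which can be inspected like this (a minimal sketch; the fallback values are the upstream defaults and are only there so the snippet runs outside a Linux /proc environment):

```shell
# RT throttling: realtime tasks may consume at most sched_rt_runtime_us
# out of every sched_rt_period_us of wall time (950ms of every 1s by
# default). "sched: RT throttling activated" fires when that budget
# is exhausted.
period=$(cat /proc/sys/kernel/sched_rt_period_us 2>/dev/null || echo 1000000)
runtime=$(cat /proc/sys/kernel/sched_rt_runtime_us 2>/dev/null || echo 950000)
echo "RT tasks may consume ${runtime}us of every ${period}us"
# Setting the runtime to -1 disables throttling entirely, which is
# risky: a runaway SCHED_FIFO hog can then starve everything else:
#   sysctl -w kernel.sched_rt_runtime_us=-1
```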
> So your printouts are finally starting to make sense. But I'm also
> starting to suspect strongly that the problem is that with all your
> lock debugging and other overheads (does this still have
> DEBUG_PAGEALLOC?) you really are getting into a "real" softlockup
> because things are scaling so horribly badly.
>
> If you now disable spinlock debugging and lockdep, hopefully that page
> table lock now doesn't always get hung up on the lockdep locking, so
> it starts scaling much better, and maybe you'd not see this...
I can give it a shot. Hopefully there's some further mitigation that
could be done to allow a workload like this to survive under a debug
build though, as we've caught *so many* bugs with this stuff in the past.
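For anyone following along, the debug options being discussed would correspond to a .config fragment along these lines (these are the real Kconfig symbols; which exact subset to disable is my guess at what Linus means by "spinlock debugging and lockdep"):

```
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_PAGEALLOC is not set
```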
Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/