Date:	Fri, 19 Dec 2014 09:30:37 -0500
From:	Chris Mason <clm@...com>
To:	Dave Jones <davej@...hat.com>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Galbraith <umgwanakikbuti@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Dâniel Fraga <fragabr@...il.com>,
	Sasha Levin <sasha.levin@...cle.com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Suresh Siddha <sbsiddha@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Anvin <hpa@...ux.intel.com>
Subject: Re: frequent lockups in 3.18rc4



On Thu, Dec 18, 2014 at 10:58 PM, Dave Jones <davej@...hat.com> wrote:
> On Thu, Dec 18, 2014 at 07:49:41PM -0800, Linus Torvalds wrote:
> 
>  > And when spinlocks start getting contention, *nested* spinlocks
>  > really really hurt. And you've got all the spinlock debugging on etc,
>  > don't you?
> 
> Yeah, though remember this seems to have for some reason gotten worse
> in more recent builds. I've been running kitchen-sink debug kernels
> for my trinity runs for the last three years, and it's only this
> last few months that this has got to be enough of a problem that I'm
> not seeing the more interesting bugs. (Or perhaps we're just getting
> better at fixing them in -next now, so my runs are lasting longer..)

I think we're also adding more and more debugging.  It's definitely a
good thing, but I think a lot of those options are expected to stay off
until you're trying to track down a specific problem.  I do always run
with CONFIG_DEBUG_PAGEALLOC here and lock debugging/lockdep, and aside
from it being slow I haven't hit trouble.

I know it's 3.16 instead of 3.17, but 16K stacks are probably 
increasing the pressure on everything in these runs.  It's my favorite 
kernel feature this year, but it's likely to make trinity hurt more on 
memory constrained boxes.

Your trace with hrtimer debugging yesterday made some sense, but it 
still should have been survivable.  I mean you should have kept seeing 
lockups from that one poor task being starved out of filling up his 
pool.  I know you have traces with a ton more output, but I'm still 
wondering if usb-serial and printk from NMI really get along well.  I'd 
try with debugging back on and serial consoles off.  We carry patches 
to make oom print less, just because the time spent on our slow 
emulated serial console is enough to back the box up into a death 
spiral.

The fairness of spinlock debugging is a really great point too, 
definitely worth trying with that off (and fixing, I love spinlock 
debugging).

-chris


