Date:	Sun, 5 Jun 2011 20:59:57 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Arne Jansen <lists@...-jansens.de>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	mingo@...hat.com, hpa@...or.com, linux-kernel@...r.kernel.org,
	efault@....de, npiggin@...nel.dk, akpm@...ux-foundation.org,
	frank.rowand@...sony.com, tglx@...utronix.de,
	linux-tip-commits@...r.kernel.org
Subject: Re: [debug patch] printk: Add a printk killswitch to robustify NMI
 watchdog messages


* Arne Jansen <lists@...-jansens.de> wrote:

> > hm, it's hard to interpret that without the spin_lock()/unlock() 
> > logic keeping the dumps apart.
> 
> The locking was in place from the beginning. [...]

Ok, i was surprised it looked relatively ordered :-)

> [...] As the output is still scrambled, there are other sources for 
> BUG/WARN outside the watchdog that trigger in parallel. Maybe we 
> should protect the whole BUG/WARN mechanism with a lock and send it 
> to early_printk from the beginning, so we don't have to wait for 
> the watchdog to kill printk off and the first BUG can come through. 
> Or just let WARN/BUG kill off printk instead of the watchdog 
> (though I have to get rid of that syslog-WARN on startup).

I had yet another look at your lockup.txt and i think the main cause 
is the WARN_ON() caused by the not-held pi_lock. The lockup there 
causes other CPUs to wedge in printk, which triggers spinlock-lockup 
messages there.

So i think the primary trigger is the pi_lock WARN_ON() (as your 
bisection has confirmed too); everything else follows from this.

Unfortunately i don't think we can really 'fix' the problem by 
removing the assert. By all means the assert is correct: pi_lock 
should be held there. If we are not holding it then we likely won't 
crash in an easily visible way - it's a lot easier to trigger asserts 
than to trigger obscure side-effects of locking bugs.

It is also a mystery why only printk() triggers this bug. The wakeup 
done there is not particularly special, so by all means we should 
have seen similar lockups elsewhere as well - not just with 
printk()s. Yet we are not seeing them.

So some essential piece of the puzzle is still missing.

Thanks,

	Ingo