Message-ID: <20121109110807.3e09fcfc@pyramind.ukuu.org.uk>
Date:	Fri, 9 Nov 2012 11:08:07 +0000
From:	Alan Cox <alan@...rguk.ukuu.org.uk>
To:	jongman.heo@...sung.com
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: How to debug system freeze  (not detected by kernel debug
 options)

On Fri, 09 Nov 2012 02:19:33 +0000 (GMT)
Jongman Heo <jongman.heo@...sung.com> wrote:

> 
> Dear kernel hackers,
> 
> I have a problem in SMP environment, in x86 platform (Intel Atom based embedded system)
> 
> In UP, there is no issue, but in SMP the system freezes within tens of minutes (or sooner) if I run I/O tests on flash memory and HDD simultaneously (using dd).
> 
> I enabled relevant kernel debug options, like LOCKDEP, DETECT_SOFTLOCKUP, DETECT_HUNG_TASK, along with "nmi_watchdog=1".
> (Yeah, this is somewhat old kernel, 2.6.35.14).
> 
> But no debug message is shown. (I checked that the NMI interrupt count increases correctly.)
> 
> Do you have any thoughts on what can cause a system freeze that is not detected by LOCKDEP, the watchdog, and other options?

Hardware problems, firmware bugs, PATA controller hangs, some
classes of PCI device hang, and certain cases where for some reason the
crash is so bad that the kernel can't get the message out even though it
has detected the failure.

A good starting point is probably "can you make two identical systems do
it". If you've got a pair of boards which, fed the same software set and
fitted with the same flash and HDD, crash in the same way, it's unlikely
to be a faulty board.

You may find it useful to make the NMI timeout handler trigger a directly
detectable event via an I/O port if your platform has a buzzer or LED
directly I/O mapped somewhere.
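
As a rough sketch of that idea (not code from any kernel tree): keep the
"I am stuck" path down to a single port write, so it still works when
printk and the console do not. Everything named below is hypothetical.
DEBUG_PORT is set to 0x80 on the assumption that the board exposes the
conventional PC POST-code port; substitute whatever port and value drive
your buzzer or LED. The port write goes through a function pointer so the
logic can be exercised in userspace; in a kernel build it would be a thin
wrapper around outb(value, port).

```c
#include <stdint.h>

/* Assumption: 0x80 is the POST-code port on this board. Replace with the
 * real I/O port and value for your LED or buzzer. */
#define DEBUG_PORT     0x80
#define DEBUG_ON_VALUE 0xEE

/* Same argument order as the x86 outb(value, port). */
typedef void (*port_write_fn)(uint8_t value, uint16_t port);

/* Hypothetical helper state for the lockup beacon. */
struct lockup_beacon {
    port_write_fn write_port;  /* outb wrapper in the kernel, a stub in tests */
    uint16_t port;             /* I/O port that drives the indicator */
    uint8_t on_value;          /* value that makes the indicator visible */
    unsigned int fired;        /* how many times the beacon was triggered */
};

/* Call this from wherever the NMI watchdog decides a CPU is stuck.
 * No locks, no printing, no allocation: just one port write. */
static void lockup_beacon_fire(struct lockup_beacon *b)
{
    b->fired++;
    b->write_port(b->on_value, b->port);
}

/* Userspace stand-in for outb: records the last write for inspection. */
static uint8_t last_value;
static uint16_t last_port;

static void record_write(uint8_t value, uint16_t port)
{
    last_value = value;
    last_port = port;
}
```

If the indicator fires but no message ever reaches the console, that
points at the "kernel detected it but couldn't report it" case rather
than at a hang the watchdog never saw.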

Failing that, the fastest approach may be to use hardware debugging aids
if you have access to them for that platform.

Alan