Message-Id: <200901121859.20893.chris@csamuel.org>
Date:	Mon, 12 Jan 2009 18:59:16 +1100
From:	Chris Samuel <chris@...muel.org>
To:	linux-btrfs@...r.kernel.org
Cc:	David Woodhouse <dwmw2@...radead.org>,
	Andi Kleen <andi@...stfloor.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Harvey Harrison <harvey.harrison@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Chris Mason <chris.mason@...cle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	paulmck@...ux.vnet.ibm.com, Gregory Haskins <ghaskins@...ell.com>,
	Matthew Wilcox <matthew@....cx>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"linux-fsdevel" <linux-fsdevel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Nick Piggin <npiggin@...e.de>,
	Peter Morreale <pmorreale@...ell.com>,
	Sven Dietrich <SDietrich@...ell.com>
Subject: Hard to debug kernel issues (was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning)

On Sun, 11 Jan 2009 11:26:41 pm David Woodhouse wrote:

> Sometimes you weren't going to get a backtrace if something goes wrong
> _anyway_.

Case in point - we've been struggling with some of our SuperMicro based 
systems with AMD Barcelona B3 k10h CPUs *turning themselves off* when running 
various HPC applications.

Nothing in the kernel logs, nothing in the IPMI controller logs. It's just 
like someone has wandered in and held the power button down (and no, it's not 
that).

It's been driving us up the wall.

We'd assumed it was a hardware issue, as it was happening with all sorts of 
kernels, but today we tried 2.6.29-rc1 "just in case" and I've not been able to 
reproduce the crash (yet) on a node I can otherwise crash in about 30 seconds; 
rebooting back into 2.6.28 makes it crash again.

If the test boxes are still alive tomorrow I might see if we can attempt some 
form of a reverse bisect to track down which commit fixed it (git doesn't seem 
to support that, so we're going to have to invert the good/bad commands).
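
For illustration only, the inverted search can be sketched as a plain binary 
search. This is a rough model, not git's own algorithm: it assumes a linear 
history ordered oldest to newest, that the oldest commit crashes and the newest 
does not, and that the fix is never reverted in between. The function name, the 
`crashes` callback and the commit labels are all hypothetical.

```python
from typing import Callable, Sequence


def find_fixing_commit(commits: Sequence[str],
                       crashes: Callable[[str], bool]) -> str:
    """Return the first commit at which the crash no longer happens.

    Assumes commits[0] crashes, commits[-1] survives, and the behaviour
    flips exactly once somewhere in between.
    """
    lo, hi = 0, len(commits) - 1  # lo: known to crash, hi: known to survive
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(commits[mid]):
            # Still broken: with inverted labels this is reported as "good".
            lo = mid
        else:
            # Already fixed: with inverted labels this is reported as "bad".
            hi = mid
    return commits[hi]


# Hypothetical history in which the fix landed in "c2".
history = ["v2.6.28", "c1", "c2", "c3", "v2.6.29-rc1"]
fixed_in = find_fixing_commit(history, lambda c: history.index(c) < 2)
```

The only difference from a normal bisect is which outcome gets which label: a 
kernel that still crashes is marked "good" and one that survives is marked 
"bad", so the "first bad commit" reported at the end is really the fix.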

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC

This email may come with a PGP signature as a file. Do not panic.
For more info see: http://en.wikipedia.org/wiki/OpenPGP

