Message-Id: <1241649364.8424.66.camel@morte.jpl.nasa.gov>
Date:	Wed, 06 May 2009 15:36:04 -0700
From:	Al Niessner <Al.Niessner@....nasa.gov>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: what are some more advanced error collection methods?


Using a voltmeter, I verified that the 5V and 12V rails are good while
the computer is running under a normal load. So, I am going to go with
the power supply being alright for now.

I changed the CPU temperature by a couple of degrees with no failure.
While I cannot rule hardware out entirely, I am leaning toward a
software problem, meaning the kernel is hard locking.

Now I just need some way to get some helpful information out of it so
that I can move toward a solution.
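
For reference, the "pulse" on /dev/console mentioned in the quoted
message below can be sketched roughly as follows. This is only an
illustration, not the exact script in use: the function names, the line
format, and the 10 second interval are assumptions, and writing to
/dev/console normally requires root.

```python
import time


def write_heartbeat(path, seq, now=None):
    """Write one timestamped heartbeat line to `path` and return it.

    `path` is expected to be a console device such as /dev/console.
    `now` is an optional epoch time, mainly useful for testing.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = f"heartbeat {seq} {stamp}\n"
    # Open and close the device on every write so that each line is
    # flushed immediately and nothing sits in a userspace buffer.
    with open(path, "w") as dev:
        dev.write(line)
    return line


def run(path="/dev/console", interval=10.0, count=None):
    """Emit heartbeats every `interval` seconds.

    With count=None this loops until the machine stops; the last line
    seen on the serial console then bounds the time of the lock-up to
    within one interval.
    """
    seq = 0
    while count is None or seq < count:
        write_heartbeat(path, seq)
        seq += 1
        time.sleep(interval)
```

Calling run() as root and watching the far end of the serial line shows
when the heartbeats stop, which is all this approach can tell you.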

On Wed, 2009-05-06 at 14:17 -0700, Al Niessner wrote:
> I am running 2.6.27 on an AMD 64 x2 dual core 6000+. I have the OS
> installed on its own disk (SATA) and have an mdraid (SATA) with 3 disks
> being mirrored for my critical data. I also have an mdraid with 2 disks
> being mirrored (USB but I wanted firewire) for very low rate data. Both
> mdraids are nfs mounted and use automount on top of that -- nothing
> peculiar about nfs and automount except that nfs is over two networks
> each with their own NIC. My problem is that every 36 hours the machine
> simply locks up. Here is what I find:
> 
> 1) num lock light is on but was off prior to lock up
> 2) no response to beating the num and caps lock keys
> 3) no response to beating the sysreq key plus any sequences
> 4) nothing is recorded in kern.log, syslog, or any other log file
> in /var/log
> 5) cannot get to console because keyboard is dead
> 6) have to hold power switch for 10 seconds to get computer to turn off
> so the computer is not suspended (power management is not installed
> anyway)
> 7) when computer is rebooted, the mdraids are usually clean (no resync)
> 8) did a memtest and it passes
> 
> Since nothing showed up in the logs and I could not read the console, I
> found an old computer and connected the one I care about to it via
> ttyS0. Now I have the console even though the keyboard is dead. However,
> when the lock up occurs, there is absolutely no output to my RS232
> console. I put a pulse onto the console via /dev/console and get stuff
> right up until the change of state, but no panic shows up. On reboot, I
> start getting characters from the kernel immediately. Hence, I have to
> conclude that the serial connection is viable, but there is simply no
> output from the kernel.
> 
> So, I have tried all of the simple stuff that I know about or found via
> google. Now I would like some more advanced ways of trying to pry
> helpful information from a dying kernel. Are there more advanced tools,
> tricks, or secrets for collecting fault information?
> 
> Any and all help is appreciated in advance.
> 
> One last item, I am still working on determining if this is a hardware
> or software problem. The voltages look reasonable and the room is
> thermally stable to +/- 1C. So, I am having a hard time blaming
> hardware.
> 
-- 
Al Niessner
818.354.0859

--------
|  dS  | >= 0
--------
