Message-Id: <1241644672.8424.62.camel@morte.jpl.nasa.gov>
Date:	Wed, 06 May 2009 14:17:52 -0700
From:	Al Niessner <Al.Niessner@....nasa.gov>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: what are some more advanced error collection methods?


I am running 2.6.27 on an AMD 64 x2 dual core 6000+. I have the OS
installed on its own disk (SATA) and have an mdraid (SATA) with 3 disks
being mirrored for my critical data. I also have an mdraid with 2 disks
being mirrored (USB, though I wanted FireWire) for very low rate data. Both
mdraids are NFS mounted and use automount on top of that. Nothing is
peculiar about NFS and automount except that NFS runs over two networks,
each with its own NIC. My problem is that every 36 hours the machine
simply locks up. Here is what I find:

1) num lock light is on but was off prior to lock up
2) no response to beating the num and caps lock keys
3) no response to beating the SysRq key plus any sequences
4) nothing is recorded in kern.log, syslog, or any other log file
in /var/log
5) cannot get to console because keyboard is dead
6) have to hold power switch for 10 seconds to get computer to turn off
so the computer is not suspended (power management is not installed
anyway)
7) when computer is rebooted, the mdraids are usually clean (no resync)
8) ran memtest and it passed

Since nothing showed up in the logs and I could not read the console, I
found an old computer and connected the one I care about to it via
ttyS0. Now I have the console even though the keyboard is dead. However,
when the lock up occurs, there is absolutely no output to my RS232
console. I write a periodic pulse to /dev/console and receive it on the
other machine right up until the lockup, but no panic shows up. On reboot, I
start getting characters from the kernel immediately. Hence, I have to
conclude that the serial connection is viable, but there is simply no
output from the kernel.
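
For reference, a minimal sketch of the kind of pulse described above,
assuming Python is available on the machine. The names heartbeat_line,
emit, and run, the record format, and the interval are all illustrative
assumptions, not what was actually run. Each record carries a sequence
number and a UTC timestamp, so the last line captured on the serial side
gives the time of the lockup to within one interval.

```python
import os
import time


def heartbeat_line(seq, now=None):
    """Build one heartbeat record: a sequence number plus a UTC timestamp."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    return f"heartbeat seq={seq} utc={stamp}\n"


def emit(path, line):
    """Write one record with a raw os.write so no userspace buffer holds it."""
    # O_NOCTTY keeps the console from becoming our controlling terminal;
    # O_CREAT only matters when 'path' is an ordinary file (e.g. for testing).
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | getattr(os, "O_NOCTTY", 0)
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, line.encode("ascii"))
    finally:
        os.close(fd)


def run(path, interval, count=None):
    """Emit heartbeats every 'interval' seconds; loop forever if count is None."""
    seq = 0
    while count is None or seq < count:
        emit(path, heartbeat_line(seq))
        seq += 1
        time.sleep(interval)
```

Calling run("/dev/console", 1.0) as root would send one record per second
to whatever the kernel console is routed to, ttyS0 in this setup. A gap in
the sequence numbers on the receiving side would point at the serial link
rather than at the kernel.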

So, I have tried all of the simple stuff that I know about or found via
google. Now I would like some more advanced ways of trying to pry
helpful information from a dying kernel. Are there more advanced tools,
tricks, or secrets for collecting fault information?

Any and all help is appreciated in advance.

One last item, I am still working on determining if this is a hardware
or software problem. The voltages look reasonable and the room is
thermally stable to +/- 1C. So, I am having a hard time blaming
hardware.

-- 
Al Niessner
818.354.0859

All opinions stated above are mine and do not necessarily reflect those
of JPL or NASA.

--------
|  dS  | >= 0
--------


