lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20061116153444.GC8238@csclub.uwaterloo.ca>
Date:	Thu, 16 Nov 2006 10:34:44 -0500
From:	Lennart Sorensen <lsorense@...lub.uwaterloo.ca>
To:	linux-kernel@...r.kernel.org
Subject: How to go about debuging a system lockup?

We have a router with a Geode SC1200 cpu, with 4 AMD 972 ethernet ports
(pcnet32) behind a PLX 6152 PCI-PCI bridge, which quite regularly locks
up completely if we try to do simultanius traffic on all 4 ports (our
test case sends data from port 1 to port 2, and back and from port 3 to
port 4 and back at a rate of 8000 packets per second using 1500byte
packets).  We usually manage to run the test for about 1 minute before
the system hangs.  This happens on every one of the systems we have
tried so far.  If we only run 2 ports, it seems to never die, and with 3
ports we haven't seen any failures yet, although maybe we just haven't
tested long enough.  If we just receive the packets but don't forward
them out again, then we never crash, so it seems to be related to
simultanious transmit on the pcnet32s.

So far I have tried printing a message everytime the pcnet32 driver
enables and disables interrupts to find out if it hangs somewhere with
interrupts disabled, but that didn't seem to indicate anything
meaningful.

So far I have tried this with 2.6.8, 2.6.16.22, and 2.6.18.2 and no
difference so far.  I can't think of what kind of even could cause the
system to just hang with no further console output or a kernel panic or
oops or anything.  Usually most errors produce some kind of message.

Does anyone have any suggestions for where I go from here to find out
what is happening and where to look?  I don't even know if I should
suspect the hardware or the software at this point.  I want to know if
the program counter is still changing, or if the cpu is simply hung or
something, but I have no idea how to get at that.

--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ