Message-Id: <7CA56108-1CC9-4217-80AF-C7F378D23FA9@shu.ac.uk>
Date:	Wed, 2 Apr 2008 20:15:00 +0100
From:	Tim Schmielau <T.Schmielau@....ac.uk>
To:	Laurent GUERBY <laurent@...rby.net>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: BUG: soft lockup detected on Phenom with Debian 2.6.24-4

[apologies if you receive this email twice; apparently the exchange server
this message was sent through previously triggered one of lkml's taboo
expressions]

On Sat, 08 Mar 2008 23:00:15 +0100, Laurent GUERBY wrote:

 > I have a system with an "AMD64 Phenom 9500" quad core cpu, 4GB RAM,
 > "ASUS M3A32 MVP Deluxe wifi" motherboard with latest vendor BIOS (0801).
 >
 > I tried stock debian etch kernel (Debian 2.6.18.dfsg.1-18etch1), machine
 > froze with no message, debian etch backport kernel same, and then
 > Debian 2.6.24-4 from unstable and I got some messages: machine
 > is not frozen but some userland processes are (ps says "Dl" state
 > with child in "Zs" state) and "events/3" is taking 100% cpu
 > according to top:
 >
 >    18 root      15  -5     0    0    0 R  100  0.0  74:59.46 events/3
 >
 > Got to the same state with ubuntu hardy 2.6.24-8-server kernel. All
 > kernels are untainted, no X running anyway.
 >
 > It takes a few hours of doing some stuff, in my case bootstrapping or
 > testing GCC at -j 4, and then the problem happens.
 >
 > I did 32 hours of memtest without issue on this system, temperatures
 > are very low and the case has plenty of airflow, making memory
 > issue less likely.

I have a very similar, if not the same issue:

I just bought an HP Pavilion 6332 with a Phenom 9500 quad core cpu and
3GB RAM on some ASUS-like looking mainboard with NVidia MCP61 chipset
(actually my first PC that I did not assemble from components myself).
I installed OpenSUSE 10.3 64bit and tried various kernels (2.6.24.4,
2.6.23.17 + the erratum298-workaround from AMD, OpenSUSE's default
2.6.22.5-31 and 2.6.22.17-0.1), but the machine will hang within less
than an hour of intense OpenMP load over all four cores (using a
homemade scientific application).

The symptoms of the hang are similar to what Laurent saw: in xosview,
the load display of one cpu would get stuck (not necessarily at 100%), and
usually (but not always) ps would hang in the middle of its output. I
could still log in (no X running here either) and use the remaining cores,
but the OpenMP application would hang in an unkillable state (I don't
know which, as ps gets stuck). Unlike Laurent, however, I don't see
anything relevant in the syslog.
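
Since ps gets stuck before it can show the state of the hung tasks, one
possible way to still learn that state is to read the state field of
/proc/<pid>/stat directly. The following is only a minimal sketch: the
function names parse_stat_state and tasks_in_state are hypothetical, and
it rests on the unverified assumption that ps hangs on some other
per-process file and that reading stat still returns on the affected
machine.

```python
import os


def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1]
    return rest.split()[0]


def tasks_in_state(wanted="D", proc="/proc"):
    """Yield (pid, comm) for every process whose state equals `wanted`."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat"), errors="replace") as f:
                line = f.read()
        except OSError:
            continue  # process exited while we were scanning
        if parse_stat_state(line) == wanted:
            comm = line[line.index("(") + 1:line.rindex(")")]
            yield int(entry), comm
```

Calling list(tasks_in_state("D")) on a Linux system would list the
processes currently in uninterruptible sleep; whether this completes
during the hang described here is untested.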

Occasionally I also had wrong output from the program (i.e., running it
twice gives different results, one of them bogus), although that was on
32bit OpenSUSE 10.3, as far as I remember. I know that due to the nature
of OpenMP this could just as well be a bug in my program, but it has not
yet happened on any other machine.
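
The "run it twice and compare" check can be automated. The sketch below
is an illustration only: output_digest and distinct_outputs are
hypothetical helper names, and the command to run is whatever program is
under test (the actual application is not shown in this message). It
compares stdout bit for bit, so it may also flag harmless differences
such as those caused by a varying floating-point reduction order.

```python
import hashlib
import subprocess


def output_digest(cmd):
    """Run cmd once and return the SHA-256 hex digest of its stdout."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return hashlib.sha256(result.stdout).hexdigest()


def distinct_outputs(cmd, runs=2):
    """Run cmd `runs` times and return the set of distinct stdout digests."""
    return {output_digest(cmd) for _ in range(runs)}
```

A result set with more than one element means at least two runs
disagreed. Running the same check on a second machine would help to
separate a program bug from a hardware problem.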

The system is well ventilated and very cool, but so far I have only had
time to run memtest for 6 hours (no errors found).

If anybody has an idea how to debug this, please ask for more
information; otherwise I'll just return this machine for a refund.

Thanks,
Tim
