Message-ID: <loom.20110217T003826-785@post.gmane.org>
Date:	Thu, 17 Feb 2011 00:17:27 +0000 (UTC)
From:	Ryan Underwood <ryan.underwood@...ghtsafety.com>
To:	linux-kernel@...r.kernel.org
Subject: Re: 2.6.38-rc2: Uhhuh. NMI received for unknown reason 2d on CPU 0.

Preeti Khurana <Preeti.Khurana <at> guavus.com> writes:

> 
> I am getting the similar issue as reported
> in https://lkml.org/lkml/2011/2/10/187
> 
> Can someone tell me if the same issue  because I am getting the
> problem on Intel Xeon..
> 

I am seeing exactly the same problem (on 2.6.35 as Preeti reported originally)
on some Xeon servers but only with recently shipped BIOS revisions. The OS is
CentOS 5.5.

In my case, the system sometimes hangs with no message at all, sometimes with
an NMI message immediately before hanging, and sometimes with a long backtrace
originating at cpu_idle().  The NMI reason code differs from the one in the
subject; in my observation it is usually 21 or 31.
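
For reference, a minimal sketch of how the reason bytes mentioned in this
thread (2d, 21, 31) break down into bits. It assumes the reason byte is the
value read from I/O port 0x61 with the conventional PC/AT bit layout, and
that the kernel prints "unknown reason" when neither of the top two bits
(SERR, IOCHK) is set. The bit names and both function names are illustrative
and are not taken from the kernel source.

```python
# Assumed layout of the NMI status/control byte (I/O port 0x61).
# The descriptions are illustrative labels, not kernel identifiers.
PORT_61_BITS = {
    0x80: "SERR / memory parity error",
    0x40: "IOCHK (I/O channel check)",
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "IOCHK masked",
    0x04: "SERR / parity check masked",
    0x02: "speaker data enable",
    0x01: "timer 2 gate",
}


def decode_nmi_reason(reason):
    """Return the labels of all bits set in the reason byte, high bit first."""
    return [name for mask, name in PORT_61_BITS.items() if reason & mask]


def is_unknown_reason(reason):
    """True when neither SERR (0x80) nor IOCHK (0x40) is set."""
    return (reason & 0xC0) == 0


if __name__ == "__main__":
    for code in (0x2D, 0x21, 0x31):
        print("%02x unknown=%s bits=%s"
              % (code, is_unknown_reason(code), decode_nmi_reason(code)))
```

Under these assumptions none of the three codes has SERR or IOCHK set, which
would be consistent with the kernel reporting them as "unknown".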

The problem seems to be triggered by accessing a PCI card (via MMIO): until
the card is accessed, the system runs indefinitely with no problems.

Other servers of exactly the same model (Intel SR2500) but with an older BIOS
revision work fine (the working BIOS is dated 3/14/2008, the non-working one
3/9/2010).  All software is identical in these cases.

Also, in one instance, kernel v2.6.18 is used on these servers with the
3/14/2008 BIOS revision without a problem.  The rest of the software is again
the same (except for kernel and drivers).

It seems to be a problem with newer kernels combined with the newer Intel BIOS.
I have not tried an older kernel on the newer BIOS yet.

I have not yet tried the following patches, both of which seem to address
spurious NMI messages that are not accompanied by system lockups:

https://lkml.org/lkml/2011/2/16/106
https://lkml.org/lkml/2011/2/1/286

Neither nmi_watchdog=0 nor pcie_aspm=off solves the problem.
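
When comparing machines, it can help to confirm that the running kernel was
actually booted with these options. The following is a rough sketch that
checks a kernel command line (as found in /proc/cmdline) for them. The
function names and the sample command line in the test are hypothetical, and
the parser is deliberately simple: it does not handle quoted values that
contain spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' map to None. Later duplicates override earlier ones.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def options_applied(cmdline, wanted):
    """Return {name: True/False} for each wanted name=value pair."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) == value for name, value in wanted.items()}


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    with open("/proc/cmdline") as f:
        running = f.read()
    wanted = {"nmi_watchdog": "0", "pcie_aspm": "off"}
    for name, ok in options_applied(running, wanted).items():
        print("%s: %s" % (name, "set" if ok else "NOT set"))
```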

I am not subscribed so please Cc me.

