Message-ID: <20070926234814.GA27743@jim.sh>
Date:	Wed, 26 Sep 2007 19:48:14 -0400
From:	Jim Paris <jim@...n.com>
To:	AndrewL733 <AndrewL733@....com>,
	Randy Dunlap <rdunlap@...otime.net>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	samson yeung <fragmede@...patchdown.net>
Cc:	linux-kernel <linux-kernel@...r.kernel.org>, bbermack@...m.mit.edu,
	Justin Mazzola Paluska <jmp@....edu>
Subject: Re: NMI error and Intel S5000PSL Motherboards

Hello,

> We have about 100 servers based on Intel S5000PSL-SATA motherboards. 
> They have been running for anywhere between 1 and 10 months. For the 
> past few months, after updating them all to the 2.6.20.15 kernel 
> (because of a bug in the 2.6.18 kernel), we are seeing some strange NMI 
> errors. For example:
> 
> Aug 29 09:02:10 master kernel: Uhhuh. NMI received for unknown reason 30.
> Aug 29 09:02:10 master kernel: Do you have a strange power saving mode enabled?
> Aug 29 09:02:10 master kernel: Dazed and confused, but trying to continue

I'm also working with Andrew and Samson.  The cause of the problem
appears to be CONFIG_PCIEAER, which was introduced after 2.6.18 and
defaults to y.

With CONFIG_PCIEAER=n, scanpci works fine with no errors.  This is the
workaround that they'll likely use for now.

With CONFIG_PCIEAER=y, scanpci always triggers the NMI error.  The
option aerdriver.forceload=1 has no effect.
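
To confirm which way a given kernel was built before trying the
workaround, something like the following could be used.  This is only a
minimal sketch: the helper name config_option_state is made up for
illustration, and it assumes you already have the kernel config as text
(where that file lives varies by distribution, so reading it is left
out).  It relies only on the usual .config conventions of "OPTION=value"
and "# OPTION is not set".

```python
import re


def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    'n' if it is explicitly marked as not set, or None if the
    option does not appear in the text at all.

    Hypothetical helper, not part of any kernel tooling.
    """
    # Anchor on "OPTION=" so CONFIG_PCIEAER does not match a
    # longer name that merely starts with the same prefix.
    set_re = re.compile(r"^%s=(\S+)$" % re.escape(option))
    unset_line = "# %s is not set" % option
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return "n"
        m = set_re.match(line)
        if m:
            return m.group(1)
    return None


if __name__ == "__main__":
    # Sample text only; substitute the contents of your own config.
    sample = "CONFIG_PCIEAER=y\n"
    print(config_option_state(sample, "CONFIG_PCIEAER"))
```

A result of None means the option is absent from the config, which is
what you would expect on a 2.6.18 tree.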

The related dmesg output at boot is:

  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:02.0:pcie01 failed with error 2
  aer_init: AER service init fails - No ACPI _OSC support
  aer: probe of 0000:00:03.0:pcie01 failed with error 1
  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:04.0:pcie01 failed with error 2
  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:05.0:pcie01 failed with error 2
  Evaluate _OSC Set fails. Status = 0x0005
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:06.0:pcie01 failed with error 2
  aer_init: AER service init fails - No ACPI _OSC support
  aer: probe of 0000:00:07.0:pcie01 failed with error 1
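
With about 100 machines to check, it may help to reduce output like the
above to one line per device.  The sketch below is an illustration only:
the function name summarize_aer_failures is hypothetical, the patterns
are taken from the messages quoted above, and it assumes that each
"aer_init" reason belongs to the "aer: probe" line that follows it,
which matches this log but is not something I have verified in the
driver source.

```python
import re

# Patterns based on the dmesg lines quoted in this message.
INIT_RE = re.compile(r"aer_init: AER service init fails - (.+)")
PROBE_RE = re.compile(r"aer: probe of (\S+):pcie01 failed with error (\d+)")


def summarize_aer_failures(dmesg_text):
    """Map each device that failed the AER probe to a tuple of
    (error code, preceding aer_init reason or None).

    Assumption: the reason printed by aer_init applies to the next
    probe failure line in the log.
    """
    result = {}
    last_reason = None
    for line in dmesg_text.splitlines():
        m = INIT_RE.search(line)
        if m:
            last_reason = m.group(1).strip()
            continue
        m = PROBE_RE.search(line)
        if m:
            result[m.group(1)] = (int(m.group(2)), last_reason)
            last_reason = None
    return result


if __name__ == "__main__":
    sample = """\
  Evaluate _OSC Set fails. Status = 0x0005
  aer_init: AER service init fails - Run ACPI _OSC fails
  aer: probe of 0000:00:02.0:pcie01 failed with error 2
  aer_init: AER service init fails - No ACPI _OSC support
  aer: probe of 0000:00:03.0:pcie01 failed with error 1
"""
    for dev, (code, reason) in sorted(summarize_aer_failures(sample).items()):
        print(dev, code, reason)
```

Feeding it saved dmesg output from each server should show quickly
whether all boards fail on the same ports with the same error codes.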

Full dmesg, lspci, and ACPI DSDT are available here:
  http://jim.sh/~jim/tmp/nmi/

-jim