Message-ID: <0F10A59FDFFDFD4E9BEBD7365DE672550214F0B1@uk-email.terastack.bluearc.com>
Date:	Tue, 19 Aug 2008 09:17:41 +0100
From:	"Andy Chittenden" <andyc@...earc.com>
To:	"Doug Thompson" <norsk5@...oo.com>, <linux-kernel@...r.kernel.org>
Subject: RE: Linux 2.6.26 edac errors and ASUS P5W DH Deluxe motherboard

Hi Doug

> I don't know which version of the source code was used in the 25 or
> the 26 versions of the debian package, but it might be that the later
> one is really finding errors, as I remember there were some patches
> against the i82975x module.

I've done a diff between 2.6.25 and 2.6.26 source code of the
i82975x_edac module. As you can see, there's not much difference:

# diff -u linux-2.6.2[56]/drivers/edac/i82975x_edac.c 
--- linux-2.6.25/drivers/edac/i82975x_edac.c    2008-04-17 03:49:44.000000000 +0100
+++ linux-2.6.26/drivers/edac/i82975x_edac.c    2008-07-13 22:51:29.000000000 +0100
@@ -14,7 +14,7 @@
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
 #include <linux/slab.h>
-
+#include <linux/edac.h>
 #include "edac_core.h"
 
 #define I82975X_REVISION       " Ver: 1.0.0 " __DATE__
@@ -611,6 +611,9 @@
 
        debugf3("%s()\n", __func__);
 
+       /* Ensure that the OPSTATE is set correctly for POLL or NMI */
+       opstate_init();
+
        pci_rc = pci_register_driver(&i82975x_driver);
        if (pci_rc < 0)
                goto fail0;
@@ -664,3 +667,6 @@
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Arvind R. <arvind@...rlab.com>");
 MODULE_DESCRIPTION("MC support for Intel 82975 memory hub controllers");
+
+module_param(edac_op_state, int, 0444);
+MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll,1=NMI");
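
Because the new module_param() line is declared with permissions 0444, the
chosen reporting state should be readable from sysfs once the module is
loaded. The following is only a minimal sketch for checking it: the path
assumes the module is loaded under the name i82975x_edac, and the value
names come from the MODULE_PARM_DESC string in the diff. The helper name
read_op_state is made up for illustration.

```python
from pathlib import Path

# Value names taken from the MODULE_PARM_DESC string in the diff: 0=Poll, 1=NMI.
OP_STATES = {0: "Poll", 1: "NMI"}

# Assumed sysfs location; only present while the module is loaded under this name.
PARAM_PATH = Path("/sys/module/i82975x_edac/parameters/edac_op_state")


def read_op_state(path=PARAM_PATH):
    """Return (raw_value, name) for the reporting state, or None if unreadable."""
    try:
        raw = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Module not loaded, parameter missing, or unexpected file content.
        return None
    return raw, OP_STATES.get(raw, "unknown")


if __name__ == "__main__":
    print(read_op_state())
```

A result of None simply means the parameter file could not be read, for
example on a 2.6.25 kernel where the parameter does not exist.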


> Are they ALL the same row, or are they different rows? If different, 
> they could be legit. The same row there might be an issue.

Hmm, they're different. On another machine, I've managed to find the
info that was logged when it booted 2.6.26:

/var/log/kern.log.1.gz:Aug  4 11:38:15 diesel kernel: [    9.079151] EDAC MC0: UE page 0x7fe0b, offset 0x0, grain 128, row 1, labels ":": i82975x UE
/var/log/kern.log.1.gz:Aug  4 11:38:15 diesel kernel: [   10.104762] EDAC MC0: UE page 0x7e451, offset 0x0, grain 128, row 1, labels ":": i82975x UE
/var/log/kern.log.1.gz:Aug  4 11:38:15 diesel kernel: [   11.110256] EDAC MC0: UE page 0x7e7ae, offset 0x0, grain 128, row 1, labels ":": i82975x UE
...
/var/log/kern.log.1.gz:Aug  4 11:52:05 diesel kernel: [   11.636753] EDAC MC0: UE page 0x60000, offset 0x0, grain 128, row 1, labels ":": i82975x UE
/var/log/kern.log.1.gz:Aug  4 11:52:05 diesel kernel: [   12.641616] EDAC MC0: UE page 0xde771, offset 0x0, grain 128, row 3, labels ":": i82975x UE
/var/log/kern.log.1.gz:Aug  4 11:52:05 diesel kernel: [   13.734052] EDAC MC0: UE page 0xde771, offset 0x0, grain 128, row 3, labels ":": i82975x UE
/var/log/kern.log.1.gz:Aug  4 11:52:05 diesel kernel: [   14.743449] EDAC MC0: UE page 0xde771, offset 0x0, grain 128, row 3, labels ":": i82975x UE
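
To answer the "same row or different rows" question over a whole log
rather than by eye, the UE records can be grouped by controller and row.
This is only a rough sketch: the regular expression is written against the
message format shown above, the function names are made up for
illustration, and the 4 KiB page size (PAGE_SHIFT = 12) used to turn a page
number into a physical address is an assumption about this x86 machine.

```python
import re
from collections import defaultdict

# Matches the "EDAC MCn: UE page ..., row N" part of a kernel log record.
# \s+ between tokens also tolerates records that a mailer has line-wrapped.
UE_RE = re.compile(
    r"EDAC\s+MC(?P<mc>\d+):\s+UE\s+page\s+(?P<page>0x[0-9a-fA-F]+),"
    r"\s+offset\s+(?P<offset>0x[0-9a-fA-F]+),"
    r"\s+grain\s+(?P<grain>\d+),"
    r"\s+row\s+(?P<row>\d+)"
)

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def phys_addr(page, offset=0):
    """Convert a reported page number and offset into a physical address."""
    return (page << PAGE_SHIFT) | offset


def summarize_ue(log_text):
    """Return {(mc, row): {"count": n, "pages": sorted distinct page numbers}}."""
    counts = defaultdict(int)
    pages = defaultdict(set)
    for m in UE_RE.finditer(log_text):
        key = (int(m.group("mc")), int(m.group("row")))
        counts[key] += 1
        pages[key].add(int(m.group("page"), 16))
    return {k: {"count": counts[k], "pages": sorted(pages[k])} for k in counts}


# The seven records quoted above (the elided "..." records are not included).
sample = """\
[    9.079151] EDAC MC0: UE page 0x7fe0b, offset 0x0, grain 128, row 1, labels ":": i82975x UE
[   10.104762] EDAC MC0: UE page 0x7e451, offset 0x0, grain 128, row 1, labels ":": i82975x UE
[   11.110256] EDAC MC0: UE page 0x7e7ae, offset 0x0, grain 128, row 1, labels ":": i82975x UE
[   11.636753] EDAC MC0: UE page 0x60000, offset 0x0, grain 128, row 1, labels ":": i82975x UE
[   12.641616] EDAC MC0: UE page 0xde771, offset 0x0, grain 128, row 3, labels ":": i82975x UE
[   13.734052] EDAC MC0: UE page 0xde771, offset 0x0, grain 128, row 3, labels ":": i82975x UE
[   14.743449] EDAC MC0: UE page 0xde771, offset 0x0, grain 128, row 3, labels ":": i82975x UE
"""

if __name__ == "__main__":
    for (mc, row), info in sorted(summarize_ue(sample).items()):
        addrs = ", ".join(hex(phys_addr(p)) for p in info["pages"])
        print(f"MC{mc} row {row}: {info['count']} UE(s) at {addrs}")
```

On a real system the same function could be fed the decompressed contents
of /var/log/kern.log.1.gz instead of the inline sample.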

> When properly set by edac-utils
> (http://sourceforge.net/projects/edac-utils/) ...

Thanks for the pointer. I've now installed edac-utils on the offending
machines. It seems the motherboard is only partly known to it: edac-ctl
identifies the board but has no DIMM labels for it:

# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTEK COMPUTER INC P5W DH Deluxe
# edac-ctl --print-labels
No dimm labels for ASUSTEK COMPUTER INC P5W DH Deluxe

dmidecode gives some memory module info:

Handle 0x0009, DMI type 6, 12 bytes
Memory Module Information
        Socket Designation: DIMM0
        Bank Connections: 9 11
        Current Speed: 30 ns
        Type: Unknown FPM Parity ECC SDRAM
        Installed Size: 2048 MB (Double-bank Connection)
        Enabled Size: 2048 MB (Double-bank Connection)
        Error Status: OK

Handle 0x000A, DMI type 6, 12 bytes
Memory Module Information
        Socket Designation: DIMM1
        Bank Connections: 9 11
        Current Speed: 30 ns
        Type: Unknown FPM Parity ECC SDRAM
        Installed Size: 2048 MB (Double-bank Connection)
        Enabled Size: 2048 MB (Double-bank Connection)
        Error Status: OK

Handle 0x000B, DMI type 6, 12 bytes
Memory Module Information
        Socket Designation: DIMM2
        Bank Connections: 9 11
        Current Speed: 30 ns
        Type: Unknown FPM Parity ECC SDRAM
        Installed Size: 2048 MB (Double-bank Connection)
        Enabled Size: 2048 MB (Double-bank Connection)
        Error Status: OK

Handle 0x000C, DMI type 6, 12 bytes
Memory Module Information
        Socket Designation: DIMM3
        Bank Connections: 9 11
        Current Speed: 30 ns
        Type: Unknown FPM Parity ECC SDRAM
        Installed Size: 2048 MB (Double-bank Connection)
        Enabled Size: 2048 MB (Double-bank Connection)
        Error Status: OK

> Since I don't have one of these chipsets, is it possible I could
> access to one or more of these machines to take a look around?

Unfortunately not. If there are any commands you'd like me to run,
please let me know.

If you could let me know what I need to put in /etc/edac/labels.db, that
would be appreciated too.

-- 
Andy, BlueArc Engineering