Message-ID: <20100615100117.GA17953@aftab>
Date:	Tue, 15 Jun 2010 12:01:17 +0200
From:	Borislav Petkov <bp@...64.org>
To:	Nils Carlson <nils.carlson@...d.ltu.se>
Cc:	Andi Kleen <andi@...stfloor.org>, Doug Thompson <norsk5@...oo.com>,
	Tony Luck <tony.luck@...el.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Ingo Molnar <mingo@...e.hu>,
	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	Mauro Carvalho Chehab <mchehab@...hat.com>,
	BrentYoung <brent.young@...el.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	"bluesmoke-devel@...ts.sourceforge.net" 
	<bluesmoke-devel@...ts.sourceforge.net>,
	Doug Thompson <dougthompson@...ssion.com>,
	Joe Perches <joe@...ches.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Matt Domsch <Matt_Domsch@...l.com>,
	Nils Carlson <nils.carlson@...csson.com>
Subject: Re: Hardware Error Kernel Mini-Summit

From: Nils Carlson <nils.carlson@...d.ltu.se>
Date: Tue, Jun 15, 2010 at 04:06:33AM -0400

> On Tue, 15 Jun 2010, Andi Kleen wrote:
> 
> > On Mon, Jun 14, 2010 at 04:46:40PM -0700, Doug Thompson wrote:
> >
> > Hi Doug,
> >
> > >
> > > Maybe I didn't see it covered (or I missed it), but EDAC is used on more than just x86 based machines, though they are the majority by volume. We should have an abstraction that covers all the archs, like we do with other subsystems of Linux.
> >
> > The way I envision it working is that an abstracted DIMM interface
> > (or edac2 or whatever you want to call it) can be fed from any reasonable
> > DIMM layout driver. This could be either DMI on x86 or some other
> > driver. There would be nothing really x86 specific about that.
> 
> Could you maybe provide some references on how DIMM layout
> could be read from DMI? I can't find anything nearly this specific,
> or is it something we're expecting to happen in future BIOSes?
> 
> Also, there would probably need to be some standard describing
> different DIMM layouts in general, though maybe such a thing exists.
> 
> In other words, there would have to be some way of ascertaining
> that the info you read from DMI is sufficient to decode MCEs so that
> a faulting DIMM can be identified. In an ideal world, this could
> be tested by some simple tool that could be run by the BIOS writers
> to test that they're providing the OS with sufficient info.

You cannot map an ECC error to a DIMM using DMI info alone - at least on
AMD you cannot. The MCE contains the physical address where the ECC error
occurred, and you need EDAC to convert this to a chip select row.
Depending on the addressing mode the DRAM controller uses, you
additionally need the error syndrome.
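
To illustrate the address-to-chip-select step, here is a minimal sketch,
not the actual EDAC code. It assumes each chip select is described by a
base/mask pair where set mask bits are "don't care", and it ignores
interleaving, memory hoisting and the error syndrome entirely. The names
ChipSelect and addr_to_csrow are made up for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ChipSelect:
    """Hypothetical view of one chip select's address range."""
    base: int             # address bits that must match
    mask: int             # bits set here are ignored in the comparison
    enabled: bool = True  # disabled chip selects never match


def addr_to_csrow(addr: int, chip_selects: Sequence[ChipSelect]) -> Optional[int]:
    """Return the index of the first enabled chip select covering addr."""
    for idx, cs in enumerate(chip_selects):
        if not cs.enabled:
            continue
        # Compare only the bits that the mask does not exclude.
        if (addr & ~cs.mask) == (cs.base & ~cs.mask):
            return idx
    return None  # address is not backed by any chip select


# Example layout (assumed): two chip selects of 1 GiB each.
EXAMPLE_CS = [
    ChipSelect(base=0x00000000, mask=0x3FFFFFFF),
    ChipSelect(base=0x40000000, mask=0x3FFFFFFF),
]
```

With this layout, addr_to_csrow(0x40001000, EXAMPLE_CS) picks the second
chip select, while an address above 2 GiB matches nothing.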

Now, once you have the chip select row, you need to map it to a DIMM
rank, and for that you need the DIMM info stored in the SPD ROM (one of
the fields in the SPD describes the DIMM's ranks, which is needed to
unambiguously pinpoint which DIMM is generating those errors). Then you
can use the DMI info - assuming it contains the correct silk screen
labels on the motherboard - to map to a DIMM.
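
A rough sketch of why the rank info matters for this step. It assumes,
purely for illustration, that DIMM slots occupy consecutive chip selects
in slot order and that each DIMM uses one chip select per rank; real
boards may be wired differently. The function name and the labels are
hypothetical.

```python
from typing import Sequence, Tuple


def csrow_to_dimm(csrow: int,
                  ranks_per_dimm: Sequence[int],
                  labels: Sequence[str]) -> Tuple[str, int]:
    """Map a chip select row to (silk screen label, rank within the DIMM).

    ranks_per_dimm[i] is the rank count read from the SPD of slot i,
    labels[i] is the label for slot i (e.g. taken from DMI).
    """
    first = 0  # first chip select owned by the current slot
    for slot, ranks in enumerate(ranks_per_dimm):
        if first <= csrow < first + ranks:
            return labels[slot], csrow - first
        first += ranks
    raise LookupError(f"chip select row {csrow} is not populated")


# Assumed board: slots hold a dual-rank, a single-rank and a dual-rank DIMM.
RANKS = [2, 1, 2]
LABELS = ["DIMM_A1", "DIMM_A2", "DIMM_B1"]
```

Here chip select row 1 still belongs to DIMM_A1 because that DIMM is
dual-rank. Had it been single-rank, the same row would point at DIMM_A2,
which is the ambiguity the SPD data resolves.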

What EDAC currently does is decode the ECC error to a chip select - what
we still need is some I2C/SMBus code which can read the SPD ROM. I haven't
had the time to look into it yet, though.
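
As a sketch of what such code would have to extract once the SPD bytes
are read: the snippet below does no I2C/SMBus access at all, it only
parses a raw SPD dump that is assumed to be in memory already. The byte
offsets follow the JEDEC DDR2 and DDR3 SPD layouts as I understand them
and should be checked against the spec; the function name is made up.

```python
# DRAM device type codes stored in SPD byte 2.
SPD_TYPE_DDR2 = 0x08
SPD_TYPE_DDR3 = 0x0B


def spd_rank_count(spd: bytes) -> int:
    """Return the number of ranks encoded in a raw SPD dump.

    Assumed layout: DDR2 keeps (ranks - 1) in bits 2:0 of byte 5,
    DDR3 keeps (ranks - 1) in bits 5:3 of byte 7.
    """
    if len(spd) < 8:
        raise ValueError("SPD dump too short")

    mem_type = spd[2]
    if mem_type == SPD_TYPE_DDR2:
        return (spd[5] & 0x07) + 1
    if mem_type == SPD_TYPE_DDR3:
        return ((spd[7] >> 3) & 0x07) + 1
    raise ValueError(f"unsupported SPD memory type 0x{mem_type:02x}")


def make_spd(mem_type: int, offset: int, value: int) -> bytes:
    """Build a mostly-zero 128-byte SPD image for experimenting."""
    image = bytearray(128)
    image[2] = mem_type
    image[offset] = value
    return bytes(image)
```

For example, make_spd(SPD_TYPE_DDR3, 7, 0x09) describes a dual-rank
module under the layout assumed above.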

-- 
Regards/Gruss,
Boris.

Operating Systems Research Center
Advanced Micro Devices, Inc.