Date:	Thu, 30 Apr 2009 07:39:33 -0700 (PDT)
From:	Doug Thompson <norsk5@...oo.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	akpm@...ux-foundation.org, greg@...ah.com, mingo@...e.hu,
	tglx@...utronix.de, hpa@...or.com, dougthompson@...ssion.com,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64


--- On Thu, 4/30/09, Andi Kleen <andi@...stfloor.org> wrote:

> From: Andi Kleen <andi@...stfloor.org>
> Subject: Re: [RFC PATCH 00/21 v2] amd64_edac: EDAC module for AMD64
> To: "Doug Thompson" <norsk5@...oo.com>
> Cc: "Andi Kleen" <andi@...stfloor.org>
> Date: Thursday, April 30, 2009, 1:05 AM
> > The problem we have had is once
> > an Uncorrected Error fires and dumps the address, mapping it
> > to the DIMM silk screen label is difficult, especially in
> > user space, in gaining access to the registers of the
> > controller.  
> 
> You can just do it either after reboot or in the crash
> kernel. I don't
> think it's required to put it all in kernel. Also you don't
> really
> need access to the registers; 

Actually, according to AMD, their reference code for mapping from an error address to a memory slot does require access to the controller's registers. Starting on page 67 of the BKDG for family F10 on their website, there are two and a half pages of code to perform that mapping. It takes into consideration interleaving of all kinds, etc. It is gnarly, to say the least.

> SMBIOS provides this
> information and
> mcelog knows how to convert it. 

As I understand SMBIOS, it provides a linear assignment of basic memory starts and lengths, but it does not provide the memory controller context that AMD's reference code takes into consideration.

> 
> Trying to add other consumers to mce.c will be likely very
> messy;
> there's really no generic way to do it. I hope you're not
> planning
> turning the nicely CPU independent code in mce.c into a
> mess
> of twisty CPU specific passages like the old 32bit code
> was.
> 
> -Andi

No, not at all. Keeping the "clean" code is paramount, but we are looking for an interface that accepts the MCE error register structure and maps that information to at least a DIMM label field, if not more.

The EDAC module would register for that interface upon loading and unregister upon module unload.

The MCE code would call a stub routine that either returns "no mapping occurred" or calls the EDAC mapper. MCE could then determine from the return code whether a mapping occurred. If it did, MCE would display the desired information; otherwise it would proceed as normal.
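
To make the proposed hook concrete, here is a rough user-space sketch in C. Every name in it (struct mce_regs, mce_register_dimm_mapper, mce_unregister_dimm_mapper, mce_map_to_dimm, toy_edac_mapper) is hypothetical and chosen only for illustration; none of them are existing kernel symbols, and the toy mapper's address rule is made up rather than taken from the BKDG.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the MCE error register structure. */
struct mce_regs {
    unsigned long long status;
    unsigned long long addr;
};

#define DIMM_LABEL_LEN 32

/* Mapper callback: returns 0 and fills in label on success,
 * nonzero when it cannot map the error to a DIMM. */
typedef int (*dimm_mapper_fn)(const struct mce_regs *m,
                              char *label, size_t len);

/* NULL while no EDAC module has registered a mapper. */
static dimm_mapper_fn registered_mapper;

/* Called by the EDAC module when it loads. */
int mce_register_dimm_mapper(dimm_mapper_fn fn)
{
    if (fn == NULL || registered_mapper != NULL)
        return -1;              /* invalid, or a mapper is already present */
    registered_mapper = fn;
    return 0;
}

/* Called by the EDAC module when it unloads. */
void mce_unregister_dimm_mapper(dimm_mapper_fn fn)
{
    if (registered_mapper == fn)
        registered_mapper = NULL;
}

/* Stub called from the MCE path: -1 means "no mapping occurred". */
int mce_map_to_dimm(const struct mce_regs *m, char *label, size_t len)
{
    if (registered_mapper == NULL)
        return -1;
    return registered_mapper(m, label, len);
}

/* Toy mapper standing in for the EDAC module. A real one would consult
 * the memory controller registers; this made-up rule only looks at
 * bit 30 of the error address. */
int toy_edac_mapper(const struct mce_regs *m, char *label, size_t len)
{
    snprintf(label, len, "DIMM_%c", ((m->addr >> 30) & 1) ? 'B' : 'A');
    return 0;
}
```

With no mapper registered, mce_map_to_dimm() returns -1 and the MCE code carries on as it does today. After mce_register_dimm_mapper(toy_edac_mapper), the same call fills in a label the MCE code could print. An in-kernel version would also need locking (or RCU) around the function pointer so that a module unload cannot race with a machine check in flight; that is left out of this sketch.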

doug t

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
