Date:	Mon, 29 Mar 2010 19:46:51 +0900
From:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	linux-kernel@...r.kernel.org, mingo@...e.hu, hpa@...or.com,
	tglx@...utronix.de
Subject: Re: [PATCH] x86: mce: Xeon75xx specific interface to get corrected
 memory error information v2

(2010/03/29 18:01), Andi Kleen wrote:
>>>> Xeon 75xx doesn't log physical addresses on corrected machine check
>>>> events in the standard architectural MSRs. Instead the address has to
>>>> be retrieved in a model specific way. This makes it impossible
>>>> to do predictive failure analysis.
>>
>> Could you point proper specification or datasheet to know/check what
>> you are going to do here?
> 
> You mean how the model specific interface works?
> 
> There's currently no public specification for the interface,
> but it should be reasonably clear from reading the driver how
> it works.
> 
> -Andi

It looks overengineered...

I have some questions: Is it impossible to get the address
after the polling handler has finished processing?  E.g., could
this module be implemented as an mcelog add-on that is hooked in and
invoked immediately after reading /dev/mcelog?  I guess there are
some limitations/restrictions on calling pfa_command().

Is there any alternative way to get the address?
Wouldn't polling, as edac_i7 does, help here?

You wrote "This makes it impossible to do predictive failure
analysis", but I guess we could do a rough-but-sufficient analysis
that needs only coarse resolution, such as per socket.  Or should we
not expect that one of the DIMMs connected to a socket is broken if
that socket reports corrected memory errors many times?
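
To illustrate what I mean by coarse, socket-level analysis, here is a
rough userspace sketch.  It is only an assumption of how such a check
could look, not something taken from the patch: the class name, the
threshold and the time window are all hypothetical, and socket_id is
assumed to be whatever socket identifier the logged event carries.

```python
from collections import defaultdict, deque


class SocketErrorTracker:
    """Count corrected memory errors per socket in a sliding time window.

    Hypothetical helper: threshold and window_sec are arbitrary tuning
    values, not numbers from any specification.
    """

    def __init__(self, threshold, window_sec):
        self.threshold = threshold
        self.window_sec = window_sec
        # socket id -> timestamps of recent corrected errors
        self._events = defaultdict(deque)

    def record(self, socket_id, timestamp):
        """Record one corrected error; return True if the socket is suspect."""
        events = self._events[socket_id]
        events.append(timestamp)
        # Drop events that are older than the window.
        while events and timestamp - events[0] > self.window_sec:
            events.popleft()
        return len(events) >= self.threshold


tracker = SocketErrorTracker(threshold=3, window_sec=60)
for ts in (0, 10, 20):
    suspect = tracker.record(socket_id=0, timestamp=ts)
print("socket 0 suspect:", suspect)
```

This only tells us which socket to look at, not which DIMM, but that
may already be enough to decide that the memory behind that socket
needs service.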


Thanks,
H.Seto


