Date:	Wed, 25 Apr 2012 15:44:33 -0300
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	"Luck, Tony" <tony.luck@...el.com>
CC:	Borislav Petkov <bp@...64.org>,
	Linux Edac Mailing List <linux-edac@...r.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Doug Thompson <norsk5@...oo.com>
Subject: Re: [EDAC PATCH v13 6/7] edac.h: Prepare to handle with generic layers

On 25-04-2012 15:32, Luck, Tony wrote:
>> See the driver: the only useful information provided by the MCA log is
>> that an error happened, its physical address, and the type of the
>> error. Unlike the Nehalem MCA, the MCE_MISC registers won't point to the
>> DIMM in error.
> 
> There's a bit more information in the MCA log than just the physical address:
> 
> The cpu number that finds the data in its bank will provide socket information.
> [/proc/cpuinfo maps logical cpu numbers to "physical id"]

Yes, but this can be a different CPU from the one that actually has the memory
controller. The MCA registers have a bit that marks whether the error is on the
same CPU or on another one. So, when there are just 2 CPUs (sockets), this could
be used, but with more than 2 CPUs the bit cannot say which other socket is
meant, so the field is useless.

So, I opted not to rely on it.
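
For reference, the logical CPU to "physical id" mapping Tony mentions can be
read out of /proc/cpuinfo roughly as below. This is only a minimal sketch: the
function name cpu_to_socket is made up for illustration, and it assumes the
x86 layout where each block has a "processor" line followed by a "physical id"
line. It gives the socket of the CPU that logged the error, which, as said
above, is not necessarily the socket that owns the memory controller.

```python
def cpu_to_socket(cpuinfo_text):
    """Map logical CPU number -> "physical id" (socket) from /proc/cpuinfo text.

    Hypothetical helper; assumes the x86 "processor" / "physical id" fields.
    """
    mapping = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between CPU blocks
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "physical id" and current is not None:
            mapping[current] = int(value)
    return mapping


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(cpu_to_socket(f.read()))
```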

> Low order bits of the MCi_STATUS register will give the channel. See the SDM.

In all the tests I did, the channel reported via MCi_STATUS didn't match the
channel reported by the decoding logic. This may be due to a bug in the
pre-release CPUs I have used so far.
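
To make the comparison concrete, the channel can be pulled out of the low
order bits of MCi_STATUS along these lines. This is a sketch under an
assumption: it takes the SDM's compound error code layout for memory
controller errors to be "000F 0000 1MMM CCCC" in bits 15:0, with CCCC the
channel and 1111 meaning "channel not specified". The layout should be
checked against the SDM for the CPU in question; the function name
mc_channel is made up.

```python
def mc_channel(mci_status):
    """Return the channel from a memory controller error code, or None.

    Assumes the compound error code layout 000F 0000 1MMM CCCC in
    bits 15:0 of MCi_STATUS (an assumption to verify against the SDM).
    """
    code = mci_status & 0xFFFF
    # Ignore bit 12 (F); require bits 15:13 and 11:8 clear and bit 7 set.
    if (code & 0xEF80) != 0x0080:
        return None  # not a memory controller error code
    channel = code & 0xF
    if channel == 0xF:
        return None  # channel not specified
    return channel
```

The value returned here is what would be compared against the channel found
by the driver's address decoding logic.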

> So the only missing information from the MCA log is which DIMM within
> the channel.  I.e. we can pin the fault to a group of either two or
> three DIMMs depending on how many DIMMs/channel the motherboard supports.
> 
> If you only have one DIMM per channel populated then socket/channel is
> sufficient to identify the DIMM.
> 
> [We also don't have any intra-DIMM information for those customers who
> would like to diagnose the device on the DIMM, or which bits within
> the cache line had the error]
> 
> -Tony
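
Tony's point about narrowing the fault down to a group of DIMMs can be
sketched as follows. The population table and the function name
candidate_dimms are hypothetical, for illustration only; real slot
population would have to come from the platform.

```python
def candidate_dimms(populated, socket, channel):
    """Return the populated DIMM slots an error on (socket, channel) may be in.

    `populated` is a hypothetical table: (socket, channel) -> list of slots.
    """
    return sorted(populated.get((socket, channel), []))


# Example table: one DIMM on socket 0 channel 0, three on channel 1.
populated = {(0, 0): [0], (0, 1): [0, 1, 2]}

for sock, chan in [(0, 0), (0, 1)]:
    slots = candidate_dimms(populated, sock, chan)
    if len(slots) == 1:
        print(f"socket {sock} channel {chan}: DIMM {slots[0]}")
    else:
        print(f"socket {sock} channel {chan}: one of DIMMs {slots}")
```

With a single DIMM on the channel the result is unambiguous; otherwise only
the group is known.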

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
