Message-ID: <4F4E22B1.6020505@redhat.com>
Date:	Wed, 29 Feb 2012 10:05:53 -0300
From:	Mauro Carvalho Chehab <mchehab@...hat.com>
To:	Borislav Petkov <bp@...64.org>
CC:	Hidetoshi Seto <seto.hidetoshi@...fujitsu.com>,
	Tony Luck <tony.luck@...el.com>, Ingo Molnar <mingo@...e.hu>,
	EDAC devel <linux-edac@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/3] mce: Add a msg string to the MCE tracepoint

On 29-02-2012 09:19, Borislav Petkov wrote:
> On Wed, Feb 29, 2012 at 09:04:46AM -0300, Mauro Carvalho Chehab wrote:
>> Not all information is packed in the record. The record packs only what
>> is inside the MCE registers. However, for certain errors, it is necessary to
>> parse other hardware registers to decode the error (for example, on Sandy
>> Bridge, the MCE registers don't contain the affected DIMMs).
> 
> If SB is not using MCA to report the error, it should use either a
> generic TP like the trace_hw_error() example I gave last week, or rather
> a TP which matches the hw registers of the reporting hardware scheme.

This is not what I said. On Intel, both SB and Nehalem use MCA to report errors.
Older chipsets don't use MCA.

However, there's a fundamental difference between SB and Nehalem:

- on Nehalem, the MCE status register encodes not only the error message; it
  also encodes the DIMM that generated the error. So, it is possible to
  completely decode the error in userspace, using only the MCE registers.

- on SB, the MCE status register only has the error message. In order to get
  the DIMM location, the driver needs to parse the registers that describe
  how the DIMMs are organized (this is spread across dozens of PCI devices and
  200+ registers), and how they're interleaved, in order to convert the error
  address reported by the MCA into a DIMM location.

So, just storing the values of the MCE registers is not enough to completely
decode the error.
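
To make the SB case more concrete, here is a minimal sketch of the kind of
address-to-DIMM translation involved. It is a toy model only: the struct
fields, the function name and the single-level channel interleave are all
assumptions made for illustration, and none of them correspond to real
hardware registers. The real decoding depends on far more state than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified memory layout. The field names are
 * invented for this example and do not match any real register.
 */
#define TOY_MAX_DIMMS_PER_CHANNEL 3

struct toy_mem_layout {
	unsigned int num_channels;      /* channels interleaved together */
	unsigned int interleave_bytes;  /* interleave granularity in bytes */
	unsigned int dimms_per_channel; /* populated DIMMs on each channel */
	/* size of each DIMM in bytes, assumed identical on every channel */
	uint64_t dimm_size[TOY_MAX_DIMMS_PER_CHANNEL];
};

struct toy_dimm_location {
	unsigned int channel;
	unsigned int dimm;
};

/*
 * Map a physical error address to a (channel, DIMM) pair.
 * Returns 0 on success, -1 if the layout is invalid or the address
 * lies beyond the populated memory.
 */
static int toy_addr_to_dimm(const struct toy_mem_layout *cfg, uint64_t addr,
			    struct toy_dimm_location *loc)
{
	uint64_t chunk, chan_addr;
	unsigned int channel, d;

	assert(cfg != NULL && loc != NULL);

	if (cfg->num_channels == 0 || cfg->interleave_bytes == 0 ||
	    cfg->dimms_per_channel > TOY_MAX_DIMMS_PER_CHANNEL)
		return -1;

	/* Undo the interleave: consecutive chunks rotate across channels. */
	chunk = addr / cfg->interleave_bytes;
	channel = (unsigned int)(chunk % cfg->num_channels);
	chan_addr = (chunk / cfg->num_channels) * cfg->interleave_bytes +
		    addr % cfg->interleave_bytes;

	/* Walk the DIMMs of that channel until the local address fits. */
	for (d = 0; d < cfg->dimms_per_channel; d++) {
		if (chan_addr < cfg->dimm_size[d]) {
			loc->channel = channel;
			loc->dimm = d;
			return 0;
		}
		chan_addr -= cfg->dimm_size[d];
	}

	return -1;
}
```

Even in this toy form, the result depends on the layout structure, which is
not part of the MCE record. The same error address maps to a different DIMM
as soon as the interleave settings or the DIMM sizes change.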

Regards,
Mauro

