linux-kernel - RE: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86 platform

Message-ID: <3908561D78D1C84285E8C5FCA982C28F31D41E37@ORSMSX106.amr.corp.intel.com>
Date:	Fri, 18 Oct 2013 20:57:22 +0000
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Borislav Petkov <bp@...en8.de>,
	"Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>
CC:	"Chen, Gong" <gong.chen@...ux.intel.com>,
	"joe@...ches.com" <joe@...ches.com>,
	"m.chehab@...sung.com" <m.chehab@...sung.com>,
	"arozansk@...hat.com" <arozansk@...hat.com>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3 4/9] ACPI, x86: Extended error log driver for x86
 platform

> Hmm, that's a good question you raise: but the more important question
> is, do you guys - Gong and Tony - want to replace the logging we're
> already doing, i.e. mce_log() with extlog or not.

Long term ... I'd be happy to see mce_log() go away.  But we need to have
a robust, well tested replacement in place for some time before such a
move is up for discussion.

> Because if you want to replace the current logging you actually have to
> exit machine_check_poll() after having done mce_ext_err_print() so that
> the rest of the chain doesn't see the error.

Yes - double error reporting should be avoided.

> And, does mce_ext_err_print only report DRAM ECC errors or other error
> types too?

Our first platforms to implement this only do so for memory errors.  This
could change in the future (the UEFI appendix N error record has defined
sub-sections for lots of types of errors).

Currently EDAC, hooked into the mce event notification chain, provides a
return code to indicate whether it completely processed the error, or
whether to fall through to the rest of mce_log():

	if (ret == NOTIFY_STOP)
		return;

Having both EDAC and this new extended error log registered on this
chain would probably not be helpful in most cases.  Not sure if we should
handle that with user education (don't load both an EDAC and an ext_log
driver) or if there should be some enforcement.
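
To make the chain behaviour above concrete, here is a rough userspace sketch
(not kernel code) of what happens when two consumers sit on the same chain.
The NOTIFY_* values mirror include/linux/notifier.h; everything prefixed
sim_ is a hypothetical stand-in made up for illustration, and the assumption
that the extlog handler only claims memory errors follows the "first
platforms" description above rather than any actual driver code.

```c
#include <stddef.h>
#include <stdio.h>

/* Values mirror include/linux/notifier.h. */
#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical, heavily reduced error record. */
struct sim_err {
	int is_memory_error;
};

typedef int (*sim_handler_t)(const struct sim_err *err);

int sim_edac_calls;	/* counts how often the EDAC stand-in ran */

/* Stand-in for the extended error log: claims memory errors only. */
int sim_extlog_handler(const struct sim_err *err)
{
	if (!err->is_memory_error)
		return NOTIFY_DONE;	/* not ours, let the chain continue */
	printf("extlog: memory error reported\n");
	return NOTIFY_STOP;		/* fully handled, stop the chain */
}

/* Stand-in for an EDAC driver: claims whatever reaches it. */
int sim_edac_handler(const struct sim_err *err)
{
	(void)err;
	sim_edac_calls++;
	printf("edac: error reported\n");
	return NOTIFY_STOP;
}

/* Walk the handlers in order; stop once one sets NOTIFY_STOP_MASK. */
int sim_call_chain(sim_handler_t *chain, size_t n, const struct sim_err *err)
{
	int ret = NOTIFY_DONE;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = chain[i](err);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Returns 1 if the legacy logging path ran, 0 if a handler claimed it. */
int sim_mce_log(sim_handler_t *chain, size_t n, const struct sim_err *err)
{
	int ret = sim_call_chain(chain, n, err);

	if (ret == NOTIFY_STOP)
		return 0;

	printf("mce_log: falling through to legacy logging\n");
	return 1;
}
```

With both stand-ins registered, whichever comes first on the chain hides a
memory error from the other, so which report the user sees depends on
registration order.  That is the sort of surprise that argues for either
documentation or enforcement.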

> Btw, if we keep both, then we're going to have two tracepoints -
> trace_mce_record() in mce_log() and this one - issuing each a record for
> the same event. Which is not really what we want I'd say...

trace_mce_record() dumps the raw data from the machine check banks.
I think there may still be a case for having this.  Analysis tools that look at
this trace as well should be smart enough to connect the dots.

-Tony