lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1404752411.4484.25.camel@oc3432500282.ibm.com>
Date:	Mon, 07 Jul 2014 10:00:11 -0700
From:	Max Asbock <masbock@...ux.vnet.ibm.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	linux-edac <linux-edac@...r.kernel.org>,
	Tony Luck <tony.luck@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH -v3 3/4] MCE, CE: Wire in the CE collector

On Tue, 2014-07-01 at 21:23 +0200, Borislav Petkov wrote:
> From: Borislav Petkov <bp@...e.de>
> 
> Add the CE collector to the polling path which collects the correctable
> errors. Collect only DRAM ECC errors for now.
> 
> Signed-off-by: Borislav Petkov <bp@...e.de>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 84 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 76 insertions(+), 8 deletions(-)
> 

> +
> +static void __log_ce(struct mce *m, enum mcp_flags flags)
> +{
> +	/*
> +	 * Don't get the IP here because it's unlikely to have anything to do
> +	 * with the actual error location.
> +	 */

The above comment doesn't belong here. This function is about how to
dispose of corrected errors, not what data should be collected.
(Besides this comment is obsolete as the IP is always collected in
mce_gather_info()).

> +	if ((flags & MCP_DONTLOG) || mca_cfg.dont_log_ce)
> +		return;
> +

> +	if (dram_ce_error(m)) {
> +		/*
> +		 * In the cases where we don't have a valid address after all,
> +		 * do not collect but log.
> +		 */
> +		if (!(m->status & MCI_STATUS_ADDRV))
> +			goto log;
> +
> +		mce_ring_add(&__get_cpu_var(ce_ring), m->addr >> PAGE_SHIFT);
> +		return;
> +	}
> +
> +log:
> +		mce_log(m);
> +}
The above code is a bit convoluted, it amounts to:

if (we have a corrected dram error && we have an address for it)
	mce_ring_add()
else
	mcelog()

Is that the intention? This might be problematic for downstream
consumers of the errors such as the EDAC drivers which keep counts of
errors. If errors are silently removed from the stream these counts will
be bogus. Somebody might wonder why a page was off-lined while the EDAC
driver reports zero corrected DRAM error counts.

- Max


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ