Message-ID: <20160607181109.GA23770@intel.com>
Date:	Tue, 7 Jun 2016 11:11:09 -0700
From:	"Luck, Tony" <tony.luck@...el.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	linux-edac <linux-edac@...r.kernel.org>,
	Yazen Ghannam <Yazen.Ghannam@....com>, X86 ML <x86@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 1/4] RAS: Add a Corrected Errors Collector

On Tue, Jun 07, 2016 at 06:52:22PM +0200, Borislav Petkov wrote:
> +void mce_log(struct mce *m)
>  {
>  	unsigned next, entry;
>  
> +	if (!in_atomic() && memory_error(m) && mce_usable_address(m))
> +		if (!ce_add_elem(m->addr >> PAGE_SHIFT))
> +			return;
> +
>  	/* Emit the trace record: */
> -	trace_mce_record(mce);
> +	trace_mce_record(m);
>  
> -	if (!mce_gen_pool_add(mce))
> +	if (!mce_gen_pool_add(m))
>  		irq_work_queue(&mce_irq_work);

Is there a reason that we need to call the ce_add_elem() inline
here instead of having it just register on the mce_notifier chain?
This series just cleaned out all the /dev/mcelog special code from
here, and you are adding something back before the ink is dry on
that change.
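
To illustrate the alternative, here is a minimal userspace sketch of a
priority-ordered notifier chain in which a collector registers as an
ordinary chain member instead of being called inline. It is only loosely
modeled on kernel notifier blocks: every name in it (struct nb, NB_STOP,
chain_register(), ce_collector() and so on) is a hypothetical stand-in,
not the kernel's API, and the collector here always absorbs a usable
memory error, which simplifies what ce_add_elem() does in the patch.

```c
#include <stddef.h>

/* Userspace model only; all names are hypothetical, not kernel API. */
enum { NB_OK = 0, NB_STOP = 1 };

struct fake_mce {
	unsigned long long addr;
	int is_memory_error;	/* stand-in for memory_error(m) */
	int addr_usable;	/* stand-in for mce_usable_address(m) */
};

struct nb {
	int (*call)(struct nb *self, struct fake_mce *m);
	int priority;		/* higher priority runs first */
	struct nb *next;
};

static struct nb *chain_head;

/* Insert a block so the list stays sorted by descending priority. */
static void chain_register(struct nb *n)
{
	struct nb **pp = &chain_head;

	while (*pp && (*pp)->priority >= n->priority)
		pp = &(*pp)->next;
	n->next = *pp;
	*pp = n;
}

/* Walk the chain; stop early if a handler returns NB_STOP. */
static int chain_call(struct fake_mce *m)
{
	struct nb *n;
	int ret = NB_OK;

	for (n = chain_head; n; n = n->next) {
		ret = n->call(n, m);
		if (ret == NB_STOP)
			break;
	}
	return ret;
}

static int ce_seen, logger_seen;

/* Collector stand-in: absorbs memory errors with a usable address. */
static int ce_collector(struct nb *self, struct fake_mce *m)
{
	(void)self;
	if (m->is_memory_error && m->addr_usable) {
		ce_seen++;
		return NB_STOP;	/* nobody further down sees this event */
	}
	return NB_OK;
}

/* Any other consumer further down the chain. */
static int logger(struct nb *self, struct fake_mce *m)
{
	(void)self;
	(void)m;
	logger_seen++;
	return NB_OK;
}

static struct nb ce_nb = { .call = ce_collector, .priority = 10 };
static struct nb log_nb = { .call = logger, .priority = 0 };
```

In this model, whether the collector hides an error from everyone else
is decided by its priority and its return value, so it becomes a policy
choice of one chain member rather than something wired into mce_log().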

I'm also strongly divided about whether this corrected error
handler should be allowed to preempt anything else even seeing
the error.

Argument for:
Lonely corrected errors are "No Big Deal"(TM). Just counting them
and moving on is a good thing.

Arguments against:
1) We may miss out on a one-time opportunity to get extra information
(from acpi_extlog.c).
2) I think this subverts our CMCI storm detection and mitigation code?


We could make the chain more caller friendly by adding a filter
argument, so users could say "just tell me about memory errors".
(Currently each of the EDAC drivers has inline code to do the same
as "memory_error(m) && mce_usable_address(m)".)
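
A filter argument of that kind could look roughly like the userspace
sketch below, where each listener states at registration time which
events it wants and the dispatcher does the check once on its behalf.
The filter bits, struct listener, listener_register() and dispatch()
are all invented for illustration; nothing like them is in the posted
series.

```c
#include <stddef.h>

/* Hypothetical filter bits; not part of any existing interface. */
#define EV_FILTER_ALL		0u
#define EV_FILTER_MEMORY	(1u << 0)

struct ev {
	unsigned long long addr;
	int is_memory_error;	/* stand-in for memory_error(m) */
	int addr_usable;	/* stand-in for mce_usable_address(m) */
};

struct listener {
	void (*call)(const struct ev *e);
	unsigned int filter;	/* which events this listener wants */
	struct listener *next;
};

static struct listener *listeners;

/* Push a listener onto the front of the list. */
static void listener_register(struct listener *l)
{
	l->next = listeners;
	listeners = l;
}

/* The check each EDAC driver open-codes today, done once centrally. */
static int ev_matches(const struct ev *e, unsigned int filter)
{
	if (filter & EV_FILTER_MEMORY)
		return e->is_memory_error && e->addr_usable;
	return 1;	/* EV_FILTER_ALL: deliver everything */
}

/* Deliver to every matching listener; return how many were called. */
static int dispatch(const struct ev *e)
{
	struct listener *l;
	int called = 0;

	for (l = listeners; l; l = l->next) {
		if (!ev_matches(e, l->filter))
			continue;
		l->call(e);
		called++;
	}
	return called;
}

static int edac_calls, all_calls;

/* Listener that only cares about memory errors. */
static void edac_like(const struct ev *e)
{
	(void)e;
	edac_calls++;
}

/* Listener that wants every event. */
static void log_all(const struct ev *e)
{
	(void)e;
	all_calls++;
}
```

A listener registered with EV_FILTER_MEMORY is never called for
non-memory events, so its callback does not need its own copy of the
check.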

-Tony
