lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150504154652.GF3829@pd.tnic>
Date:	Mon, 4 May 2015 17:46:52 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Aravind Gopalakrishnan <aravind.gopalakrishnan@....com>
Cc:	tglx@...utronix.de, mingo@...hat.com, hpa@...or.com,
	tony.luck@...el.com, jiang.liu@...ux.intel.com, yinghai@...nel.org,
	x86@...nel.org, dvlasenk@...hat.com, JBeulich@...e.com,
	slaoub@...il.com, luto@...capital.net, dave.hansen@...ux.intel.com,
	oleg@...hat.com, rostedt@...dmis.org, rusty@...tcorp.com.au,
	prarit@...hat.com, linux@...musvillemoes.dk, jroedel@...e.de,
	andriy.shevchenko@...ux.intel.com, macro@...ux-mips.org,
	wangnan0@...wei.com, linux-kernel@...r.kernel.org,
	linux-edac@...r.kernel.org, Robert Richter <rric@...nel.org>
Subject: Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt
 handler

On Mon, May 04, 2015 at 10:29:50AM -0500, Aravind Gopalakrishnan wrote:
> >What's the family check for? for BIOSes which don't set the LVT offset
> >to 2, as they should?
> >
> >If so, we probably should say
> >
> >	pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");
> >
> >or similar...
> 
> Yeah. I meant to provide a comment at least for this.
> Forgot to do that.
> 
> I'll print out a error message as you suggested (considering we do this in
> other places like threshold setup or IBS setup..)

lvt_off_valid() does that already. Adding Robert.

> Right. I think a __log_error() is a good idea.
> Except, in amd_threshold_interrupt(), we have-
> m.misc = ((u64)high << 32) | low;
> 
> which, is actually useless as we don't use m.misc anywhere in
> amd_decode_mce() or anywhere else in the decoding pipeline AFAICT.
> We only print out if 'misc' is valid and we only need status bits for that-
> ((m->status & MCI_STATUS_MISCV) ? "MiscV" : "-"),
> 
> But, more importantly, we don't setup 'm.addr' here (in
> amd_threshold_interrupt() or in amd_deferred_error_interrupt())
> Which means anytime we pass an error to be decoded from the interrupt
> handlers, we don't get any info about the error address.

So what are we reporting with a deferred error if it is not a
full-fledged MCE? We better fix that otherwise we probably shouldn't
even report those. I mean, userspace is supposed to do some error
handling based on error info but if that info's missing, we might just
as well panic right then and there, right?

> So, we can do one of these-
> 1. Remove m.misc setup in amd_threshold_interrupt() and
> rdmsrl(MSR_IA32_MCx_ADDR(bank), m.addr) before we call mce_log()
> 2. Since we have mce_read_aux() that reads misc and addr registers, we can
> move the mce_[rd|wr]msrl wrappers and mce_read_aux() into mce.h and use it
> here in mce_amd.c
> 
> Thoughts?

Makes sense but you need to first check though, which registers are
valid in the hw when a threshold/deferred error happens and collect
them. Only then we can do proper recovery.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ