Message-ID: <20150409085944.GA27042@hori1.linux.bs1.fc.nec.co.jp>
Date:	Thu, 9 Apr 2015 08:59:44 +0000
From:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>
To:	Borislav Petkov <bp@...en8.de>
CC:	Ingo Molnar <mingo@...nel.org>, Tony Luck <tony.luck@...el.com>,
	"Prarit Bhargava" <prarit@...hat.com>,
	Vivek Goyal <vgoyal@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Junichi Nomura <j-nomura@...jp.nec.com>,
	Kiyoshi Ueda <k-ueda@...jp.nec.com>
Subject: Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

On Thu, Apr 09, 2015 at 10:21:25AM +0200, Borislav Petkov wrote:
> On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> > So the thing is, when we boot up the second kernel there will be a 
> > window where the old handler isn't valid (because the new kernel has 
> > its own pagetables, etc.) and the new handler is not installed yet.
> > 
> > If an MCE hits that window, it's bad luck. (unless the bootup sequence 
> > is rearchitected significantly to allow cross-kernel inheritance of 
> > MCE handlers.)
> > 
> > So I think we can ignore _that_ race.
> 
> Yah, that's the "tough luck" race.
> 
> > The more significant question is: what happens when an MCE arrives 
> > while the kdump is proceeding - as kdumps can take a long time to 
> > finish when there's a lot of RAM.
> 
> We say that the dump might be unreliable.
> 
> > But ... since the 'shootdown' is analogous to a CPU hotplug CPU-down 
> > sequence, I suppose that the existing MCE code should already properly 
> > handle the case where an MCE arrives on a (supposedly) dead CPU, 
> > right? In that case installing a separate MCE handler looks like the 
> > wrong thing.
> 
> Hmm, so mce_start() only looks at the online CPUs. So if crash does
> maintain those masks correctly...
> 
> > So I don't like this principle either: 'our current code is a mess 
> > that might not work, add new one'.
> 
> Well, we can try to simplify it in the sense that those assumptions like
> mcelog and other MCE consuming crap and notifier chain are tested for
> their presence before using them...
> 
> I'd be open for this if we have a way to test this kdump scenario. For
> now, not even qemu can do that.

I replied about testing separately.
It might be a little tricky, but I hope it helps.

> > Looks like that's the real problem. How about the kdump crash dumper 
> > sets it back to 'ignore' again when we crash, and also double check 
> > how we handle various corner cases?
> 
> I think I even suggested that at some point. Or was it to increase the
> tolerance level. So Naoya, what's wrong with this again? I forgot.

Even if we raise the tolerant level while kdump is running, that doesn't
prevent idle CPUs from entering the MCE handler when an MCE arrives. They
then perform memory accesses (losing information from kdump's viewpoint)
and spit out "MCE synchronization timeout" messages (unclear and confusing
for users). It also leaves the potential risk of breaking again whenever
do_machine_check() changes in the future (which may well happen, since the
code is shared across different situations).

So raising the tolerance level is OK as a "minimum change" approach, but
those downsides have to be traded off against it.
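For concreteness, the "minimum change" approach under discussion would
amount to something like the following in the kdump environment (a sketch
only; it assumes the per-CPU machinecheck sysfs layout of this era, and the
MC_ROOT override exists purely so the fragment can be exercised outside a
real /sys):

```shell
# Sketch: raise the MCE tolerance level from the kdump kernel before
# dumping. tolerant=3 means "never panic", per the mce sysfs interface.
# MC_ROOT is overridable so the loop can be tried on a fake tree.
MC_ROOT="${MC_ROOT:-/sys/devices/system/machinecheck}"
for d in "$MC_ROOT"/machinecheck*; do
	# Only touch CPUs that actually expose a writable tolerant file.
	[ -w "$d/tolerant" ] && echo 3 > "$d/tolerant"
done
```

This captures the simplicity Boris is pointing at; the paragraph above is
why it is not a complete answer.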

Thanks,
Naoya Horiguchi

> Because this would be the simplest. Simply set tolerance level to 3 and
> dump away...
