Message-ID: <20161107185524.GA2536@intel.com>
Date: Mon, 7 Nov 2016 10:55:24 -0800
From: "Luck, Tony" <tony.luck@...el.com>
To: Borislav Petkov <bp@...en8.de>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org, rt@...utronix.de,
linux-edac@...r.kernel.org, x86@...nel.org,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 22/25] x86/mcheck: Do the init in one place
On Mon, Nov 07, 2016 at 07:45:32PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2016 at 03:50:18PM +0100, Sebastian Andrzej Siewior wrote:
> > Part of the init (memory allocation and so on) is done
> > in mcheck_cpu_init(). While moving the allocation to
> > mcheck_init_device() (where the hotplug calls are initialized) it
> > becomes necessary to move the callback (mcheck_cpu_init()), too.
> >
> > The callback is now removed from identify_cpu() and registered as a
> > hotplug event that is invoked as the very first one, shortly after
> > the original point of invocation (look at smp_store_cpu_info() and
> > notify_cpu_starting() in smp_callin()).
> > One "visible" difference is that MCE for the boot CPU is not enabled at
> > identify_boot_cpu() time but at device_initcall_sync() time. Either way,
> > both times we had no userland around.
>
> Uh, hm, I'm not sure about this: so the issue I see with this is that
> the more we're delaying the enabling of MCE reporting - and especially
> setting CR4[MCE] - the more we're increasing the window where an MCE
> during early boot will cause a shutdown. (This is what happens if
> CR4[MCE]=0b).
>
> Perhaps we should split the init into a very early init which doesn't
> need to be part of hotplug and the rest, which can do mce_disable_cpu()
> and mce_reenable_cpu().
>
> Tony, how do you see this?
I don't think that helps as much as you'd like it to help (at
least on Intel). A broadcast machine check that finds the boot
CPU has set CR4[MCE]=1 is still going to end up in reset if any
other CPU still has CR4[MCE]=0.
-Tony
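
The point about broadcast machine checks can be illustrated with a toy
model. This is only a sketch, not kernel code: the function name
broadcast_mce_survivable() and the per-CPU array of CR4 values are
hypothetical, made up for illustration. The one hardware detail used is
that CR4.MCE is bit 6 of CR4. The model assumes the machine check is
delivered to every CPU and that a single CPU with CR4[MCE]=0 is enough
to take the whole system down.

```c
#include <stdbool.h>
#include <stddef.h>

/* CR4 bit 6 is the machine-check enable bit (CR4.MCE). */
#define CR4_MCE (1UL << 6)

/*
 * Hypothetical model, not a kernel API: cr4[] holds one CR4 value per
 * CPU at the moment a broadcast machine check arrives.
 *
 * Returns true only if every CPU has CR4.MCE set. If any CPU still has
 * CR4.MCE clear, that CPU cannot take the exception, and the model
 * treats the outcome as a reset of the whole machine.
 */
static bool broadcast_mce_survivable(const unsigned long *cr4, size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++) {
		if (!(cr4[i] & CR4_MCE))
			return false;	/* one unprepared CPU is enough */
	}
	return true;
}
```

In this model, setting CR4[MCE] early on the boot CPU alone changes
nothing while any other CPU has not reached its own MCE init:
the function returns false until the last CPU has set the bit.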