linux-kernel - RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <04EAB7311EE43145B2D3536183D1A8445499D30D@GSjpTKYDCembx31.service.hitachi.net>
Date:	Thu, 1 Oct 2015 07:01:50 +0000
From:	河合英宏 / KAWAI，HIDEHIRO 
	<hidehiro.kawai.ez@...achi.com>
To:	"'Peter Zijlstra'" <peterz@...radead.org>
CC:	Jonathan Corbet <corbet@....net>, Ingo Molnar <mingo@...nel.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Vivek Goyal <vgoyal@...hat.com>,
	"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
	"x86@...nel.org" <x86@...nel.org>,
	"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Michal Hocko <mhocko@...nel.org>,
	Ingo Molnar <mingo@...hat.com>,
	平松雅巳 / HIRAMATU，MASAMI 
	<masami.hiramatsu.pt@...achi.com>
Subject: RE: [V4 PATCH 4/4] x86/apic: Introduce noextnmi boot option

> On Thu, Oct 01, 2015 at 02:33:18AM +0000, 河合英宏 / KAWAI，HIDEHIRO wrote:
> > > On Fri, Sep 25, 2015 at 08:28:11PM +0900, Hidehiro Kawai wrote:
> > > > This patch introduces new boot option "noextnmi" which disables
> > > > external NMI.  This option is useful for the dump capture kernel
> > > > so that an HA application or administrator wouldn't mistakenly
> > > > shoot down the kernel by NMI.
> > >
> > > So that they can get really stuck when the crash kernel crashes, right?
> > > ;-)
> >
> > No, it is different from my intention.
> >
> > `mistakenly' in the above means; they issue NMI due to a misconception
> > that the monitored host is stuck in the 1st kernel while it is actually
> > taking a crash dump in the 2nd kernel.  To avoid this kind of accident,
> > there is a tool such as fence_kdump which notifies "I'm taking a crash
> > dump, so don't send NMI" to the HA clustering software.  However, there
> > is a time window between kernel panic and the notification.
> >
> > "noextnmi" allows users to avoid this kind of accident all the time of
> > 2nd kernel.
> 
> Yes yes, I understand. But if the crash kernel also gets stuck they have
> no means of recovery, right? (other than power cycling the hardware)

Yes, but I think it's not a big problem.

I suppose that a sever which uses this feature will equip a BMC
and BMC mandatorily supports hard reset command for the server.
If the HA clustering software detects no response from the server
after relatively long timeout, it might want to insert hard reset
to the server by IPMI over LAN.

> Just playing devils advocate here, I don't actually object to the patch.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group