linux-kernel - RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI


Message-ID: <04EAB7311EE43145B2D3536183D1A8445491F614@GSjpTKYDCembx31.service.hitachi.net>
Date:	Thu, 30 Jul 2015 07:33:15 +0000
From:	河合英宏 / KAWAI，HIDEHIRO 
	<hidehiro.kawai.ez@...achi.com>
To:	"'ltc-kernel@...yrl.intra.hitachi.co.jp'" 
	<ltc-kernel@...yrl.intra.hitachi.co.jp>,
	"'Michal Hocko'" <mhocko@...nel.org>
CC:	Jonathan Corbet <corbet@....net>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Vivek Goyal <vgoyal@...hat.com>,
	"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
	"x86@...nel.org" <x86@...nel.org>,
	"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	平松雅巳 / HIRAMATU，MASAMI 
	<masami.hiramatsu.pt@...achi.com>
Subject: RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to
 panic on NMI

Hi Michal,

> From: 河合英宏 / KAWAI，HIDEHIRO [mailto:hidehiro.kawai.ez@...achi.com]
> > When I was testing my
> > previous approach (on 3.0 based kernel) I had basically the same thing
> > (one NMI to process panic) and others to return. This led to a strange
> > behavior when the NMI button triggered NMI on all (hundreds) CPUs.
> 
> That is strange.  Usually, an NMI raised by the NMI button is routed only
> to CPU 0 as an external NMI.  External NMIs for CPUs other than CPU 0 are
> masked at boot time.  Does it really happen?  Does the problem still occur
> on the latest kernel?  What kind of NMI is delivered to each CPU?

Are you using SGI UV?  On that platform, NMIs may be delivered to
all CPUs because LVT1 is unmasked on every CPU, as follows:

void uv_nmi_init(void)
{
        unsigned int value;

        /*
         * Unmask NMI on all cpus
         */
        value = apic_read(APIC_LVT1) | APIC_DM_NMI;
        value &= ~APIC_LVT_MASKED;
        apic_write(APIC_LVT1, value);
}
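
To make the effect on the register value concrete, here is a minimal
user-space sketch of the same bit manipulation.  It does not touch a real
APIC; the helper names lvt1_unmask_nmi() and lvt1_is_masked() are made up
for illustration, and the two constants are the values I believe the x86
APIC headers use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the Linux x86 APIC header definitions. */
#define APIC_DM_NMI      0x00400u     /* delivery mode field = NMI (bit 10) */
#define APIC_LVT_MASKED  (1u << 16)   /* LVT mask bit */

/* Hypothetical helper: same arithmetic as uv_nmi_init(), applied to a
 * plain integer instead of the LVT1 register. */
static uint32_t lvt1_unmask_nmi(uint32_t lvt1)
{
        /* OR only sets bit 10; it does not clear other delivery-mode bits. */
        uint32_t value = lvt1 | APIC_DM_NMI;

        value &= ~APIC_LVT_MASKED;    /* clear the mask bit */
        return value;
}

/* Hypothetical helper: true if the LVT entry is masked. */
static bool lvt1_is_masked(uint32_t lvt1)
{
        return (lvt1 & APIC_LVT_MASKED) != 0;
}
```

For example, an entry that starts out masked with no delivery mode set
(0x00010000) becomes 0x00000400, i.e. unmasked with NMI delivery.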

> 
> Traditionally, we have assumed that the NMI for crash dumping is
> delivered to only one CPU.  Otherwise, we would often fail to take
> a proper crash dump.  Your case seems to be another problem, to be
> solved separately.
> 
> > The
> > crash kernel booted eventually but the log contained lockups when a
> > CPU waited for an IPI to the CPU which was handling the NMI panic.
> 
> Could you explain more precisely?
> 
> > Anyway, I do not think this is really necessary to solve the panic
> > reentrancy issue.
> > If the missing saved state is a real problem then it
> > should be handled separately - maybe it can be achieved without an IPI
> > and directly from the panic context if we are in NMI.
> 
> What I would like to do with this patch set is to solve the race issues
> among NMI, panic() and crash_kexec().  So, I don't think we should fix
> that separately, although I would need to reword some descriptions
> and titles.
> 
> Anyway, I'm going to send out my revised version once in order to
> tidy up.  I would also like to hear the kexec/kdump guys' opinions.
> 
> Regards,
> Kawai
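
As a rough illustration of the "one CPU processes the panic, the others
return" idea discussed above, here is a user-space sketch using C11
atomics.  It is not the code from this patch set; the names panic_owner,
NO_OWNER, try_claim_panic() and panic_owned_by() are hypothetical, and a
real kernel implementation would use the kernel's own atomic primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)   /* hypothetical sentinel: nobody is in the panic path */

/* Hypothetical global: id of the CPU that owns the panic path. */
static atomic_int panic_owner = NO_OWNER;

/* Returns true only for the first caller; every later caller (another CPU,
 * or an NMI that re-enters on the owning CPU) gets false. */
static bool try_claim_panic(int this_cpu)
{
        int expected = NO_OWNER;

        return atomic_compare_exchange_strong(&panic_owner, &expected,
                                              this_cpu);
}

/* Lets a caller that lost the race tell "another CPU is panicking" apart
 * from "I re-entered on the CPU that is already panicking". */
static bool panic_owned_by(int cpu)
{
        return atomic_load(&panic_owner) == cpu;
}
```

A caller that fails to claim ownership can then decide whether to return
(another CPU owns the panic) or to avoid re-running the panic path (it
re-entered on the owning CPU).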