Message-ID: <04EAB7311EE43145B2D3536183D1A844549220E7@GSjpTKYDCembx31.service.hitachi.net>
Date: Fri, 31 Jul 2015 11:23:00 +0000
From: 河合英宏 / KAWAI,HIDEHIRO
<hidehiro.kawai.ez@...achi.com>
To: "'Michal Hocko'" <mhocko@...nel.org>
CC: Jonathan Corbet <corbet@....net>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vivek Goyal <vgoyal@...hat.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>,
"kexec@...ts.infradead.org" <kexec@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
平松雅巳 / HIRAMATU,MASAMI
<masami.hiramatsu.pt@...achi.com>
Subject: RE: Re: [V2 PATCH 1/3] x86/panic: Fix re-entrance problem due to
panic on NMI
> From: Michal Hocko [mailto:mhocko@...nel.org]
>
> On Thu 30-07-15 11:55:52, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Michal Hocko [mailto:mhocko@...nel.org]
> [...]
> > > Could you point me to the code which does that, please? Maybe we are
> > > missing that in our 3.0 kernel. I was quite surprised to see this
> > > behavior as well.
> >
> > Please see the snippet below.
> >
> > void setup_local_APIC(void)
> > {
> > ...
> > /*
> > * only the BP should see the LINT1 NMI signal, obviously.
> > */
> > if (!cpu)
> > value = APIC_DM_NMI;
> > else
> > value = APIC_DM_NMI | APIC_LVT_MASKED;
> > if (!lapic_is_integrated()) /* 82489DX */
> > value |= APIC_LVT_LEVEL_TRIGGER;
> > apic_write(APIC_LVT1, value);
> > ...
> > }
> >
> > The LINT1 pins of CPUs other than CPU 0 are masked here.
> > However, at least on some Hitachi servers, the NMI raised by the
> > NMI button doesn't seem to be delivered through LINT1. So my
> > wording `external NMI' may not be correct.
>
> I am not familiar with the details here, but I can tell you that this
> particular code snippet is the same in our 3.0-based kernel, so it
> seems that the HW is indeed doing something differently.
Yes, and it turned out my PATCH 3/3 doesn't work at all on some
hardware...
> > > You might still get a panic on hardlockup which will happen on all CPUs
> > > from the NMI context so we have to be able to handle panic in NMI on
> > > many CPUs.
> >
> > Do you mean the case of a kernel panic while other CPUs are locked
> > up in NMI context? In that case, there is no way to do the things
> > needed by the kdump procedure, including saving registers...
>
> I am saying that watchdog_overflow_callback might trigger on more CPUs
> and panic from NMI context as well. So this is not limited to the case
> where the NMI button sends an NMI to more CPUs.
I understand. So I also have to modify watchdog_overflow_callback
to call nmi_panic().
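
Something like this, I suppose (just a sketch; I'm assuming the
nmi_panic() helper takes the same printf-style arguments as panic()):

	/* kernel/watchdog.c: watchdog_overflow_callback() runs in NMI
	 * context, so a panic on hard lockup is a panic in NMI */
	if (hardlockup_panic)
		nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
			  this_cpu);
	else
		WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
		     this_cpu);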
> Why can't the panic() context save all the registers if we are going
> to loop in NMI context? This would be preferable to returning from
> panic, IMO.
I'm not saying we cannot save registers and do some cleanup in NMI
context. I feel that it would introduce unneeded complexity.
Since watchdog_overflow_callback is generic code, we would also need
to implement the kdump preparation for the other architectures.
I haven't checked which architectures support both the NMI watchdog
and kdump, though.
Anyway, I came up with a simple solution for x86: in nmi_panic(), wait
until nmi_shootdown_cpus() runs, then invoke the callback registered
by nmi_shootdown_cpus().
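
On x86, it would look roughly like the sketch below. The flag name
crash_ipi_done is just a placeholder for whatever nmi_shootdown_cpus()
ends up setting; the point is to keep the CPU in the NMI handler until
the shootdown callback is registered, then run it on this CPU:

	/* sketch: a CPU that re-entered panic() in NMI context spins
	 * here instead of returning from the NMI handler */
	while (!crash_ipi_done)		/* set by nmi_shootdown_cpus() */
		cpu_relax();
	/* run the callback registered by nmi_shootdown_cpus(): it saves
	 * this CPU's registers for kdump, then halts */
	crash_nmi_callback(0, regs);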
> > > I can provide the full log but it is quite mangled. I guess the
> > > CPU130 was the only one allowed to proceed with the panic while others
> > > returned from the unknown NMI handling. It took a lot of time until
> > > CPU130 managed to boot the crash kernel with soft lockups and RCU stalls
> > > reports. CPU0 is most probably locked up waiting for CPU130 to
> > > acknowledge the IPI which will not happen apparently.
> >
> > There is a timeout of 1000 ms in nmi_shootdown_cpus(), so I don't
> > know why CPU 130 waits so long. I'll think about it for a while.
>
> Yes, I do not understand the timing here either and the fact that the
> log is a complete mess in the important parts doesn't help a wee bit.
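
For reference, the 1000 ms timeout I mentioned is this loop in
nmi_shootdown_cpus() (arch/x86/kernel/reboot.c):

	msecs = 1000; /* Wait at most a second for the other cpus to stop */
	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
		mdelay(1);
		msecs--;
	}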
Also, I'm interested in where "Kernel panic - not syncing:" appears
in the log. It may give us a clue.
Regards,
Kawai