Message-ID: <531A36F7.6020101@zytor.com>
Date:	Fri, 07 Mar 2014 13:15:35 -0800
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Don Zickus <dzickus@...hat.com>
CC:	LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
	vgoyal@...hat.com, ebiederm@...ssion.com
Subject: Re: [PATCH] x86: Skip latched NMIs on early boot in kdump

On 03/07/2014 11:39 AM, Don Zickus wrote:
> A customer generated an external NMI using their iLO to test that kdump worked.
> Unfortunately, the machine hung.  Disabling the nmi_watchdog made things work.
> 
> I speculated the external NMI fired, caused the machine to panic (as expected)
> and the perf NMI from the watchdog came in and was latched.  My guess was this
> somehow caused the hang.
> 

... as any other unexpected exception would.

> 
> I also do not fully understand why the latched NMI is not happening immediately
> after the load idt call or why it comes after a page fault (the
> early_make_pgtable).  Further adding to my confusion is why the early printk
> magic didn't dump a stack as I believe I had that set up on my command line.
> But I figured I would just report what I have observed.
> 

If the kdump is initiated from NMI context, I wonder whether we simply
haven't executed an IRET until this one, in which case this IRET is
what re-enables NMIs.
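
To make that suspected sequence concrete, here is a toy C model of the
architectural behaviour: delivering an NMI blocks further NMIs until the
next IRET, and one NMI arriving in the meantime is held pending.  This is
only a sketch, not kernel code; struct cpu, raise_nmi() and iret() are
hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of x86 NMI blocking and latching. Not kernel code. */
struct cpu {
	bool nmi_blocked;	/* set on NMI delivery, cleared by IRET */
	bool nmi_latched;	/* one NMI held pending while blocked */
	int  delivered;		/* NMIs that actually reached a handler */
};

static void deliver_nmi(struct cpu *c)
{
	assert(!c->nmi_blocked);	/* never delivered while blocked */
	c->nmi_blocked = true;
	c->delivered++;
}

/* An NMI source (iLO, perf watchdog, ...) asserts NMI. */
static void raise_nmi(struct cpu *c)
{
	if (c->nmi_blocked)
		c->nmi_latched = true;	/* held until the next IRET */
	else
		deliver_nmi(c);
}

/* IRET unblocks NMIs; a latched NMI is then delivered right away. */
static void iret(struct cpu *c)
{
	c->nmi_blocked = false;
	if (c->nmi_latched) {
		c->nmi_latched = false;
		deliver_nmi(c);
	}
}
```

In this model, calling raise_nmi() twice (the external NMI that starts
the kdump, then the watchdog NMI) delivers only the first; the second
stays latched until the first iret(), which in the scenario above would
be the return from the early page fault.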

> My testing and debugging were based off a 3.10 kernel (RHEL-7) but included
> Seiji's tracepoint cleanups to arch/x86/kernel/head_64.S|head64.c.  Not much
> has changed upstream here.  Also 3.14-rc4 still has the same hang.
> 
> Signed-off-by: Don Zickus <dzickus@...hat.com>

We really shouldn't be doing the fixup lookup for NMI, either.  Probably
it makes more sense to just IRET on NMI until we have the real interrupt
vectors set up, but it needs to be done a little earlier.
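
As a rough illustration of the idea only (this is not the attached
diff), the decision could be sketched in C as below.  The vector numbers
are the architectural ones; enum early_action and early_trap_action()
are hypothetical names, and the real early handler lives in assembly in
head_64.S.

```c
#define X86_TRAP_NMI	 2	/* architectural NMI vector */
#define X86_TRAP_PF	14	/* architectural page-fault vector */

/* Hypothetical: what an early-boot handler does with a trap. */
enum early_action {
	EARLY_IRET,		/* ignore the event and just IRET */
	EARLY_MAKE_PGTABLE,	/* try to build the missing mapping */
	EARLY_FIXUP_LOOKUP,	/* search the exception fixup table */
};

static enum early_action early_trap_action(int vector)
{
	/* Check for NMI first, before any fixup lookup is attempted. */
	if (vector == X86_TRAP_NMI)
		return EARLY_IRET;
	if (vector == X86_TRAP_PF)
		return EARLY_MAKE_PGTABLE;
	return EARLY_FIXUP_LOOKUP;
}
```

The point of the ordering is that a stray NMI never reaches the fixup
lookup at all until the real interrupt vectors are in place.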

How does this patch work for you?

	-hpa


View attachment "diff" of type "text/plain" (1343 bytes)
