linux-kernel - Re: [PATCH] x86, UV: Fix NMI handler for UV platforms

Message-ID: <20110322171118.GA6294@sgi.com>
Date:	Tue, 22 Mar 2011 12:11:18 -0500
From:	Jack Steiner <steiner@....com>
To:	Don Zickus <dzickus@...hat.com>
Cc:	Cyrill Gorcunov <gorcunov@...il.com>, Ingo Molnar <mingo@...e.hu>,
	tglx@...utronix.de, hpa@...or.com, x86@...nel.org,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] x86, UV: Fix NMI handler for UV platforms

On Mon, Mar 21, 2011 at 03:37:40PM -0400, Don Zickus wrote:
> On Mon, Mar 21, 2011 at 01:22:35PM -0500, Jack Steiner wrote:
> > On Mon, Mar 21, 2011 at 01:51:10PM -0400, Don Zickus wrote:
> > > On Mon, Mar 21, 2011 at 07:26:51PM +0300, Cyrill Gorcunov wrote:
> > > > On 03/21/2011 07:14 PM, Ingo Molnar wrote:
> > > > > 
> > > > > * Jack Steiner <steiner@....com> wrote:
> > > > > 
> > > > >> This fixes a problem seen on UV systems handling NMIs from the node controller.
> > > > >> The original code used the DIE notifier as the hook to get to the UV NMI
> > > > >> handler. This does not work if performance counters are active - the hw_perf
> > > > >> code consumes the NMI and the UV handler is not called.
> > > 
> > > Well that is a bug in the perf code.  We have been dealing with 'perf'
> > > swallowing NMIs for a couple of releases now.  I think we got rid of most
> > > of the cases (p4 and acme's core2 quad are the only cases I know that are
> > > still an issue).
> > > 
> > > I would much prefer to investigate the reason why this is happening,
> > > because the perf NMI handler is supposed to check the global interrupt bit
> > > to determine whether the perf counters caused the NMI and, if not, fall
> > > through to other handlers such as SGI's NMI button in this case.
> > 
> > The patch that I posted is based on a RHEL6.1 patch that I'm running internally.
> > Unless something has very recently changed in the RH sources, the perf
> > NMI handler unconditionally returns NOTIFY_STOP if it handles an NMI.
> > If no NMI was handled, it returns NOTIFY_DONE. This sometimes works
> > and allows the platform generated NMI to be processed but if both NMI
> > sources trigger at about the same time, the lower priority event
> > will be lost.
> 
> Not necessarily, if both are triggered, you should still get _two_ NMIs.
> It may get processed in the wrong order but it should still get correctly
> processed.

How certain are you that multiple NMIs triggered at about the same time will
deliver discrete NMI events? I updated the patch so that I'm running with:

	- no special code in traps.c (I removed the traps.c code that was
	  in the patch I posted)
	- used die_notifier for calling the UV nmi handler
	- UV priority is higher than the hw_perf priority
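
To make the configuration above concrete, here is a minimal userspace sketch
of a priority-ordered handler chain. It is a model, not kernel code: the
names nmi_sources, register_handler and deliver_nmi are hypothetical, and
NOTIFY_DONE / NOTIFY_STOP are only modeled on the kernel's notifier return
values. It assumes each handler can tell whether its own source is pending,
which may not hold for every real NMI source.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes modeled on the kernel's notifier values (simplified). */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };

/* Hypothetical model of pending NMI sources; not real kernel state. */
struct nmi_sources {
	int uv_pending;   /* node controller raised an NMI */
	int perf_pending; /* a performance counter overflowed */
};

typedef int (*nmi_handler_fn)(struct nmi_sources *src);

struct handler {
	int priority;      /* higher priority runs first */
	nmi_handler_fn fn;
};

#define MAX_HANDLERS 4
static struct handler chain[MAX_HANDLERS];
static size_t n_handlers;

/* Insert a handler, keeping the chain sorted by descending priority. */
static void register_handler(int priority, nmi_handler_fn fn)
{
	size_t i = n_handlers;

	assert(n_handlers < MAX_HANDLERS);
	while (i > 0 && chain[i - 1].priority < priority) {
		chain[i] = chain[i - 1];
		i--;
	}
	chain[i].priority = priority;
	chain[i].fn = fn;
	n_handlers++;
}

/* Claims the NMI only if the (modeled) UV source is pending. */
static int uv_nmi_handler(struct nmi_sources *src)
{
	if (!src->uv_pending)
		return NOTIFY_DONE;
	src->uv_pending = 0;
	return NOTIFY_STOP;
}

/* Claims the NMI only if the (modeled) perf source is pending. */
static int perf_nmi_handler(struct nmi_sources *src)
{
	if (!src->perf_pending)
		return NOTIFY_DONE;
	src->perf_pending = 0;
	return NOTIFY_STOP;
}

/* Deliver one NMI: walk the chain until a handler claims it.
 * Returns 1 if some handler claimed the NMI, 0 otherwise. */
static int deliver_nmi(struct nmi_sources *src)
{
	for (size_t i = 0; i < n_handlers; i++)
		if (chain[i].fn(src) == NOTIFY_STOP)
			return 1;
	return 0;
}
```

In this model, registering perf_nmi_handler at priority 1 and uv_nmi_handler
at priority 2 and then calling deliver_nmi() once with both sources pending
clears only uv_pending. The perf source is serviced only if a second NMI is
actually delivered, which is the assumption in question here.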

Both hw_perf (perf top) & UV NMIs work correctly under light loads. However, if I
run for 10 - 15 minutes injecting UV NMIs at a rate of about 30/min, "perf top"
stops generating output. Strace shows that it continues to poll() but no data
is received.

While "perf top" is hung, if I inject an NMI into the system in a way that will NOT
be consumed by the UV nmi handler, "perf top" resumes output but will stop again after
a few minutes.


AFAICT, the UV nmi handler is not consuming extra NMI interrupts. I can't
rule out that I'm missing something but I don't see it.


Do you have any ideas or clues???


> 
> > 
> > The root cause of the problem is that architecturally, x86 does not
> > have a way to identify the source(s) that cause an NMI. If multiple
> > events occur at about the same time, there is no way that I can see that the
> > OS can detect it.
> 
> There are registers we can check to see who triggered the NMI (at least
> for the perf code, the SGI code maybe not, which is why I set it to a
> lower priority to be a catch-all).
> 
> I'm not aware of the x86 architecture dropping NMIs, so they should all
> get processed.  It is just a matter of how the subsystems determine whether
> they are the source of the NMI or not.
> 
> > 
> > > 
> > > My first impression is the skip nmi logic in the perf handler is probably
> > > accidentally thinking the SGI external nmi is the perf's 'extra' nmi it is
> > > supposed to skip and thus swallows it.  At least that is the impression I
> > 
> > Agree
> > 
> > 
> > > get from the RedHat bugzilla which says SGI is running 'perf top', getting
> > > a hang, then pressing their nmi button to see the stack traces.
> > > 
> > > Jack,
> > > 
> > > I worked through a number of these issues upstream and I already talked to
> > > George and Russ over here at RedHat about working through the issue over
> > > here with them.  They can help me get access to your box to help debug.
> > 
> > Russ is right down the hall.
> 
> Great!
> 
> Cheers,
> Don