Message-ID: <20110321175110.GL1239@redhat.com>
Date:	Mon, 21 Mar 2011 13:51:10 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Cyrill Gorcunov <gorcunov@...il.com>
Cc:	Ingo Molnar <mingo@...e.hu>, Jack Steiner <steiner@....com>,
	tglx@...utronix.de, hpa@...or.com, x86@...nel.org,
	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] x86, UV: Fix NMI handler for UV platforms

On Mon, Mar 21, 2011 at 07:26:51PM +0300, Cyrill Gorcunov wrote:
> On 03/21/2011 07:14 PM, Ingo Molnar wrote:
> > 
> > * Jack Steiner <steiner@....com> wrote:
> > 
> >> This fixes a problem seen on UV systems handling NMIs from the node controller.
> >> The original code used the DIE notifier as the hook to get to the UV NMI
> >> handler. This does not work if performance counters are active - the hw_perf
> >> code consumes the NMI and the UV handler is not called.

Well, that is a bug in the perf code.  We have been dealing with perf
swallowing NMIs for a couple of releases now.  I think we have gotten rid
of most of the cases (P4 and acme's Core 2 Quad are the only ones I know of
that are still an issue).

I would much prefer to investigate why this is happening, because the perf
NMI handler is supposed to check the global interrupt bit to determine
whether the perf counters caused the NMI and, if they did not, fall through
to the other handlers, such as SGI's NMI button in this case.

My first impression is that the skip-NMI logic in the perf handler is
probably mistaking the SGI external NMI for the 'extra' perf NMI it is
supposed to skip, and thus swallows it.  At least that is the impression I
get from the Red Hat bugzilla, which says SGI is running 'perf top',
getting a hang, then pressing their NMI button to see the stack traces.
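
To make the suspicion concrete, here is a minimal userspace sketch of the
ordering described above.  It is only a model, not the kernel's perf code:
pmu_model, overflow_status, expected_extra, nmi_owner and dispatch_nmi are
all hypothetical names invented for illustration, and the real skip logic
may track the 'extra' NMI differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state; none of these names are kernel symbols. */
struct pmu_model {
	uint64_t overflow_status; /* stand-in for the PMU global status bits */
	int expected_extra;       /* 'extra' perf NMIs the handler plans to skip */
};

enum nmi_owner {
	NMI_OWNER_UNKNOWN = 0,  /* nobody claimed the NMI */
	NMI_OWNER_PERF,         /* a counter overflowed, perf handled it */
	NMI_OWNER_PERF_SKIPPED, /* perf swallowed it as an expected extra NMI */
	NMI_OWNER_EXTERNAL,     /* fell through to the external (button) handler */
};

/*
 * Model of one NMI delivery.  perf looks first; the external handler only
 * runs if perf declines.  The suspected failure is the second branch: an
 * external NMI arriving while expected_extra > 0 is indistinguishable, in
 * this model, from the extra perf NMI and is therefore swallowed.
 */
static enum nmi_owner dispatch_nmi(struct pmu_model *pmu, bool external_asserted)
{
	assert(pmu != 0);

	if (pmu->overflow_status != 0) {
		pmu->overflow_status = 0; /* acknowledge the overflow */
		return NMI_OWNER_PERF;
	}

	if (pmu->expected_extra > 0) {
		pmu->expected_extra--;
		return NMI_OWNER_PERF_SKIPPED;
	}

	if (external_asserted)
		return NMI_OWNER_EXTERNAL;

	return NMI_OWNER_UNKNOWN;
}
```

In this model, a button press with no overflow pending and expected_extra
at zero reaches the external handler, while the same press with
expected_extra still set is eaten by perf, which would match what the
bugzilla describes.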

Jack,

I worked through a number of these issues upstream, and I have already
talked to George and Russ here at Red Hat about working through this one
with them.  They can help me get access to your box to debug.

Cheers,
Don