Open Source and information security mailing list archives
Date:	Thu, 18 Nov 2010 14:32:47 -0500
From:	Don Zickus <dzickus@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Jason Wessel <jason.wessel@...driver.com>,
	Ingo Molnar <mingo@...e.hu>,
	Robert Richter <robert.richter@....com>, ying.huang@...el.com,
	Andi Kleen <andi@...stfloor.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [V2 PATCH 0/6] x86, NMI: give NMI handler a face-lift

On Thu, Nov 18, 2010 at 02:17:12PM +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-18 at 06:47 -0600, Jason Wessel wrote:
> > More specifically
> > when another subsystem injects an NMI event the perf NMI code returns
> > NOTIFY_STOP. 
> 
> Not unconditionally, right? We only do so when the previous NMI was from
> the PMU and nobody claimed this one (NOTIFY_STOP from DIE_NMIUNKNOWN).
> 
> Or are you hitting the other one, where !handled but pmu_nmi.handled >
> 1 ?

I think the problem with the virt stuff is that it emulates the rdmsrl
calls by returning 0.  All platforms except perf_event_intel.c rely on
checking that the high bit of the counter register is not zero; when it
is zero, the code thinks the counter crossed zero and triggered a PMI.
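
A minimal sketch of that high-bit heuristic, assuming 48-bit counters and that a counter is armed with the negated sample period, so its top bit stays set until it wraps. The function names and the width constant are hypothetical. Note that a raw read of 0, which is what an emulated rdmsrl returns, passes the overflow test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed counter width for this sketch; real hardware reports its own. */
#define SKETCH_CNTVAL_BITS 48

/* Value that makes the counter overflow after 'period' events:
 * the two's complement of period, truncated to the counter width. */
uint64_t arm_counter(uint64_t period)
{
    uint64_t mask = (1ULL << SKETCH_CNTVAL_BITS) - 1;
    return (0 - period) & mask;
}

/* While the top bit is set the counter has not wrapped yet; a clear
 * top bit is taken to mean "this counter overflowed, the NMI is ours". */
bool counter_looks_overflowed(uint64_t raw)
{
    return !(raw & (1ULL << (SKETCH_CNTVAL_BITS - 1)));
}
```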

The Intel code is a little smarter: it relies on the interrupt status
logic (the global overflow status register) and thus doesn't have this
problem (to clarify: only Core2 and later use this; P4 and P6 use the old
method).

So the problem is that when the NMI watchdog is enabled, the perf event is
'active' and thus tries to read the counter value.  Because that value is
always zero, perf assumes the counter overflowed and the NMI is its own.

Not sure how to fix it yet, other than including logic that detects we
are running as a guest and disables perf?
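
One way to detect a guest is the CPUID hypervisor-present bit (leaf 1, ECX bit 31), which Linux exposes as X86_FEATURE_HYPERVISOR. The sketch below takes an already-read ECX value so it stays portable. The helper names are hypothetical, and a hypervisor is not obliged to set this bit, so this is a heuristic rather than a guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 31: set by hypervisors to announce themselves. */
#define SKETCH_CPUID1_ECX_HYPERVISOR (1u << 31)

bool running_as_guest(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & SKETCH_CPUID1_ECX_HYPERVISOR) != 0;
}

/* Hypothetical policy hook: refuse the counter-value heuristic as a guest. */
bool perf_counter_heuristic_allowed(uint32_t cpuid1_ecx)
{
    return !running_as_guest(cpuid1_ecx);
}
```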

On a side note, I think I have a fix for the P4 problem, but will probably
need Cyril to look at it.  Basically, p4_pmu_clear_cccr_ovf() uses the
high part of the CCCR register to determine whether the counter
overflowed, when it probably wants to use the low bits of the CCCR
register and the high bits of the event_base.
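
A sketch of the overflow test proposed above, assuming the CCCR overflow flag is bit 31 of the CCCR's low word and that P4 counters are 40 bits wide. The function name and constants are hypothetical, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this sketch only. */
#define SKETCH_P4_CCCR_OVF  (1u << 31) /* OVF flag in the CCCR low word */
#define SKETCH_P4_CNTR_BITS 40         /* counter width */

/* Overflowed if the CCCR's OVF flag is set, or if the counter itself
 * (read via event_base) has lost its top bit, i.e. it wrapped. */
bool p4_counter_overflowed(uint32_t cccr_low, uint64_t counter)
{
    if (cccr_low & SKETCH_P4_CCCR_OVF)
        return true;
    return !(counter & (1ULL << (SKETCH_P4_CNTR_BITS - 1)));
}
```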

Cheers,
Don

