linux-kernel - Re: [tip:perf/urgent] perf, x86: Catch spurious interrupts after disabling counters


Message-ID: <20100929200323.GC26290@redhat.com>
Date:	Wed, 29 Sep 2010 16:03:23 -0400
From:	Don Zickus <dzickus@...hat.com>
To:	Stephane Eranian <eranian@...gle.com>
Cc:	Robert Richter <robert.richter@....com>,
	Cyrill Gorcunov <gorcunov@...il.com>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"hpa@...or.com" <hpa@...or.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"yinghai@...nel.org" <yinghai@...nel.org>,
	"andi@...stfloor.org" <andi@...stfloor.org>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"ying.huang@...el.com" <ying.huang@...el.com>,
	"fweisbec@...il.com" <fweisbec@...il.com>,
	"ming.m.lin@...el.com" <ming.m.lin@...el.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...e.hu" <mingo@...e.hu>
Subject: Re: [tip:perf/urgent] perf, x86: Catch spurious interrupts after
 disabling counters

On Wed, Sep 29, 2010 at 09:42:26PM +0200, Stephane Eranian wrote:
> On Wed, Sep 29, 2010 at 8:12 PM, Don Zickus <dzickus@...hat.com> wrote:
> > Robert,
> >
> > I think you missed Stephane's point.  Say for example, kgdb is being used
> > while we are doing stuff with the perf counter (and say kgdb's handler is
> > a lower priority than perf; which isn't true I know, but let's say):
> >
> Yes, exactly my point. The reality is that you cannot afford false positives,
> because you may starve another subsystem of an important notification.
> 
> I think it boils down to whether or not we need an error message (Dazed) in
> case no subsystem claimed the NMI. If you were to just silently consume the
> NMI when no subsystem claims it, then you would not have these issues.
> 
> What Don has done is use a heuristic which gets activated when a PMU
> interrupt handler signals that more than one counter has overflowed. His
> claim is that this situation is likely to trigger back-to-back NMIs.

Actually, it's Robert's heuristic. :-)

> 
> The reason this heuristic works is that it waits until ALL the subsystems
> have seen the notification before it declares that the NMI was PMU spurious.
> To do that, it uses the DIE_NMI_UNKNOWN callchain. Handlers on this chain
> get called last, after all subsystems have seen the notification once. I believe
> that is the only way to safely "consume" a "spurious" NMI and avoid
> the 'Dazed' message. Anything else runs the risk of starving the other
> subsystems.

I agree.
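
To make the ordering argument concrete, here is a rough userspace model of
it. It is only a sketch, not kernel code: PerfModel, OtherModel, deliver_nmi
and the "pending" fields are made-up names for illustration, and the
NOTIFY_*/DIE_* constants only borrow the kernel's names (the values are
arbitrary). The assumption is that the perf handler marks the next NMI as
possibly spurious when it handled more than one counter, and only swallows
that NMI on the DIE_NMI_UNKNOWN pass, after everyone else had a chance to
claim it.

```python
NOTIFY_DONE = 0  # handler did not claim the NMI
NOTIFY_STOP = 1  # handler claimed the NMI; stop walking the chain

DIE_NMI = "DIE_NMI"
DIE_NMI_UNKNOWN = "DIE_NMI_UNKNOWN"


class PerfModel:
    """Hypothetical stand-in for the perf NMI handler."""

    def __init__(self):
        self.nmi_count = 0   # NMIs seen so far
        self.marked = None   # NMI number we expect to be spurious
        self.pending = []    # overflowed-counter counts for upcoming NMIs

    def handler(self, event):
        if event == DIE_NMI:
            self.nmi_count += 1
            handled = self.pending.pop(0) if self.pending else 0
            if handled == 0:
                return NOTIFY_DONE  # not ours; let the others look at it
            if handled > 1:
                # Several counters overflowed at once, so a back-to-back
                # NMI with nothing left to handle may follow.
                self.marked = self.nmi_count + 1
            return NOTIFY_STOP
        if event == DIE_NMI_UNKNOWN and self.marked == self.nmi_count:
            # Nobody claimed it on the first pass and we predicted it.
            self.marked = None
            return NOTIFY_STOP
        return NOTIFY_DONE


class OtherModel:
    """Hypothetical lower-priority NMI user (think of a debugger)."""

    def __init__(self):
        self.pending = False

    def handler(self, event):
        if event == DIE_NMI and self.pending:
            self.pending = False
            return NOTIFY_STOP
        return NOTIFY_DONE


def deliver_nmi(chain):
    """Walk the chain for DIE_NMI first, then for DIE_NMI_UNKNOWN."""
    for event in (DIE_NMI, DIE_NMI_UNKNOWN):
        for name, handler in chain:
            if handler(event) == NOTIFY_STOP:
                return f"{name}:{event}"
    return "dazed"  # nobody claimed it on either pass


if __name__ == "__main__":
    perf, other = PerfModel(), OtherModel()
    chain = [("perf", perf.handler), ("other", other.handler)]

    perf.pending = [2]         # two counters overflow in one NMI
    print(deliver_nmi(chain))  # perf claims it and marks the next NMI
    other.pending = True       # the next NMI really belongs to "other"
    print(deliver_nmi(chain))  # "other" still gets to claim it
```

In this model, if perf swallowed the marked NMI during the DIE_NMI pass
instead, the second NMI would never reach "other", which is the starvation
case discussed above.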

Cheers,
Don