linux-kernel - Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter


Message-ID: <1283512292.1783.350.camel@laptop>
Date:	Fri, 03 Sep 2010 13:11:32 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Stephane Eranian <eranian@...gle.com>
Cc:	Don Zickus <dzickus@...hat.com>,
	Robert Richter <robert.richter@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mingo@...e.hu" <mingo@...e.hu>
Subject: Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second
 event on intel perf counter

On Fri, 2010-09-03 at 13:02 +0200, Stephane Eranian wrote:
> 
> > One thing we still need to do is on init detect if the BIOS is using one
> > of the PMCs and simply disable all of perf and print a nice big message
> > to the user to request a new BIOS from their vendor.
> >
> Given the way perf_events operate, that is your only choice at this point.

Well, it wouldn't be too hard to cure that, but the BIOS should simply
keep its grubby paws off the PMU -- I'm really not interested in
co-operating on that point.

> But I am sure neither my system nor yours is subject to this particular issue

Sure, worth checking though; not sure Don did on his machine.

> yet there are some unexplained errors with OVF_STATUS.

Right.

> Here is an example of what I gathered on a Westmere:
> 
> This is captured on entry to the interrupt handler:
> - status   = overflow status coming from GLOBAL_OVF_STATUS
> - status2 = inspection of the counters
> - act = cpuc->active_mask[0]
> 
> When the two status values don't match, I dump the state of the active
> events, including the counter values (val).
> 
> [  822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
> [  822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
> [  822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
> [  822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9
> 
> Here only counter 2 has overflowed, yet the handler will also process
> counter 1, which is wrong.

Right, we could easily revert to scanning all counters like we do for
all other interrupt handlers.

> The other thing I noticed is that in intel_pmu_disable_event(), the event
> being stopped has sometimes already overflowed. It looks like OVF_STATUS
> is stale. Maybe OVF_STATUS is not cleared properly somewhere, possibly
> when an event gets disabled.

Right, the code pretty much assumes that if a counter overflows, a PMI will
be generated. So you're saying a pending PMI might get canceled when we
clear the EN bit? Most icky.