Date: Fri, 3 Sep 2010 13:02:49 +0200
From: Stephane Eranian <eranian@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Don Zickus <dzickus@...hat.com>, Robert Richter <robert.richter@....com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "mingo@...e.hu" <mingo@...e.hu>
Subject: Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter

On Fri, Sep 3, 2010 at 10:33 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, 2010-09-02 at 16:39 +0200, Stephane Eranian wrote:
>> I managed to reproduce on core i7 860 (without patch4).
>> Looking at the code again, I am dubious you ever execute
>> the retry goto. If the PMU is disabled and you've just
>> cleared the OVF_STAT, then I don't see where the new
>> overflows would come from. But that's a separate problem.
>>
>> One thing I did is to compare status obtained via OVFL_STATUS
>> with one that I build manually by inspecting each individual
>> counter. The two returned bitmasks should always be identical
>> (with PEBS disabled). When I got the spurious NMI, it did not
>> trip my status validation. So the OVFL_STATUS is valid.
>>
>> I found something else that looked fishy. I am experimenting
>> with it. I will report back.
>
> One thing we still need to do is on init detect if the BIOS is using one
> of the PMCs and simply disable all of perf and print a nice big message
> to the user to request a new BIOS from their vendor.
>
Given the way perf_events operates, that is your only choice at this point.

But I am sure neither my system nor yours is subject to this particular
issue, yet there are some unexplained errors with OVF_STATUS. Here is an
example of what I gathered on a Westmere:

This is coming into the interrupt handler:
- status = overflow status coming from GLOBAL_OVF_STATUS
- status2 = inspection of the counters
- act = cpuc->active_mask[0]

In case the two statuses don't match, I dump the state of the active
events, incl. the counter values (val).

[ 822.813808] CPU2 irqin status=0x6 status2=0x4 act=0x7
[ 822.813818] CPU2 cfg=0x13003c idx=0 sel=53003c val=ffffa833f298
[ 822.813821] CPU2 cfg=0x12003c idx=1 sel=52003c val=fffffe130229
[ 822.813823] CPU2 cfg=0x11003c idx=2 sel=51003c val=5e9

Here only counter2 has overflowed, yet the handler will also process
counter1, which is wrong.

The other thing I noticed is that in intel_pmu_disable_event(), the
event being stopped has sometimes overflowed. Looks like OVF_STATUS is
stale. Maybe OVF_STATUS is not cleared properly somewhere, possibly
when an event gets disabled.

I have a busy system, with the NMI watchdog running (0x13003c), where I do:

perf record -a -C 1 -e cycles:k -ecycles:u -F 10 -- sleep 10
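
The status2 check described above (rebuilding the overflow bitmask by inspecting each counter) could be sketched roughly as follows. This is only an illustration, not the code used for the dump: counter_overflow_mask() and its parameters are hypothetical names, and it assumes that counters are programmed with a negative start value and count upward, so a counter whose top implemented bit is clear has wrapped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rebuild an overflow bitmask by looking at each active counter.
 *
 * Assumption (not taken from the message): counters start at a negative
 * value and count up, so a clear top implemented bit means the counter
 * has wrapped. Function and parameter names are hypothetical.
 */
static uint64_t counter_overflow_mask(const uint64_t *val, int num_counters,
                                      uint64_t active_mask, int cntval_bits)
{
    uint64_t status2 = 0;
    int idx;

    assert(cntval_bits >= 1 && cntval_bits <= 64);
    assert(num_counters >= 0 && num_counters <= 64);

    for (idx = 0; idx < num_counters; idx++) {
        if (!(active_mask & (1ULL << idx)))
            continue;   /* counter not in use */
        if (val[idx] & (1ULL << (cntval_bits - 1)))
            continue;   /* top bit still set: no overflow yet */
        status2 |= 1ULL << idx;
    }
    return status2;
}
```

Fed the three val fields from the dump with act=0x7, and assuming 48-bit counters (the values are 12 hex digits wide), this yields 0x4, which matches the reported status2 and differs from status=0x6.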