Message-ID: <20130923153357.GB9326@twins.programming.kicks-ass.net>
Date: Mon, 23 Sep 2013 17:33:57 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: eranian@...il.com
Cc: Ingo Molnar <mingo@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andi Kleen <andi@...stfloor.org>
Subject: Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)
On Mon, Sep 23, 2013 at 05:25:19PM +0200, Stephane Eranian wrote:
> > It's not just a broken threshold. When a PEBS event happens it can re-arm
> > itself, but only if you program a non-zero RESET value. We don't do that,
> > so each counter should only ever fire once.
> >
> > We must do this because PEBS is broken on NHM+ in that the
> > pebs_record::status is a direct copy of the overflow status field at
> > the time of the assist, and if you use the RESET thing, nothing will clear
> > the status bits and you cannot demux the PEBS events back to the event
> > that generated them.
> >
> Trying to understand this problem better. You are saying that when you
> are sampling multiple PEBS events, there is a problem if you allow more
> than one record per PEBS buffer, because the overflow status is not reset
> properly.
That is what I wrote, but I'm not entirely sure it's correct. I think it
will reset the overflow bits once it does an actual reset after the PEBS
assist triggers, but see below.
> For instance, if the first record is caused by counter 0 (ovfl_status=0x1),
> then counter 0 is reset. If counter 1 then causes the next record, does
> that record have ovfl_status=0x3 instead of ovfl_status=0x2? Is that what
> you are saying?
>
> If so, then yes, I agree this is a serious bug and we need to have Intel fix it.
But there's still the case where, with 2 counters, you can get:

  cnt0 overflows; sets status |= 1 << 0, arms PEBS0 assist
  cnt1 overflows; sets status |= 1 << 1, arms PEBS1 assist
  PEBS0 ready to trigger
  PEBS1 ready to trigger
  cnt1 event -> PEBS1 trigger, writes entry with status := 0x03
  cnt0 event -> PEBS0 trigger, writes entry with status := 0x03
At that point you'll have 2 entries with the same status overflow bits,
written in 'reverse' order.
If we had set RESET, the second entry would have status := 0x01, which
would be unambiguous again. But we still wouldn't know which event to
attribute the 0x03 entry to.
With more PEBSn counters enabled and a threshold > 1, such scenarios
become much more likely.
The threshold := 1 setting tries to avoid these cases by getting each
record out as soon as it is written, hopefully before a second trigger
can happen.