Message-ID: <20130910142942.GB8388@gmail.com>
Date: Tue, 10 Sep 2013 16:29:43 +0200
From: Ingo Molnar <mingo@...nel.org>
To: eranian@...il.com
Cc: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andi Kleen <andi@...stfloor.org>
Subject: Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)

* Stephane Eranian <eranian@...glemail.com> wrote:
> On Tue, Sep 10, 2013 at 6:38 AM, Ingo Molnar <mingo@...nel.org> wrote:
> >
> > * Stephane Eranian <eranian@...glemail.com> wrote:
> >
> >> Hi,
> >>
> >> Ok, so I am able to reproduce the problem using a simpler
> >> test case with a simple multithreaded program where
> >> #threads >> #CPUs.
> >
> > Does it go away if you use 'perf record --all-cpus'?
> >
> Haven't tried that yet.
>
> But I verified the DS pointers:
> init:
> CPU6 pebs base=ffff8808262de000 index=ffff8808262de000
> intr=ffff8808262de0c0 max=ffff8808262defc0
> crash:
> CPU6 pebs base=ffff8808262de000 index=ffff8808262de9c0
> intr=ffff8808262de0c0 max=ffff8808262defc0
>
> Neither the base nor the max are modified.
> The index simply goes beyond the threshold but that's not a bug.
> It is 12 records past the threshold of 1, so my new crash report shows 13 in total.
>
> Two things to try:
> - measure only one thread/core
> - move the threshold a bit farther away (to get 2 or 3 entries)
>
> The threshold is where the interrupt is generated; it does not mean
> where PEBS recording stops. So it is possible that on HSW we get into a
> situation where it takes a while to reach the handler and stop the PMU.
> I don't know how, given that we use an NMI. Well, unless we were already
> servicing an NMI at the time. But given that we stop the PMU almost
> immediately in the handler, I don't see how that would be possible. The
> other oddity on HSW is that we clear the NMI on entry to the handler and
> not at the end. I never got a good explanation as to why that was
> necessary, so maybe it is related...

Do you mean:

	if (!x86_pmu.late_ack)
		apic_write(APIC_LVTPC, APIC_DM_NMI);

AFAICS that means the opposite: that we clear the NMI late, i.e. shortly
before return, after we've processed the PMU.

Do the symptoms change if you remove the x86_pmu.late_ack setting line
from:

	case 60: /* Haswell Client */
	case 70:
	case 71:
	case 63:
	case 69:
		x86_pmu.late_ack = true;

?

Thanks,
Ingo