linux-kernel - Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re: [GIT PULL] perf changes for v3.12)


Message-ID: <CAMsRxfJ5HG+0AiooOUFh8TzvCoK3YcBFpeAF0eTzdkDm=wB84g@mail.gmail.com>
Date:	Tue, 10 Sep 2013 07:15:19 -0700
From:	Stephane Eranian <eranian@...glemail.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andi Kleen <andi@...stfloor.org>
Subject: Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was: Re:
 [GIT PULL] perf changes for v3.12)

On Tue, Sep 10, 2013 at 6:38 AM, Ingo Molnar <mingo@...nel.org> wrote:
>
> * Stephane Eranian <eranian@...glemail.com> wrote:
>
>> Hi,
>>
>> Ok, so I am able to reproduce the problem using a simpler
>> test case with a simple multithreaded program where
>> #threads >> #CPUs.
>
> Does it go away if you use 'perf record --all-cpus'?
>
Haven't tried that yet.

But I verified the DS pointers:
init:
CPU6 pebs base=ffff8808262de000 index=ffff8808262de000
intr=ffff8808262de0c0 max=ffff8808262defc0
crash:
CPU6 pebs base=ffff8808262de000 index=ffff8808262de9c0
intr=ffff8808262de0c0 max=ffff8808262defc0

Neither the base nor the max has been modified.
The index simply goes beyond the threshold, but that by itself is not a bug.
It is 12 records past the threshold of 1, for a total of 13 in my new crash report.

Two things to try:
- measure only one thread/core
- move the threshold a bit farther away (to get 2 or 3 entries)

The threshold is where the interrupt is generated; it does not mean PEBS stops
recording there. So it is possible that on HSW we get into a situation where it
takes a while to reach the handler that stops the PMU. I don't see how, given
that we use an NMI, unless we were already servicing an NMI at the time. But
given that we stop the PMU almost immediately in the handler, I don't see how
that would be possible.
The other oddity on HSW is that we clear the NMI on entry to the handler and not
at the end. I never got a good explanation as to why that was necessary, so
maybe it is related...

>> [ 2229.021934] WARNING: CPU: 6 PID: 17496 at
>> arch/x86/kernel/cpu/perf_event_intel_ds.c:1003
>> intel_pmu_drain_pebs_hsw+0xa8/0xc0()
>> [ 2229.021936] Unexpected number of pebs records 21
>>
>> [ 2229.021966] Call Trace:
>> [ 2229.021967]  <NMI>  [<ffffffff8159dcd6>] dump_stack+0x46/0x58
>> [ 2229.021976]  [<ffffffff8108dfdc>] warn_slowpath_common+0x8c/0xc0
>> [ 2229.021979]  [<ffffffff8108e0c6>] warn_slowpath_fmt+0x46/0x50
>> [ 2229.021982]  [<ffffffff810646c8>] intel_pmu_drain_pebs_hsw+0xa8/0xc0
>> [ 2229.021986]  [<ffffffff810668f0>] intel_pmu_handle_irq+0x220/0x380
>> [ 2229.021991]  [<ffffffff810c1d35>] ? sched_clock_cpu+0xc5/0x120
>> [ 2229.021995]  [<ffffffff815a5a84>] perf_event_nmi_handler+0x34/0x60
>> [ 2229.021998]  [<ffffffff815a52b8>] nmi_handle.isra.3+0x88/0x180
>> [ 2229.022001]  [<ffffffff815a5490>] do_nmi+0xe0/0x330
>> [ 2229.022004]  [<ffffffff815a48f7>] end_repeat_nmi+0x1e/0x2e
>> [ 2229.022008]  [<ffffffff810652b3>] ? intel_pmu_pebs_enable_all+0x33/0x40
>> [ 2229.022011]  [<ffffffff810652b3>] ? intel_pmu_pebs_enable_all+0x33/0x40
>> [ 2229.022015]  [<ffffffff810652b3>] ? intel_pmu_pebs_enable_all+0x33/0x40
>> [ 2229.022016]  <<EOE>>  [<ffffffff810659f3>] intel_pmu_enable_all+0x23/0xa0
>> [ 2229.022021]  [<ffffffff8105ff84>] x86_pmu_enable+0x274/0x310
>> [ 2229.022025]  [<ffffffff81141927>] perf_pmu_enable+0x27/0x30
>> [ 2229.022029]  [<ffffffff81143219>] perf_event_context_sched_in+0x79/0xc0
>>
>> Could be a HW race whereby the PEBS records of sibling HT threads get mixed up.
>
> Yes, that seems plausible and would explain why the overrun is usually a
> small integer. We set up the DS with PEBS_BUFFER_SIZE == 4096, so with a
> record size of 192 bytes on HSW we should get index values of 0-21.
>
> That fits within the indices range reported so far.
>
>> [...] I will add a couple more checks to verify that. The intr_thres
>> should not have changed. Yet it looks like we have a situation where the
>> index is way past the threshold.
>
> Btw., it would also be nice to add a check of ds->pebs_index against
> ds->pebs_absolute_maximum, to make sure the PEBS record index never goes
> outside the DS area. I.e. to protect against random corruption.
>
> Right now we do only half a check:
>
>         n = top - at;
>         if (n <= 0)
>                 return;
>
> this still allows an upwards overflow. We check x86_pmu.max_pebs_events
> but then let it continue:
>
>         WARN_ONCE(n > x86_pmu.max_pebs_events,
>                   "Unexpected number of pebs records %d\n", n);
>
>         return __intel_pmu_drain_pebs_nhm(iregs, at, top);
>
> Instead it should be something more robust, like:
>
>         if (WARN_ONCE(n > max ...)) {
>                 /* Drain the PEBS buffer: */
>                 ds->pebs_index = ds->pebs_buffer_base;
>                 return;
>         }
>
> Thanks,
>
>         Ingo