Message-ID: <CAMsRxfJeV6JYJ-jke863EgJA0sFo0sZUTX8a4X3RhKaCvc_UEw@mail.gmail.com>
Date: Wed, 15 Jul 2015 08:42:50 +0200
From: Stephane Eranian <eranian@...glemail.com>
To: Vince Weaver <vincent.weaver@...ne.edu>
Cc: Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
kan.liang@...el.com
Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()
On Fri, Jul 3, 2015 at 9:49 PM, Vince Weaver <vincent.weaver@...ne.edu> wrote:
> On Fri, 3 Jul 2015, Peter Zijlstra wrote:
>
>> That said, its far too warm and I might just not be making sense.
>
> you need to come visit Maine! Although I am not sure the cooler weather
> necessarily improves my kernel debugging skills.
>
> I managed to lock the machine (again this is with the patch applied).
>
I can reproduce the problem on my HSW running the fuzzer.
I can see why this could happen if you are mixing PEBS and non-PEBS events
in the bottom 4 counters. I suspect this check:

	for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
		if ((counts[bit] == 0) && (error[bit] == 0))
			continue;

This test is not correct when non-PEBS events are mixed with PEBS events
and they overflow at the same time. They will have counts[bit] != 0 but
error[bit] == 0, and thus you fall through the check and hit the assert.
Or it is something along those lines.
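
To make the suspicion concrete, here is a minimal user-space sketch, not the
kernel code. struct toy_event, count_suspect_counters() and MAX_PEBS_EVENTS
are hypothetical names made up for illustration. It also assumes that
counts[] can end up nonzero for a counter holding a non-PEBS event, which is
the hypothesis above rather than something confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PEBS_EVENTS 4	/* assumption: the "bottom 4 counters" */

/* Hypothetical stand-in for a scheduled event; not the kernel's struct. */
struct toy_event {
	bool in_use;	/* an event is scheduled on this counter */
	bool precise;	/* the event uses PEBS */
};

/*
 * Count the counters that get past the skip test even though they do not
 * hold a PEBS event. In this sketch, that is where a sanity check on the
 * event would fire.
 */
static int count_suspect_counters(const struct toy_event *events,
				  const short *counts, const short *error)
{
	int bit, suspects = 0;

	assert(events != NULL && counts != NULL && error != NULL);

	for (bit = 0; bit < MAX_PEBS_EVENTS; bit++) {
		/* Same skip condition as the loop quoted above. */
		if ((counts[bit] == 0) && (error[bit] == 0))
			continue;

		/* Reached for any counter with a nonzero count or error. */
		if (!events[bit].in_use || !events[bit].precise)
			suspects++;
	}
	return suspects;
}
```

With a PEBS event on counter 0 and a non-PEBS event on counter 1, both with
a nonzero count and no error, the function reports one suspect counter. With
only the PEBS counter carrying a count, it reports none.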
> [ 299.366027] ------------[ cut here ]------------
> [ 299.370985] WARNING: CPU: 2 PID: 8241 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
> [ 299.456929] CPU: 2 PID: 8241 Comm: perf_fuzzer Tainted: G W 4.1.0+ #164
> [ 299.465750] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [ 299.474274] ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
> [ 299.482864] 0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
> [ 299.491488] 0000000000000000 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
> [ 299.500029] Call Trace:
> [ 299.503190] <NMI> [<ffffffff8169f823>] dump_stack+0x45/0x57
> [ 299.509936] [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
> [ 299.516901] [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
> [ 299.523715] [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
> [ 299.531268] [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
> [ 299.538487] [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
> [ 299.545638] [<ffffffff810181ad>] nmi_handle+0x9d/0x140
> [ 299.551772] [<ffffffff81018115>] ? nmi_handle+0x5/0x140
> [ 299.558013] [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
> [ 299.564527] [<ffffffff8101859d>] do_nmi+0x8d/0xc0
> [ 299.570185] [<ffffffff816a979f>] end_repeat_nmi+0x1e/0x2e
> [ 299.576580] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [ 299.583390] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [ 299.590163] [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [ 299.596922] <<EOE>> [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
> [ 299.604594] [<ffffffff811bcf7d>] cache_alloc_debugcheck_after.isra.51+0x1cd/0x250
> [ 299.613140] [<ffffffff811c08b6>] kmem_cache_alloc_trace+0xa6/0x510
> [ 299.620330] [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
> [ 299.627088] [<ffffffff8106ee48>] ? get_online_cpus+0x58/0x70
> [ 299.633688] [<ffffffff8115bea8>] perf_event_alloc+0x58/0x680
> [ 299.640319] [<ffffffff8115c897>] SYSC_perf_event_open+0x3c7/0xd40
> [ 299.647353] [<ffffffff8105f86b>] ? __do_page_fault+0x1ab/0x3f0
> [ 299.654172] [<ffffffff8115d689>] SyS_perf_event_open+0x9/0x10
> [ 299.660871] [<ffffffff816a7572>] entry_SYSCALL_64_fastpath+0x16/0x7a
> [ 299.668236] ---[ end trace 3356c74581c13f1d ]---
> [ 299.673648] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.680427] Do you have a strange power saving mode enabled?
> [ 299.686963] Dazed and confused, but trying to continue
> [ 299.692904] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.699748] Do you have a strange power saving mode enabled?
> [ 299.706227] Dazed and confused, but trying to continue
> [ 299.712172] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.718946] Do you have a strange power saving mode enabled?
> [ 299.725446] Dazed and confused, but trying to continue
> [ 299.731419] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [ 299.738235] Do you have a strange power saving mode enabled?
> [ 299.744740] Dazed and confused, but trying to continue
> [ 299.750660] Uhhuh. NMI received for unknown reason 21 on CPU 2.
> [ 299.757398] Do you have a strange power saving mode enabled?
> [ 299.763862] Dazed and confused, but trying to continue
>
> (machine eventually locks up after lots of these messages)
>