linux-kernel - Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs

Message-ID: <CAMsRxfJeV6JYJ-jke863EgJA0sFo0sZUTX8a4X3RhKaCvc_UEw@mail.gmail.com>
Date:	Wed, 15 Jul 2015 08:42:50 +0200
From:	Stephane Eranian <eranian@...glemail.com>
To:	Vince Weaver <vincent.weaver@...ne.edu>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	kan.liang@...el.com
Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

On Fri, Jul 3, 2015 at 9:49 PM, Vince Weaver <vincent.weaver@...ne.edu> wrote:
> On Fri, 3 Jul 2015, Peter Zijlstra wrote:
>
>> That said, its far too warm and I might just not be making sense.
>
> you need to come visit Maine!  Although I am not sure the cooler weather
> necessarily improves my kernel debugging skills.
>
> I managed to lock the machine (again this is with the patch applied).
>
I can reproduce the problem on my Haswell (HSW) machine running the fuzzer.

I can see why this could happen if you are mixing PEBS and non-PEBS events
in the bottom 4 counters. I suspect this test:
        for (bit = 0; bit < x86_pmu.max_pebs_events; bit++) {
                if ((counts[bit] == 0) && (error[bit] == 0))
                        continue;

This test is not correct when non-PEBS events are mixed with PEBS events
and they overflow at the same time. The non-PEBS events will have
counts[bit] != 0 but error[bit] == 0, so the continue is not taken; you
fall through the check and hit the assert. Or it is something along
those lines.
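
To make the suspicion above concrete, here is a minimal user-space sketch
of that filter. It is not the kernel code: struct model_event,
first_bad_counter() and MAX_PEBS_EVENTS are hypothetical names made up for
illustration, and the sketch simply assumes that a non-PEBS counter can end
up with a non-zero counts[] entry; it does not model how that would happen
in the driver.

```c
#include <stdbool.h>

#define MAX_PEBS_EVENTS 4   /* assumption: the "bottom 4 counters" */

/* Hypothetical stand-in for the per-counter event state. */
struct model_event {
	bool present;     /* an event is scheduled on this counter */
	bool precise_ip;  /* true for a PEBS event, false otherwise */
};

/*
 * Apply the same (counts == 0 && error == 0) filter as the quoted loop.
 * Return the index of the first counter that passes the filter without
 * being a PEBS event, or -1 if every surviving counter is a PEBS event.
 */
int first_bad_counter(const struct model_event *events,
		      const short *counts, const short *error)
{
	for (int bit = 0; bit < MAX_PEBS_EVENTS; bit++) {
		if ((counts[bit] == 0) && (error[bit] == 0))
			continue;
		/* Under this hypothesis, the assert would fire here. */
		if (!events[bit].present || !events[bit].precise_ip)
			return bit;
	}
	return -1;
}
```

With a PEBS event on counter 0 and a non-PEBS event on counter 1, the
function returns -1 as long as only counts[0] is non-zero, and returns 1
once counts[1] is non-zero with error[1] == 0, which is the case described
above.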


> [  299.366027] ------------[ cut here ]------------
> [  299.370985] WARNING: CPU: 2 PID: 8241 at arch/x86/kernel/cpu/perf_event_intel_ds.c:1198 intel_pmu_drain_pebs_nhm+0x283/0x2e0()
> [  299.456929] CPU: 2 PID: 8241 Comm: perf_fuzzer Tainted: G        W       4.1.0+ #164
> [  299.465750] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [  299.474274]  ffffffff81a105a0 ffff88011ea85b10 ffffffff8169f823 0000000000000000
> [  299.482864]  0000000000000000 ffff88011ea85b50 ffffffff8106ec8a ffff88011ea85ba0
> [  299.491488]  0000000000000000 0000000000000001 ffff88011ea8bd80 ffff8801190400c0
> [  299.500029] Call Trace:
> [  299.503190]  <NMI>  [<ffffffff8169f823>] dump_stack+0x45/0x57
> [  299.509936]  [<ffffffff8106ec8a>] warn_slowpath_common+0x8a/0xc0
> [  299.516901]  [<ffffffff8106ed7a>] warn_slowpath_null+0x1a/0x20
> [  299.523715]  [<ffffffff8102f783>] intel_pmu_drain_pebs_nhm+0x283/0x2e0
> [  299.531268]  [<ffffffff81032235>] intel_pmu_handle_irq+0x255/0x440
> [  299.538487]  [<ffffffff81028e76>] perf_event_nmi_handler+0x26/0x40
> [  299.545638]  [<ffffffff810181ad>] nmi_handle+0x9d/0x140
> [  299.551772]  [<ffffffff81018115>] ? nmi_handle+0x5/0x140
> [  299.558013]  [<ffffffff8101843a>] default_do_nmi+0x4a/0x120
> [  299.564527]  [<ffffffff8101859d>] do_nmi+0x8d/0xc0
> [  299.570185]  [<ffffffff816a979f>] end_repeat_nmi+0x1e/0x2e
> [  299.576580]  [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [  299.583390]  [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [  299.590163]  [<ffffffff811bc9d2>] ? check_poison_obj+0x92/0x230
> [  299.596922]  <<EOE>>  [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
> [  299.604594]  [<ffffffff811bcf7d>] cache_alloc_debugcheck_after.isra.51+0x1cd/0x250
> [  299.613140]  [<ffffffff811c08b6>] kmem_cache_alloc_trace+0xa6/0x510
> [  299.620330]  [<ffffffff8115bea8>] ? perf_event_alloc+0x58/0x680
> [  299.627088]  [<ffffffff8106ee48>] ? get_online_cpus+0x58/0x70
> [  299.633688]  [<ffffffff8115bea8>] perf_event_alloc+0x58/0x680
> [  299.640319]  [<ffffffff8115c897>] SYSC_perf_event_open+0x3c7/0xd40
> [  299.647353]  [<ffffffff8105f86b>] ? __do_page_fault+0x1ab/0x3f0
> [  299.654172]  [<ffffffff8115d689>] SyS_perf_event_open+0x9/0x10
> [  299.660871]  [<ffffffff816a7572>] entry_SYSCALL_64_fastpath+0x16/0x7a
> [  299.668236] ---[ end trace 3356c74581c13f1d ]---
> [  299.673648] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [  299.680427] Do you have a strange power saving mode enabled?
> [  299.686963] Dazed and confused, but trying to continue
> [  299.692904] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [  299.699748] Do you have a strange power saving mode enabled?
> [  299.706227] Dazed and confused, but trying to continue
> [  299.712172] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [  299.718946] Do you have a strange power saving mode enabled?
> [  299.725446] Dazed and confused, but trying to continue
> [  299.731419] Uhhuh. NMI received for unknown reason 31 on CPU 2.
> [  299.738235] Do you have a strange power saving mode enabled?
> [  299.744740] Dazed and confused, but trying to continue
> [  299.750660] Uhhuh. NMI received for unknown reason 21 on CPU 2.
> [  299.757398] Do you have a strange power saving mode enabled?
> [  299.763862] Dazed and confused, but trying to continue
>
> (machine eventually locks up after lots of these messages)
>