linux-kernel - Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs

Date:	Thu, 16 Jul 2015 23:12:10 +0200
From:	Stephane Eranian <eranian@...glemail.com>
To:	Stephane Eranian <eranian@...il.com>
Cc:	Vince Weaver <vincent.weaver@...ne.edu>,
	LKML <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	kan.liang@...el.com
Subject: Re: perf: fuzzer triggered warning in intel_pmu_drain_pebs_nhm()

hi,


The assertion I am seeing on HSW now is:
[  114.652263] WARNING: CPU: 6 PID: 3538 at arch/x86/kernel/cpu/perf_event.c:1209 x86_pmu_start+0xaa/0x100()
[  114.652264] Modules linked in: snd_hda_codec_hdmi i915 bnep rfcomm
bluetooth drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic
drm snd_hda_intel snd_hda_codec snd_hda_core intel_rapl iosf_mbi
snd_hwdep x86_pkg_temp_thermal snd_pcm intel_powerclamp snd_seq_midi
snd_seq_midi_event coretemp snd_rawmidi snd_seq kvm_intel kvm
snd_seq_device snd_timer snd crct10dif_pclmul crc32_pclmul soundcore
mei_me mei ghash_clmulni_intel aesni_intel mxm_wmi aes_x86_64 lrw
i2c_algo_bit gf128mul lpc_ich glue_helper shpchp serio_raw ablk_helper
cryptd tpm_infineon soc_button_array wmi video mac_hid
intel_smartconnect nls_iso8859_1 parport_pc ppdev lp parport uas
usb_storage psmouse r8169 ahci libahci mii
[  114.652287] CPU: 6 PID: 3538 Comm: perf_fuzzer Not tainted 4.2.0-rc2+ #3
[  114.652287] Hardware name: MSI MS-7816/Z87-G43 (MS-7816), BIOS V1.0 04/02/2013
[  114.652288]  ffffffff81a8e078 ffff880232e33da8 ffffffff8178881a 0000000000000007
[  114.652289]  0000000000000000 ffff880232e33de8 ffffffff81073a4a ffff88021f844900
[  114.652291]  ffff88023f38bbc0 ffff8800a85d8800 0000000000000000 ffff88023f31a8f8
[  114.652292] Call Trace:
[  114.652295]  [<ffffffff8178881a>] dump_stack+0x45/0x57
[  114.652299]  [<ffffffff81073a4a>] warn_slowpath_common+0x8a/0xc0
[  114.652300]  [<ffffffff81073b3a>] warn_slowpath_null+0x1a/0x20
[  114.652302]  [<ffffffff8102aeea>] x86_pmu_start+0xaa/0x100
[  114.652304]  [<ffffffff81162e09>] perf_ioctl+0x3b9/0x400
[  114.652306]  [<ffffffff811f851a>] do_vfs_ioctl+0x2ba/0x490
[  114.652307]  [<ffffffff811f700f>] ? f_modown+0x4f/0xa0
[  114.652308]  [<ffffffff811f70e5>] ? f_setown+0x45/0x50
[  114.652309]  [<ffffffff811f8769>] SyS_ioctl+0x79/0x90
[  114.652310]  [<ffffffff8178f52e>] tracesys_phase2+0x88/0x8d

Which corresponds to:
static void x86_pmu_start(struct perf_event *event, int flags)
{
        struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
        int idx = event->hw.idx;

        if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))  /* <-- this one */
                return;
        /* ... remainder of x86_pmu_start() not shown ... */
}

The kernel is trying to start an event that was not stopped first, i.e. PERF_HES_STOPPED is clear in event->hw.state.

As for NHM, the "irq loop stuck" condition is there. After more
instrumentation, it seems to hit only when sampling on fixed counter 3.


On Thu, Jul 16, 2015 at 9:30 AM, Stephane Eranian <eranian@...glemail.com> wrote:
> On Thu, Jul 16, 2015 at 12:15 AM, Peter Zijlstra <peterz@...radead.org> wrote:
>> On Thu, Jul 16, 2015 at 08:02:03AM +0200, Stephane Eranian wrote:
>>> Been running it for a couple of hours, so far so good. I will let it
>>> run all night.
>>
>> Thanks!
>>
> Well, it died on NHM in the same function despite your patch. I need to
> look at the exact warning.
> So more work is needed. But I also saw the "irq loop stuck" message
> before that.
>
>
>>> > ---
>>> >  arch/x86/kernel/cpu/perf_event_intel_ds.c | 29 +++++++++++++----------------
>>> >  1 file changed, 13 insertions(+), 16 deletions(-)
>>> >
>>> > diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
>>> > index 71fc40238843..68d0ced1d229 100644
>>> > --- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
>>> > +++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
>>> > @@ -1142,6 +1142,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>>> >
>>> >         for (at = base; at < top; at += x86_pmu.pebs_record_size) {
>>> >                 struct pebs_record_nhm *p = at;
>>> > +               u64 pebs_status;
>>> >
>>> >                 /* PEBS v3 has accurate status bits */
>>> >                 if (x86_pmu.intel_cap.pebs_format >= 3) {
>>> > @@ -1152,12 +1153,14 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs)
>>> >                         continue;
>>> >                 }
>>> >
>>> > -               bit = find_first_bit((unsigned long *)&p->status,
>>> > +               pebs_status = p->status & cpuc->pebs_enabled;
>>> > +               pebs_status &= (1ULL << x86_pmu.max_pebs_events) - 1;
>>> > +
>>> > +               bit = find_first_bit((unsigned long *)&pebs_status,
>>> >                                         x86_pmu.max_pebs_events);
>>> >                 if (bit >= x86_pmu.max_pebs_events)
>>> >                         continue;
>>
>> Maybe we should WARN in this case? A PEBS entry without any PEBS bits
>> set in the status field would be 'weird', right?
>>
>> Maybe something like:
>>
>>                 if (WARN(bit >= x86_pmu.max_pebs_events,
>>                          "PEBS record without PEBS event! status=%Lx pebs_enabled=%Lx active_mask=%Lx",
>>                          p->status, cpuc->pebs_enabled, *(u64 *)cpuc->active_mask))
>>                         continue;
>>
>> If that triggers we at least get more info.