linux-kernel - Re: perf: perf_fuzzer triggers GPF in perf_prepare


Message-ID: <20181209115523.GA3501@krava>
Date:   Sun, 9 Dec 2018 12:55:23 +0100
From:   Jiri Olsa <jolsa@...hat.com>
To:     Vince Weaver <vincent.weaver@...ne.edu>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Andi Kleen <andi@...stfloor.org>
Subject: Re: perf: perf_fuzzer triggers GPF in perf_prepare_sample

On Sat, Dec 08, 2018 at 09:08:28PM -0500, Vince Weaver wrote:
> On Thu, 6 Dec 2018, Jiri Olsa wrote:
> 
> > On Thu, Dec 06, 2018 at 10:35:28AM -0500, Vince Weaver wrote:
> > > On Wed, 5 Dec 2018, Jiri Olsa wrote:
> > > Maybe it is a corruption issue.  I had applied my own debug patch that 
> > > would dump some info if data->callchain was NULL.
> > > 
> > > But my debug code didn't trigger this time because it looks like 
> > > data->callchain was "1" rather than "0".
> > > 
> > > [27764.840179] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> > > [27764.840179] PGD 0 P4D 0 
> > > [27764.840180] Oops: 0000 [#1] SMP PTI
> > > [27764.840180] CPU: 1 PID: 18687 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #125
> > > [27764.840180] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> > 
> > actually, you could try that patch from my previous email?
> > 
> still crashes with your patch (see below)
> 
> I've also been able to replicate this crash on a skylake machine in 
> addition to the haswell machine.
> 
> Vince
> 
> [28269.147232] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [28269.155628] PGD 0 P4D 0 
> [28269.158360] Oops: 0000 [#1] SMP PTI
> [28269.162087] CPU: 0 PID: 1189 Comm: perf_fuzzer Tainted: G        W         4.20.0-rc5+ #128
> [28269.171011] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [28269.178935] RIP: 0010:perf_prepare_sample+0x82/0x4a0
> [28269.184239] Code: 06 4c 89 ea 4c 89 e6 e8 3c 54 ff ff 40 f6 c5 01 0f 85 28 01 00 00 40 f6 c5 20 74 1c 48 85 ed 0f 89 04 01 00 00 49 8b 44 24 70 <48> 8b 00 8d 04 c5 08 00 00 00 66 01 43 06 f7 c5 00 04 00 00 74 41
> [28269.204249] RSP: 0000:ffffc9000aca7a40 EFLAGS: 00010082
> [28269.209832] RAX: 0000000000000000 RBX: ffffc9000aca7a98 RCX: ffffc9000aca7ad8
> [28269.217484] RDX: 0000000000000000 RSI: ffffc9000aca7b80 RDI: ffffc9000aca7a9e
> [28269.225129] RBP: 80000000000bb068 R08: 0000000000000002 R09: 00000000000215c0
> [28269.232760] R10: ffff8880ce552000 R11: 0000000000000000 R12: ffffc9000aca7b80
> [28269.240380] R13: ffff88803696c800 R14: ffffc9000aca7ad8 R15: ffffe8ffffc06300
> [28269.248014] FS:  00007f5927fe7500(0000) GS:ffff88811aa00000(0000) knlGS:0000000000000000
> [28269.256606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [28269.262739] CR2: 0000000000000000 CR3: 0000000116d98001 CR4: 00000000001607f0
> [28269.270349] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [28269.277968] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [28269.285639] Call Trace:
> [28269.288266]  intel_pmu_drain_bts_buffer+0x151/0x220
> [28269.293476]  ? radix_tree_delete_item+0x69/0xc0
> [28269.298378]  x86_pmu_stop+0x3b/0x90
> [28269.302113]  x86_pmu_del+0x57/0x160

nice, at least it's in a different callstack context, which might help

thanks,
jirka