Message-ID: <YFy3qI65dBfbsZ1z@elver.google.com>
Date: Thu, 25 Mar 2021 17:17:44 +0100
From: Marco Elver <elver@...gle.com>
To: peterz@...radead.org
Cc: alexander.shishkin@...ux.intel.com, acme@...nel.org,
mingo@...hat.com, jolsa@...hat.com, mark.rutland@....com,
namhyung@...nel.org, tglx@...utronix.de, glider@...gle.com,
viro@...iv.linux.org.uk, arnd@...db.de, christian@...uner.io,
dvyukov@...gle.com, jannh@...gle.com, axboe@...nel.dk,
mascasa@...gle.com, pcc@...gle.com, irogers@...gle.com,
kasan-dev@...glegroups.com, linux-arch@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
x86@...nel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 01/11] perf: Rework perf_event_exit_event()
On Thu, Mar 25, 2021 at 11:17AM +0100, Marco Elver wrote:
> On Wed, Mar 24, 2021 at 12:24PM +0100, Marco Elver wrote:
> > From: Peter Zijlstra <peterz@...radead.org>
> >
> > Make perf_event_exit_event() more robust, such that we can use it from
> > other contexts. Specifically the up and coming remove_on_exec.
> >
> > For this to work we need to address a few issues. Remove_on_exec will
> > not destroy the entire context, so we cannot rely on TASK_TOMBSTONE to
> > disable event_function_call() and we thus have to use
> > perf_remove_from_context().
> >
> > When using perf_remove_from_context(), there's two races to consider.
> > The first is against close(), where we can have concurrent tear-down
> > of the event. The second is against child_list iteration, which should
> > not find a half baked event.
> >
> > To address this, teach perf_remove_from_context() to special case
> > !ctx->is_active and about DETACH_CHILD.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> > Signed-off-by: Marco Elver <elver@...gle.com>
> > ---
> > v3:
> > * New dependency for series:
> > https://lkml.kernel.org/r/YFn/I3aKF+TOjGcl@hirez.programming.kicks-ass.net
> > ---
>
> syzkaller found a crash with stack trace pointing at changes in this
> patch. Can't tell if this is an old issue or introduced in this series.
Yay, I found a reproducer. v5.12-rc4 is good, but sadly we crash with only
this patch applied. :-/

Here's a stack trace with just this patch applied:
| BUG: kernel NULL pointer dereference, address: 00000000000007af
| #PF: supervisor read access in kernel mode
| #PF: error_code(0x0000) - not-present page
| PGD 0 P4D 0
| Oops: 0000 [#1] PREEMPT SMP PTI
| CPU: 7 PID: 465 Comm: a.out Not tainted 5.12.0-rc4+ #25
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
| RIP: 0010:task_pid_ptr kernel/pid.c:324 [inline]
| RIP: 0010:__task_pid_nr_ns+0x112/0x240 kernel/pid.c:500
| Code: e8 13 55 07 00 e8 1e a6 0e 00 48 c7 c6 83 1e 0b 81 48 c7 c7 a0 2e d5 82 e8 4b 08 04 00 44 89 e0 5b 5d 41 5c c3 e8 fe a5 0e 00 <48> 8b 85 b0 07 00 00 4a 8d ac e0 98 01 00 00 e9 5a ff ff ff e8 e5
| RSP: 0000:ffffc90001b73a60 EFLAGS: 00010093
| RAX: 0000000000000000 RBX: ffffffff82c69820 RCX: ffffffff810b1eb2
| RDX: ffff888108d143c0 RSI: 0000000000000000 RDI: ffffffff8299ccc6
| RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000000
| R10: ffff888108d14db8 R11: 0000000000000000 R12: 0000000000000001
| R13: ffffffffffffffff R14: ffffffffffffffff R15: ffff888108e05240
| FS: 0000000000000000(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
| CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| CR2: 00000000000007af CR3: 0000000002c22002 CR4: 0000000000770ee0
| DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
| PKRU: 55555554
| Call Trace:
| perf_event_pid_type kernel/events/core.c:1412 [inline]
| perf_event_pid kernel/events/core.c:1421 [inline]
| perf_event_read_event+0x78/0x1d0 kernel/events/core.c:7406
| sync_child_event kernel/events/core.c:12404 [inline]
| perf_child_detach kernel/events/core.c:2223 [inline]
| __perf_remove_from_context+0x14d/0x280 kernel/events/core.c:2359
| perf_remove_from_context+0x9f/0xf0 kernel/events/core.c:2395
| perf_event_exit_event kernel/events/core.c:12442 [inline]
| perf_event_exit_task_context kernel/events/core.c:12523 [inline]
| perf_event_exit_task+0x276/0x4c0 kernel/events/core.c:12556
| do_exit+0x4cd/0xed0 kernel/exit.c:834
| do_group_exit+0x4d/0xf0 kernel/exit.c:922
| get_signal+0x1d2/0xf30 kernel/signal.c:2777
| arch_do_signal_or_restart+0xf7/0x750 arch/x86/kernel/signal.c:789
| handle_signal_work kernel/entry/common.c:147 [inline]
| exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
| exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:208
| irqentry_exit_to_user_mode+0x6/0x30 kernel/entry/common.c:314
| asm_exc_general_protection+0x1e/0x30 arch/x86/include/asm/idtentry.h:571

Attached is a C reproducer of the syzkaller program that crashes us.

Thanks,
-- Marco
View attachment "perf-nullptr-deref.c" of type "text/x-csrc" (6596 bytes)