Message-ID: <YFzgO0AhGFODmgc1@elver.google.com>
Date: Thu, 25 Mar 2021 20:10:51 +0100
From: Marco Elver <elver@...gle.com>
To: peterz@...radead.org
Cc: alexander.shishkin@...ux.intel.com, acme@...nel.org,
mingo@...hat.com, jolsa@...hat.com, mark.rutland@....com,
namhyung@...nel.org, tglx@...utronix.de, glider@...gle.com,
viro@...iv.linux.org.uk, arnd@...db.de, christian@...uner.io,
dvyukov@...gle.com, jannh@...gle.com, axboe@...nel.dk,
mascasa@...gle.com, pcc@...gle.com, irogers@...gle.com,
kasan-dev@...glegroups.com, linux-arch@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
x86@...nel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 01/11] perf: Rework perf_event_exit_event()
On Thu, Mar 25, 2021 at 05:17PM +0100, Marco Elver wrote:
[...]
> > syzkaller found a crash with stack trace pointing at changes in this
> > patch. Can't tell if this is an old issue or introduced in this series.
>
> Yay, I found a reproducer. v5.12-rc4 is good, but sadly we crash with
> only this patch applied. :-/
>
> Here's a stacktrace with just this patch applied:
>
> | BUG: kernel NULL pointer dereference, address: 00000000000007af
[...]
> | RIP: 0010:task_pid_ptr kernel/pid.c:324 [inline]
> | RIP: 0010:__task_pid_nr_ns+0x112/0x240 kernel/pid.c:500
[...]
> | Call Trace:
> | perf_event_pid_type kernel/events/core.c:1412 [inline]
> | perf_event_pid kernel/events/core.c:1421 [inline]
> | perf_event_read_event+0x78/0x1d0 kernel/events/core.c:7406
> | sync_child_event kernel/events/core.c:12404 [inline]
> | perf_child_detach kernel/events/core.c:2223 [inline]
> | __perf_remove_from_context+0x14d/0x280 kernel/events/core.c:2359
> | perf_remove_from_context+0x9f/0xf0 kernel/events/core.c:2395
> | perf_event_exit_event kernel/events/core.c:12442 [inline]
> | perf_event_exit_task_context kernel/events/core.c:12523 [inline]
> | perf_event_exit_task+0x276/0x4c0 kernel/events/core.c:12556
> | do_exit+0x4cd/0xed0 kernel/exit.c:834
> | do_group_exit+0x4d/0xf0 kernel/exit.c:922
> | get_signal+0x1d2/0xf30 kernel/signal.c:2777
> | arch_do_signal_or_restart+0xf7/0x750 arch/x86/kernel/signal.c:789
> | handle_signal_work kernel/entry/common.c:147 [inline]
> | exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
> | exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:208
> | irqentry_exit_to_user_mode+0x6/0x30 kernel/entry/common.c:314
> | asm_exc_general_protection+0x1e/0x30 arch/x86/include/asm/idtentry.h:571
I spun up gdb, and it showed me this:
| #0 perf_event_read_event (event=event@...ry=0xffff888107cd5000, task=task@...ry=0xffffffffffffffff)
| at kernel/events/core.c:7397
^^^ task == TASK_TOMBSTONE
| #1 0xffffffff811fc9cd in sync_child_event (child_event=0xffff888107cd5000) at kernel/events/core.c:12404
| #2 perf_child_detach (event=0xffff888107cd5000) at kernel/events/core.c:2223
| #3 __perf_remove_from_context (event=event@...ry=0xffff888107cd5000, cpuctx=cpuctx@...ry=0xffff88842fdf0c00,
| ctx=ctx@...ry=0xffff8881073cb800, info=info@...ry=0x3 <fixed_percpu_data+3>) at kernel/events/core.c:2359
| #4 0xffffffff811fcb9f in perf_remove_from_context (event=event@...ry=0xffff888107cd5000, flags=flags@...ry=3)
| at kernel/events/core.c:2395
| #5 0xffffffff81204526 in perf_event_exit_event (ctx=0xffff8881073cb800, event=0xffff888107cd5000)
| at kernel/events/core.c:12442
| #6 perf_event_exit_task_context (ctxn=0, child=0xffff88810531a200) at kernel/events/core.c:12523
| #7 perf_event_exit_task (child=0xffff88810531a200) at kernel/events/core.c:12556
| #8 0xffffffff8108838d in do_exit (code=code@...ry=11) at kernel/exit.c:834
| #9 0xffffffff81088e4d in do_group_exit (exit_code=11) at kernel/exit.c:922
and therefore synthesized this fix on top:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 57de8d436efd..e77294c7e654 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12400,7 +12400,7 @@ static void sync_child_event(struct perf_event *child_event)
 	if (child_event->attr.inherit_stat) {
 		struct task_struct *task = child_event->ctx->task;
 
-		if (task)
+		if (task && task != TASK_TOMBSTONE)
 			perf_event_read_event(child_event, task);
 	}
which fixes the problem. My guess is that the parent and child are both
racing to exit?
Does that make any sense?
Thanks,
-- Marco