Message-ID: <20170307131649.GA3358@twins.programming.kicks-ass.net>
Date: Tue, 7 Mar 2017 14:16:49 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
LKML <linux-kernel@...r.kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
syzkaller <syzkaller@...glegroups.com>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: perf: use-after-free in perf_release

On Mon, Mar 06, 2017 at 02:14:59PM +0100, Peter Zijlstra wrote:
> On Mon, Mar 06, 2017 at 10:57:07AM +0100, Dmitry Vyukov wrote:
>
> > ==================================================================
> > BUG: KASAN: use-after-free in atomic_dec_and_test
> > arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
> > BUG: KASAN: use-after-free in put_task_struct
> > include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
> > BUG: KASAN: use-after-free in put_ctx+0xcf/0x110
>
> FWIW, this output is very confusing, is this a result of your
> post-processing replicating the line for every 'inlined' part?
>
> > kernel/events/core.c:1131 at addr ffff880079c30158
> > Write of size 4 by task syz-executor6/25698
>
> > atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
> > put_task_struct include/linux/sched/task.h:93 [inline]
> > put_ctx+0xcf/0x110 kernel/events/core.c:1131
> > perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
> > perf_release+0x37/0x50 kernel/events/core.c:4338
> > __fput+0x332/0x800 fs/file_table.c:209
> > ____fput+0x15/0x20 fs/file_table.c:245
> > task_work_run+0x197/0x260 kernel/task_work.c:116
> > exit_task_work include/linux/task_work.h:21 [inline]
> > do_exit+0xb38/0x29c0 kernel/exit.c:880
> > do_group_exit+0x149/0x420 kernel/exit.c:984
> > get_signal+0x7e0/0x1820 kernel/signal.c:2318
> > do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
> > exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
> > syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
> > do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
> > entry_SYSCALL64_slow_path+0x25/0x25
>
> So this is fput()..
>
>
> > Freed:
> > PID = 25681
> > save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> > save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> > set_track mm/kasan/kasan.c:525 [inline]
> > kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
> > __cache_free mm/slab.c:3514 [inline]
> > kmem_cache_free+0x71/0x240 mm/slab.c:3774
> > free_task_struct kernel/fork.c:158 [inline]
> > free_task+0x151/0x1d0 kernel/fork.c:370
> > copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
> > copy_process kernel/fork.c:1531 [inline]
> > _do_fork+0x200/0x1010 kernel/fork.c:1994
> > SYSC_clone kernel/fork.c:2104 [inline]
> > SyS_clone+0x37/0x50 kernel/fork.c:2098
> > do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
> > return_from_SYSCALL_64+0x0/0x7a
>
> and this is a failed fork().
>
>
> However, inherited events don't have a filedesc to fput(), and
> similarly, a task that fails fork() has never been visible to attach a
> perf event to, because it never hits the pid-hash.
>
> Or so it is assumed.
>
> I'm forever getting lost in the PID code. Oleg, is there any way
> find_task_by_vpid() can return a task that can still fail fork() ?

So I _think_ find_task_by_vpid() can return an already dead task; and
we'll happily increase task->usage.

Dmitry; I have no idea how easy it is for you to reproduce the thing;
but so far I've not had much success. Could you perhaps stick the below
in?

Once we convert task_struct to refcount_t that should generate a WARN of
its own I suppose.
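
To illustrate why a refcount_t-style counter would complain here, below is a
rough user-space model, not kernel code. The names model_refcount,
model_atomic_inc() and model_refcount_inc() are made up for this sketch; the
only assumption borrowed from the kernel is that a checked increment refuses
to take a counter from 0 to 1 and warns instead, whereas a plain atomic
increment silently revives the dead object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a reference counter. */
struct model_refcount {
	atomic_int refs;
};

/* Number of warnings emitted so far; exists only for demonstration. */
static int model_warnings;

/* Plain increment: happily turns 0 into 1, hiding a use-after-free. */
static void model_atomic_inc(struct model_refcount *r)
{
	assert(r != NULL);
	atomic_fetch_add(&r->refs, 1);
}

/*
 * Checked increment: if the count is already 0 the object is dead,
 * so leave the count alone, record a warning and report failure.
 */
static bool model_refcount_inc(struct model_refcount *r)
{
	int old;

	assert(r != NULL);
	old = atomic_load(&r->refs);
	do {
		if (old == 0) {
			model_warnings++;
			fprintf(stderr, "WARN: increment on zero refcount\n");
			return false;
		}
		/* On failure 'old' is reloaded and the zero check repeats. */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));

	return true;
}
```

With a counter that has already dropped to zero, model_atomic_inc() leaves it
at 1 as if nothing happened, while model_refcount_inc() keeps it at 0 and
bumps model_warnings.
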
---
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 000fdb2..612d652 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -763,6 +763,7 @@ struct perf_event_context {
 #ifdef CONFIG_CGROUP_PERF
 	int nr_cgroups; /* cgroup evts */
 #endif
+	int switches;
 	void *task_ctx_data; /* pmu specific data */
 	struct rcu_head rcu_head;
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6f41548f..6455b7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2902,6 +2902,8 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 	if (!parent && !next_parent)
 		goto unlock;
 
+	ctx->switches++;
+
 	if (next_parent == ctx || next_ctx == parent || next_parent == parent) {
 		/*
 		 * Looks like the two contexts are clones, so we might be
@@ -3780,6 +3782,12 @@ find_lively_task_by_vpid(pid_t vpid)
 		task = current;
 	else
 		task = find_task_by_vpid(vpid);
+
+	if (task) {
+		if (WARN_ON_ONCE(task->flags & PF_EXITING))
+			task = NULL;
+	}
+
 	if (task)
 		get_task_struct(task);
 	rcu_read_unlock();
@@ -10432,6 +10440,10 @@ void perf_event_free_task(struct task_struct *task)
 
 		mutex_unlock(&ctx->mutex);
 
+		WARN_ON_ONCE(ctx->switches);
+		WARN_ON_ONCE(atomic_read(&ctx->refcount) != 1);
+		WARN_ON_ONCE(ctx->task != task);
+
 		put_ctx(ctx);
 	}
 }
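
For readers who want to see what the find_lively_task_by_vpid() hunk is meant
to do without building a kernel, here is a small user-space sketch of the same
filter. Everything in it is hypothetical: model_task, MODEL_PF_EXITING,
model_warn_on_once() and model_find_lively_task() are stand-ins invented for
this example, and the one-shot warning helper only approximates the
"warn once per call site, return the condition" behaviour of WARN_ON_ONCE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the "task is exiting" flag; the value is arbitrary here. */
#define MODEL_PF_EXITING 0x4u

/* Hypothetical, heavily reduced model of a task. */
struct model_task {
	unsigned int flags;
	int usage;	/* reference count, modelled as a plain int */
};

/* Number of warnings printed; exists only for demonstration. */
static int model_warn_count;

/*
 * Print a warning the first time 'cond' is true for a given call site
 * (tracked through '*warned') and always return 'cond'.
 */
static bool model_warn_on_once(bool cond, bool *warned, const char *what)
{
	assert(warned != NULL);
	if (cond && !*warned) {
		*warned = true;
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Model of the patched lookup: 'found' plays the role of whatever the
 * pid lookup returned. An exiting task is reported and dropped; a live
 * task gets its usage count raised before being returned.
 */
static struct model_task *model_find_lively_task(struct model_task *found)
{
	static bool warned;
	struct model_task *task = found;

	if (task) {
		if (model_warn_on_once(task->flags & MODEL_PF_EXITING,
				       &warned,
				       "lookup returned an exiting task"))
			task = NULL;
	}

	if (task)
		task->usage++;

	return task;
}
```

Passing a task with MODEL_PF_EXITING set returns NULL and leaves its usage
count untouched; the warning is printed only for the first such task, which
is the point of the _ONCE variant when a reproducer hits the path in a loop.
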