linux-kernel - Re: use-after-free in __perf_install_in

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACT4Y+aEjLyS2syQnX_txb7pmYxC+ecMxv_J3Q8LfbVZaRPW-Q@mail.gmail.com>
Date:	Fri, 8 Jan 2016 11:28:05 +0100
From:	Dmitry Vyukov <dvyukov@...gle.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	syzkaller <syzkaller@...glegroups.com>,
	Kostya Serebryany <kcc@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Sasha Levin <sasha.levin@...cle.com>
Subject: Re: use-after-free in __perf_install_in_context

On Fri, Jan 8, 2016 at 9:40 AM, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> On Tue, Jan 5, 2016 at 1:17 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>> On Thu, Dec 31, 2015 at 06:15:41PM +0100, Dmitry Vyukov wrote:
>>> On Thu, Dec 17, 2015 at 3:43 PM, Peter Zijlstra <peterz@...radead.org> wrote:
>>> > On Thu, Dec 17, 2015 at 03:35:32PM +0100, Dmitry Vyukov wrote:
>>> >> In short, I did not see use-after-frees but perf_event_open fuzzing
>>> >> started hanging VMs very frequently, so testing is inconclusive.
>>> >
>>> > Right, I'll get back to staring at that one.
>>> >
>>> > Running your test case on bare metal got my machine reproducably stuck
>>> > but in a different place than you've reported.
>>> >
>>> > Takes a while too, so far no luck in explaining/debugging it. All my
>>> > added debug code remains silent :/
>>> >
>>> > I'll go think up more ..
>>>
>>> Hi Peter,
>>>
>>> Any progress with this?
>>> I can't continue to test perf subsytem, as it reliably hangs my machines.
>>
>> Holidays got in the way, I'll continue asap when I've cleared some of
>> the easy mail backlog from the past two weeks.
>
>
> Hi Peter,
>
> I've hit the following WARNING twice over night run. This is on
> b06f3a168cdcd80026276898fd1fee443ef25743 (Jan 6) + the following patch
> from you. I can't easily reproduce it. Does it say something to you?
> It looks like it all happen in the context of a single perf_event_open
> call.
>
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 2523 at kernel/events/core.c:1891
> event_sched_in.isra.95+0x84b/0xcd0()
> Modules linked in:
> CPU: 2 PID: 2523 Comm: syz-executor Not tainted 4.4.0-rc8+ #206
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  00000000ffffffff ffff880042d7f7d8 ffffffff8289a56d 0000000000000000
>  ffff8800590b9780 ffffffff85db2300 ffff880042d7f818 ffffffff812eb849
>  ffffffff815ae34b ffffffff85db2300 0000000000000763 ffff88005f35f4e0
> Call Trace:
>  [<     inline     >] __dump_stack lib/dump_stack.c:15
>  [<ffffffff8289a56d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>  [<ffffffff812eb849>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
>  [<ffffffff812eba79>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
>  [<ffffffff815ae34b>] event_sched_in.isra.95+0x84b/0xcd0
> kernel/events/core.c:1891
>  [<ffffffff815b4164>] group_sched_in+0x114/0x410 kernel/events/core.c:1963
>  [<     inline     >] ctx_pinned_sched_in kernel/events/core.c:2744
>  [<ffffffff815b5085>] ctx_sched_in+0xc25/0x11b0 kernel/events/core.c:2807
>  [<ffffffff815b5657>] perf_event_sched_in+0x47/0xa0 kernel/events/core.c:2073
>  [<ffffffff815b587c>] perf_event_context_sched_in+0x1cc/0x380
> kernel/events/core.c:2844
>  [<ffffffff815bf4f3>] __perf_event_task_sched_in+0x3a3/0x480
> kernel/events/core.c:2872
>  [<     inline     >] perf_event_task_sched_in include/linux/perf_event.h:931
>  [<ffffffff8136d40f>] finish_task_switch+0x3cf/0x610 kernel/sched/core.c:2576
>  [<ffffffff85c7b08d>] __schedule+0x902016/01/07 20:52:38 executing program 2:
>  [<ffffffff85c7cbc2>] preempt_schedule_common+0x42/0x70 kernel/sched/core.c:3250
>  [<ffffffff85c7cc07>] _cond_resched+0x17/0x20 kernel/sched/core.c:4618
>  [<     inline     >] slab_pre_alloc_hook mm/slub.c:1317
>  [<     inline     >] slab_alloc_node mm/slub.c:2520
>  [<     inline     >] slab_alloc mm/slub.c:2602
>  [<ffffffff816cba4b>] kmem_cache_alloc_trace+0x24b/0x2f0 mm/slub.c:2619
>  [<     inline     >] kmalloc include/linux/slab.h:458
>  [<     inline     >] kzalloc include/linux/slab.h:602
>  [<     inline     >] aa_alloc_file_context
> security/apparmor/include/context.h:46
>  [<ffffffff826a9c85>] apparmor_file_alloc_security+0x45/0xa0
> security/apparmor/lsm.c:402
>  [<ffffffff825e2a1d>] security_file_alloc+0x6d/0xa0 security/security.c:742
>  [<ffffffff81716aba>] get_empty_filp+0x13a/0x3f0 fs/file_table.c:128
>  [<ffffffff81716d90>] alloc_file+0x20/0x380 fs/file_table.c:163
>  [<ffffffff817f4fdd>] anon_inode_getfile+0x19d/0x340 fs/anon_inodes.c:105
>  [<ffffffff815c7daa>] SYSC_perf_event_open+0x12da/0x1fa0
> kernel/events/core.c:8471
>  [<ffffffff815cf609>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8262
>  [<ffffffff85c8a2b6>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> ---[ end trace 326ea08cc5f1b454 ]---


I've added the following change in an attempt to catch this WARNING:

@@ -2835,6 +2841,10 @@ static void perf_event_context_sched_in(struct
perf_event_context *ctx,
        if (ctx->nr_events)
                cpuctx->task_ctx = ctx;

+       mdelay(10);
+       if (cpuctx->task_ctx != NULL && cpuctx->task_ctx != ctx)
+               pr_err("HERE!!!");
+
        perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);

        perf_pmu_enable(ctx->pmu);

Does this change legitimate?
Now my program that stresses perf hangs in an interesting state.
Processes consume some CPU:

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND

                                   6671 nobody    20   0     0    0
0 Z  20.2  0.0  13:41.43 syz-executor

                     6542 nobody    20   0     0    0    0 Z  17.6
0.0  13:25.74 syz-executor

          6558 nobody    20   0     0    0    0 Z  17.6  0.0  13:32.36
syz-executor

Some are blocked on futexes:

# cat /proc/6671/task/**/stack
[<ffffffff812f5bc4>] do_exit+0x1ab4/0x2b20
[<ffffffff812f6da8>] do_group_exit+0x108/0x320
[<ffffffff812f6fdd>] SyS_exit_group+0x1d/0x20
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8148464d>] futex_wait_queue_me+0x30d/0x5e0
[<ffffffff81485125>] futex_wait+0x215/0x5f0
[<ffffffff81489738>] do_futex+0x218/0x1680
[<ffffffff8148acd3>] SyS_futex+0x133/0x2b0
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8148464d>] futex_wait_queue_me+0x30d/0x5e0
[<ffffffff81485125>] futex_wait+0x215/0x5f0
[<ffffffff81489738>] do_futex+0x218/0x1680
[<ffffffff8148acd3>] SyS_futex+0x133/0x2b0
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

And some hang in exit_to_usermode_loop:

# cat /proc/6542/task/**/stack
[<ffffffff812f5bc4>] do_exit+0x1ab4/0x2b20
[<ffffffff812f6da8>] do_group_exit+0x108/0x320
[<ffffffff812f6fdd>] SyS_exit_group+0x1d/0x20
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff810038e5>] exit_to_usermode_loop+0xd5/0x1a0
[<ffffffff8100631f>] syscall_return_slowpath+0x19f/0x210
[<ffffffff85c8a462>] int_ret_from_sys_call+0x25/0x9f
[<ffffffffffffffff>] 0xffffffffffffffff
[<ffffffff8148464d>] futex_wait_queue_me+0x30d/0x5e0
[<ffffffff81485125>] futex_wait+0x215/0x5f0
[<ffffffff81489738>] do_futex+0x218/0x1680
[<ffffffff8148acd3>] SyS_futex+0x133/0x2b0
[<ffffffff85c8a2f6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[<ffffffffffffffff>] 0xffffffffffffffff

They are unkillable and one of the threaded did do_group_exit...