netdev - Re: KASAN: use-after-free Read in sock

Date:   Wed, 29 Nov 2017 12:49:17 -0800
From:   Eric Dumazet <eric.dumazet@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>,
        syzbot 
        <bot+9abea25706ae35022385a41f61e579ed66e88a3f@...kaller.appspotmail.com>
Cc:     David Miller <davem@...emloft.net>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        syzkaller-bugs@...glegroups.com,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: KASAN: use-after-free Read in sock_release

On Wed, 2017-11-29 at 11:37 -0800, Cong Wang wrote:
> (Cc'ing fs people...)
> 
> On Wed, Nov 29, 2017 at 12:33 AM, syzbot
> <bot+9abea25706ae35022385a41f61e579ed66e88a3f@...kaller.appspotmail.com>
> wrote:
> > Hello,
> > 
> > syzkaller hit the following crash on
> > 1d3b78bbc6e983fabb3fbf91b76339bf66e4a12c
> > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master
> > compiler: gcc (GCC) 7.1.1 20170620
> > .config is attached
> > Raw console output is attached.
> > 
> > Unfortunately, I don't have any reproducer for this bug yet.
> > 
> > 
> > device syz3 left promiscuous mode
> > device syz3 entered promiscuous mode
> > ==================================================================
> > BUG: KASAN: use-after-free in sock_release+0x1c6/0x1e0 net/socket.c:601
> > Read of size 8 at addr ffff8801c8dd1d10 by task syz-executor4/31085
> > 
> > CPU: 0 PID: 31085 Comm: syz-executor4 Not tainted 4.14.0+ #129
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:17 [inline]
> >  dump_stack+0x194/0x257 lib/dump_stack.c:53
> >  print_address_description+0x73/0x250 mm/kasan/report.c:252
> >  kasan_report_error mm/kasan/report.c:351 [inline]
> >  kasan_report+0x25b/0x340 mm/kasan/report.c:409
> >  __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
> >  sock_release+0x1c6/0x1e0 net/socket.c:601
> >  sock_close+0x16/0x20 net/socket.c:1125
> >  __fput+0x333/0x7f0 fs/file_table.c:210
> >  ____fput+0x15/0x20 fs/file_table.c:244
> >  task_work_run+0x199/0x270 kernel/task_work.c:113
> >  exit_task_work include/linux/task_work.h:22 [inline]
> >  do_exit+0x9bb/0x1ae0 kernel/exit.c:865
> >  do_group_exit+0x149/0x400 kernel/exit.c:968
> >  get_signal+0x73f/0x16c0 kernel/signal.c:2335
> >  do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:809
> >  exit_to_usermode_loop+0x214/0x310 arch/x86/entry/common.c:158
> >  prepare_exit_to_usermode arch/x86/entry/common.c:195 [inline]
> >  syscall_return_slowpath+0x490/0x550 arch/x86/entry/common.c:264
> >  entry_SYSCALL_64_fastpath+0x94/0x96
> > RIP: 0033:0x452879
> > RSP: 002b:00007fb1c2435ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > RAX: fffffffffffffe00 RBX: 0000000000758100 RCX: 0000000000452879
> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000758100
> > RBP: 0000000000758100 R08: 0000000000000304 R09: 00000000007580d8
> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000a6f7ff R14: 00007fb1c24369c0 R15: 000000000000000e
> > 
> > Allocated by task 31066:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> >  kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3613
> >  kmalloc include/linux/slab.h:499 [inline]
> >  sock_alloc_inode+0xb4/0x300 net/socket.c:253
> >  alloc_inode+0x65/0x180 fs/inode.c:208
> >  new_inode_pseudo+0x69/0x190 fs/inode.c:890
> >  sock_alloc+0x41/0x270 net/socket.c:565
> >  __sock_create+0x148/0x850 net/socket.c:1225
> >  sock_create net/socket.c:1301 [inline]
> >  SYSC_socket net/socket.c:1331 [inline]
> >  SyS_socket+0xeb/0x200 net/socket.c:1311
> >  entry_SYSCALL_64_fastpath+0x1f/0x96
> > 
> > Freed by task 3039:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> >  set_track mm/kasan/kasan.c:459 [inline]
> >  kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
> >  __cache_free mm/slab.c:3491 [inline]
> >  kfree+0xca/0x250 mm/slab.c:3806
> >  __rcu_reclaim kernel/rcu/rcu.h:190 [inline]
> >  rcu_do_batch kernel/rcu/tree.c:2758 [inline]
> >  invoke_rcu_callbacks kernel/rcu/tree.c:3012 [inline]
> >  __rcu_process_callbacks kernel/rcu/tree.c:2979 [inline]
> >  rcu_process_callbacks+0xe79/0x17d0 kernel/rcu/tree.c:2996
> >  __do_softirq+0x29d/0xbb2 kernel/softirq.c:285
> 
> This looks more like an fs issue than a network one; my fs knowledge
> is not good enough to explain why the hell the inode could be
> destroyed before we release the fd.
> 
> My _guess_ is that it is because we defer ____fput() to a
> task work. If that is the case, then the fs layer is not to blame.
> 
> On the other hand, if we have to blame the net layer, the RCU usage
> in sock_release() does look suspicious: we claim RCU protection,
> but I don't see that we hold any RCU lock there.

There is RCU protection for sock->wq. The 1 in
rcu_dereference_protected(sock->wq, 1) is there because we have no
lockdep-friendly way to express that we are the last user of sock
and about to free it.
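
To illustrate the point about the constant 1, here is a minimal
userspace sketch, not kernel code. Every name in it (wq_model,
sock_model, deref_protected_model, sock_release_model, the refs
counter) is a hypothetical stand-in invented for this example. It
assumes the condition argument is only a debug-time check, modeled
here with assert() instead of lockdep, and that the caller really is
the last user of the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins; these are NOT the kernel definitions. */
struct wq_model {
	void *fasync_list;	/* NULL when no async waiters remain */
};

struct sock_model {
	struct wq_model *wq;	/* RCU-managed in the real kernel */
	int refs;		/* hypothetical count of remaining users */
};

/*
 * Model of an update-side dereference. 'cond' documents why a plain
 * load is safe. The real macro hands the condition to lockdep; this
 * model simply asserts it.
 */
#define deref_protected_model(p, cond) (assert(cond), (p))

/*
 * Returns 1 if the fasync list was not empty at release time, 0 if it
 * was empty, and -1 if the caller is not the last user.
 */
static int sock_release_model(struct sock_model *sock)
{
	int leftover;

	if (sock->refs != 1)
		return -1;

	/*
	 * Constant 1: no lock can be named here. The access is safe only
	 * because we are the last user and are about to free the object.
	 */
	leftover = deref_protected_model(sock->wq, 1)->fasync_list != NULL;

	free(sock->wq);		/* the wq is freed later, by us */
	sock->wq = NULL;
	sock->refs = 0;
	return leftover;
}

/* Allocates a model socket with an empty fasync list and releases it. */
static int demo_last_user_release(void)
{
	struct sock_model sock = {
		.wq = calloc(1, sizeof(struct wq_model)),
		.refs = 1,
	};

	if (!sock.wq)
		return -2;
	return sock_release_model(&sock);
}
```

In this model the check cannot fire, which mirrors why a hard-coded 1
only silences the debug check: it proves nothing by itself, and the
safety argument lives entirely in the "last user" invariant.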


> Also, the code that dereferences sock->wq is pretty much
> useless now; at least I don't see it catching any bugs.
> 
> 
> diff --git a/net/socket.c b/net/socket.c
> index 42d8e9c9ccd5..b2390b5591a9 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -598,9 +598,6 @@ void sock_release(struct socket *sock)
>                 module_put(owner);
>         }
> 
> -       if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
> -               pr_err("%s: fasync list not empty!\n", __func__);
> -
> 

At this point, sock->wq must still be valid; it is freed later (by us).

This really looks like some other bug, with this crash being a late effect.