Message-ID: <20181120190317.GA29161@arm.com>
Date: Tue, 20 Nov 2018 19:03:17 +0000
From: Will Deacon <will.deacon@....com>
To: Jan Glauber <Jan.Glauber@...ium.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: dcache_readdir NULL inode oops
On Tue, Nov 20, 2018 at 06:28:54PM +0000, Will Deacon wrote:
> On Sat, Nov 10, 2018 at 11:17:03AM +0000, Jan Glauber wrote:
> > On Fri, Nov 09, 2018 at 03:58:56PM +0000, Will Deacon wrote:
> > > On Fri, Nov 09, 2018 at 02:37:51PM +0000, Jan Glauber wrote:
> > > > I'm seeing the following oops, reproducible with an upstream kernel on
> > > > arm64 (ThunderX2):
> > >
> > > [...]
> > >
> > > > It happens after 1-3 hours of running 'stress-ng --dev 128'. This testcase
> > > > does a scandir of /dev and then calls random stuff like ioctl, lseek,
> > > > open/close etc. on the entries. I assume no files are deleted under /dev
> > > > during the testcase.
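For anyone else trying to reproduce this, the testcase boils down to roughly
the following. A minimal sketch only: the exact set of operations stress-ng
performs is my assumption for illustration, not its actual source.

/*
 * Rough model of 'stress-ng --dev': scan /dev, then poke each entry
 * with a few harmless syscalls in a tight loop.
 */
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct dirent **entries;
	int i, n;

	for (;;) {
		n = scandir("/dev", &entries, NULL, alphasort);
		if (n < 0)
			continue;
		for (i = 0; i < n; i++) {
			char path[PATH_MAX];
			int avail, fd;

			snprintf(path, sizeof(path), "/dev/%s",
				 entries[i]->d_name);
			fd = open(path, O_RDONLY | O_NONBLOCK);
			if (fd >= 0) {
				/* "Random stuff": lseek, ioctl, close. */
				lseek(fd, 0, SEEK_SET);
				ioctl(fd, FIONREAD, &avail);
				close(fd);
			}
			free(entries[i]);
		}
		free(entries);
	}
	return 0;
}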
> > > >
> > > > The NULL pointer is the inode pointer of next. The next dentry->d_flags is
> > > > DCACHE_RCUACCESS when this happens.
> > > >
> > > > Any hints on how to further debug this?
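FWIW, the shape of the oops fits the emit loop in dcache_readdir():
next_positive() checks simple_positive() on the child under the RCU read
lock but hands the dentry back without taking a reference, so nothing stops
it changing underneath us afterwards. Paraphrasing fs/libfs.c from memory,
so double-check against the actual tree:

	while ((next = next_positive(dentry, p, 1)) != NULL) {
		/*
		 * Nothing pins next at this point, so d_inode(next) can
		 * become NULL (your oops) or next itself can be freed
		 * (the KASAN splat further down in this mail).
		 */
		if (!dir_emit(ctx, next->d_name.name, next->d_name.len,
			      d_inode(next)->i_ino, dt_type(d_inode(next))))
			break;
		p = &next->d_child;
		ctx->pos++;
	}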
> > >
> > > Can you reproduce the issue with vanilla -rc1 and do you have a "known good"
> > > kernel?
> >
> > I can try out -rc1, but IIRC this wasn't bisectable, as the bug was present
> > at least back to 4.14. I need to double-check that, as there were other
> > issues that have since been resolved, so I may be confusing things here.
> > I've definitely seen the same bug with 4.18.
> >
> > Unfortunately I lost access to the machine, as our data center is currently
> > being moved, so it might take a few days until I can try -rc1.
>
> Ok, I've just managed to reproduce this in a KVM guest running v4.20-rc3 on
> both the host and the guest, so if anybody has any ideas of things to try then
> I'm happy to give them a shot. In the meantime, I'll try again with a bunch of
> debug checks enabled.
Weee, I eventually hit a use-after-free from KASAN. See below.
Will
--->8
[ 615.973367] ==================================================================
[ 615.974675] BUG: KASAN: use-after-free in next_positive.isra.2+0x188/0x1a0
[ 615.975574] Read of size 8 at addr ffff8002fb33c190 by task stress-ng-dev/3145
[ 615.977348]
[ 615.977692] CPU: 16 PID: 3145 Comm: stress-ng-dev Tainted: G D 4.20.0-rc3-00012-g40b114779944 #2
[ 615.980171] Hardware name: linux,dummy-virt (DT)
[ 615.981325] Call trace:
[ 615.981765] dump_backtrace+0x0/0x280
[ 615.982386] show_stack+0x14/0x20
[ 615.983125] dump_stack+0xc4/0xec
[ 615.983141] print_address_description+0x60/0x25c
[ 615.985226] kasan_report+0x1a8/0x358
[ 615.986161] __asan_report_load8_noabort+0x18/0x20
[ 615.986978] next_positive.isra.2+0x188/0x1a0
[ 615.987767] dcache_readdir+0x2cc/0x488
[ 615.988428] iterate_dir+0x168/0x448
[ 615.989342] ksys_getdents64+0xe8/0x248
[ 615.990334] __arm64_sys_getdents64+0x68/0x98
[ 615.990341] el0_svc_common+0x104/0x210
[ 615.990345] el0_svc_handler+0x48/0xb0
[ 615.990349] el0_svc+0x8/0xc
[ 615.990356]
[ 615.994175] Allocated by task 2720:
[ 615.994184] kasan_kmalloc.part.1+0x40/0x108
[ 615.994188] kasan_kmalloc+0xb4/0xc8
[ 615.994192] kasan_slab_alloc+0x14/0x20
[ 615.994195] kmem_cache_alloc+0x130/0x1f8
[ 615.994203] __d_alloc+0x30/0x848
[ 615.994215] d_alloc+0x30/0x1d0
[ 616.000554] d_alloc_name+0x84/0xb0
[ 616.000562] devpts_pty_new+0x2e0/0x5e8
[ 616.000568] ptmx_open+0x14c/0x288
[ 616.000576] chrdev_open+0x194/0x408
[ 616.000586] do_dentry_open+0x2e8/0xac8
[ 616.004282] vfs_open+0x8c/0xc0
[ 616.004286] path_openat+0x694/0x33e8
[ 616.004288] do_filp_open+0x13c/0x200
[ 616.004296] do_sys_open+0x1dc/0x2e0
[ 616.006865] __arm64_sys_openat+0x88/0xc8
[ 616.006872] el0_svc_common+0x104/0x210
[ 616.006876] el0_svc_handler+0x48/0xb0
[ 616.006880] el0_svc+0x8/0xc
[ 616.006881]
[ 616.006883] Freed by task 0:
[ 616.006889] __kasan_slab_free+0x114/0x228
[ 616.006897] kasan_slab_free+0x10/0x18
[ 616.012068] kmem_cache_free+0x60/0x1e8
[ 616.012071] __d_free+0x18/0x20
[ 616.012081] rcu_process_callbacks+0x46c/0x940
[ 616.012086] __do_softirq+0x28c/0x6cc
[ 616.012087]
[ 616.012100] The buggy address belongs to the object at ffff8002fb33c100
[ 616.012100] which belongs to the cache dentry of size 192
[ 616.017462] The buggy address is located 144 bytes inside of
[ 616.017462] 192-byte region [ffff8002fb33c100, ffff8002fb33c1c0)
[ 616.017465] The buggy address belongs to the page:
[ 616.017470] page:ffff7e000beccf00 count:1 mapcount:0 mapping:ffff800358c13400 index:0x0 compound_mapcount: 0
[ 616.017477] flags: 0x1ffff00000010200(slab|head)
[ 616.017488] raw: 1ffff00000010200 dead000000000100 dead000000000200 ffff800358c13400
[ 616.024873] raw: 0000000000000000 0000000080400040 00000001ffffffff 0000000000000000
[ 616.024875] page dumped because: kasan: bad access detected
[ 616.024876]
[ 616.024877] Memory state around the buggy address:
[ 616.024882] ffff8002fb33c080: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 616.024885] ffff8002fb33c100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 616.024887] >ffff8002fb33c180: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
[ 616.024889] ^
[ 616.024891] ffff8002fb33c200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 616.024893] ffff8002fb33c280: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
[ 616.024894] ==================================================================
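The allocation stack points at devpts (ptmx_open() -> devpts_pty_new()) and
the dentry is freed from an RCU callback, which lines up with the
DCACHE_RCUACCESS flag Jan saw. So a more targeted reproducer might be one
task spinning in readdir() on the devpts mount while another churns ptys.
Untested sketch -- the paths and loop structure are my guesses, not taken
from the report:

/*
 * Hammer dcache_readdir() on /dev/pts while another thread creates
 * and destroys pty dentries via /dev/ptmx.
 */
#include <dirent.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static void *churn_ptys(void *arg)
{
	for (;;) {
		/* Each open allocates a /dev/pts dentry; the close
		 * eventually frees it via an RCU callback. */
		int fd = open("/dev/ptmx", O_RDWR | O_NOCTTY);

		if (fd >= 0)
			close(fd);
	}
	return NULL;
}

int main(void)
{
	pthread_t thread;

	if (pthread_create(&thread, NULL, churn_ptys, NULL))
		return 1;

	for (;;) {
		DIR *dir = opendir("/dev/pts");

		if (!dir)
			continue;
		/* readdir() on the devpts root should end up in
		 * dcache_readdir() via simple_dir_operations. */
		while (readdir(dir))
			;
		closedir(dir);
	}
	return 0;
}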