[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210405170801.zrdhnon6g4ggb6c7@wittgenstein>
Date: Mon, 5 Apr 2021 19:08:01 +0200
From: Christian Brauner <christian.brauner@...ntu.com>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Jens Axboe <axboe@...nel.dk>,
syzbot <syzbot+c88a7030da47945a3cc3@...kaller.appspotmail.com>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
syzkaller-bugs@...glegroups.com, io-uring@...r.kernel.org
Subject: Re: [syzbot] WARNING in mntput_no_expire (2)
On Mon, Apr 05, 2021 at 04:18:58PM +0000, Al Viro wrote:
> On Mon, Apr 05, 2021 at 01:44:37PM +0200, Christian Brauner wrote:
> > On Sun, Apr 04, 2021 at 08:17:21PM +0000, Al Viro wrote:
> > > On Sun, Apr 04, 2021 at 06:50:10PM +0000, Al Viro wrote:
> > >
> > > > > Yeah, I have at least namei.o
> > > > >
> > > > > https://drive.google.com/file/d/1AvO1St0YltIrA86DXjp1Xg3ojtS9owGh/view?usp=sharing
> > > >
> > > > *grumble*
> > > >
> > > > Is it reproducible without KASAN? Would be much easier to follow the produced
> > > > asm...
> > >
> > > Looks like inode_permission(_, NULL, _) from may_lookup(nd). I.e.
> > > nd->inode == NULL.
> >
> > Yeah, I already saw that.
> >
> > >
> > > Mind slapping BUG_ON(!nd->inode) right before may_lookup() call in
> > > link_path_walk() and trying to reproduce that oops?
> >
> > Yep, no problem. If you run the reproducer in a loop for a little while
> > you eventually trigger the BUG_ON() and then you get the following splat
> > (and then an endless loop) in [1] with nd->inode NULL.
> >
> > _But_ I managed to debug this further and was able to trigger the BUG_ON()
> > directly in path_init() in the AT_FDCWD branch (after all its AT_FDCWD(./file0)
> > with the patch in [3] (it's in LOOKUP_RCU) the corresponding splat is in [2].
> > So the crash happens for a PF_IO_WORKER thread with a NULL nd->inode for the
> > PF_IO_WORKER's pwd (The PF_IO_WORKER seems to be in async context.).
>
> So we find current->fs->pwd.dentry negative, with current->fs->seq sampled
> equal before and after that? Lovely... The only places where we assign
> anything to ->pwd.dentry are
> void set_fs_pwd(struct fs_struct *fs, const struct path *path)
> {
> struct path old_pwd;
>
> path_get(path);
> spin_lock(&fs->lock);
> write_seqcount_begin(&fs->seq);
> old_pwd = fs->pwd;
> fs->pwd = *path;
> write_seqcount_end(&fs->seq);
> spin_unlock(&fs->lock);
>
> if (old_pwd.dentry)
> path_put(&old_pwd);
> }
> where we have ->seq bumped between dget new/assignment/ dput old,
> copy_fs_struct() where we have
> spin_lock(&old->lock);
> fs->root = old->root;
> path_get(&fs->root);
> fs->pwd = old->pwd;
> path_get(&fs->pwd);
> spin_unlock(&old->lock);
> fs being freshly allocated instance that couldn't have been observed
> by anyone and chroot_fs_refs(), where we have
> spin_lock(&fs->lock);
> write_seqcount_begin(&fs->seq);
> hits += replace_path(&fs->root, old_root, new_root);
> hits += replace_path(&fs->pwd, old_root, new_root);
> write_seqcount_end(&fs->seq);
> while (hits--) {
> count++;
> path_get(new_root);
> }
> spin_unlock(&fs->lock);
> ...
> static inline int replace_path(struct path *p, const struct path *old, const struct path *new)
> {
> if (likely(p->dentry != old->dentry || p->mnt != old->mnt))
> return 0;
> *p = *new;
> return 1;
> }
> Here we have new_root->dentry pinned from the very beginning,
> and assignments are wrapped into bumps of ->seq. Moreover,
> we are holding ->lock through that sequence (as all writers
> do), so these references can't be dropped before path_get()
> bumps new_root->dentry refcount.
>
> chroot_fs_refs() is called only by pivot_root(2):
> chroot_fs_refs(&root, &new);
> and there new is set by
> error = user_path_at(AT_FDCWD, new_root,
> LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &new);
> if (error)
> goto out0;
> which pins new.dentry *and* verifies that it's positive and a directory,
> at that. Since pinned positive dentry can't be made negative by anybody
> else, we know it will remain in that state until
> path_put(&new);
> well downstream of chroot_fs_refs(). In copy_fs_struct() we are
> copying someone's ->pwd, so it's also pinned positive. And it
> won't be dropped outside of old->lock, so by the time somebody
> manages to drop the reference in old, path_get() effects will be
> visible (old->lock serving as a barrier).
>
> That leaves set_fs_pwd() calls:
> fs/init.c:54: set_fs_pwd(current->fs, &path);
> init_chdir(), path set by LOOKUP_DIRECTORY patwalk. Pinned positive.
> fs/namespace.c:4207: set_fs_pwd(current->fs, &root);
> init_mount_tree(), root.dentry being ->mnt_root of rootfs. Pinned
> positive (and it would've oopsed much earlier had that been it)
> fs/namespace.c:4485: set_fs_pwd(fs, &root);
> mntns_install(), root filled by successful LOOKUP_DOWN for "/"
> from mnt_ns->root. Should be pinned positive.
> fs/open.c:501: set_fs_pwd(current->fs, &path);
> chdir(2), path set by LOOKUP_DIRECTORY pathwalk. Pinned positive.
> fs/open.c:528: set_fs_pwd(current->fs, &f.file->f_path);
> fchdir(2), file->f_path of any opened file. Pinned positive.
> kernel/usermode_driver.c:130: set_fs_pwd(current->fs, &umd_info->wd);
> umd_setup(), ->wd.dentry equal to ->wd.mnt->mnt_root, should be pinned positive.
> kernel/nsproxy.c:509: set_fs_pwd(me->fs, &nsset->fs->pwd);
> commit_nsset(). Let's see what's going on there...
>
> if ((flags & CLONE_NEWNS) && (flags & ~CLONE_NEWNS)) {
> set_fs_root(me->fs, &nsset->fs->root);
> set_fs_pwd(me->fs, &nsset->fs->pwd);
> }
> In those conditions nsset.fs has come from copy_fs_struct() done in
> prepare_nsset(). And the only thing that might've been done to it
> would be those set_fs_pwd() in mntns_install() (I'm not fond of the
> entire nsset->fs thing - looks like papering over bad calling
> conventions, but anyway)
>
> Now, I might've missed some insanity (direct assignments to ->pwd.dentry,
> etc. - wouldn't be the first time io_uring folks went "layering? wassat?
> we'll just poke in whatever we can reach"), but I don't see anything
> obvious of that sort in the area...
>
> OK, how about this: in path_init(), right after
> do {
> seq = read_seqcount_begin(&fs->seq);
> nd->path = fs->pwd;
> nd->inode = nd->path.dentry->d_inode;
> nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
> } while (read_seqcount_retry(&fs->seq, seq));
> slap
> if (!nd->inode) {
> // should never, ever happen
> struct dentry *fucked = nd->path.dentry;
> printk(KERN_ERR "%pd4 %d %x %p %d %d", fucked, d_count(fucked),
> fucked->d_flags, fs, fs->users, seq);
> BUG_ON(1);
> return ERR_PTR(-EINVAL);
> }
> and see what it catches?
Ah dentry count of -127 looks... odd.
[ 246.102077] /newroot/foo -127 18008 ffff888012819000 6 0
[ 246.102240] ------------[ cut here ]------------
[ 246.102264] /newroot/foo -127 18008 ffff888012819000 6 6
[ 246.104163] ------------[ cut here ]------------
[ 246.104943] kernel BUG at fs/namei.c:2359!
[ 246.106342] kernel BUG at fs/namei.c:2359!
[ 246.106385] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[ 246.110540] CPU: 0 PID: 6345 Comm: uring_viro Tainted: G W E 5.12.0-rc5-1ebc00aa82b08217d1fc4eef5435f8499783194c #53
[ 246.113725] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS 0.0.0 02/06/2015
[ 246.116115] RIP: 0010:path_init.cold+0xbb/0xea
[ 246.117711] Code: d0 7c 04 84 d2 75 4b 4c 8b 45 98 41 89 d9 44 89 f9 4c 89 e6 41 8b 94 24 d0 00 00 00 48 c7 c7 20 93 1a b1 41 55 e8 1c 70 fe ff <0f> 0b 48 8b 7d b8 e8 1f ee e6 f8 e9 55 ff ff ff 48 8b 7d 98 e8 01
[ 246.124372] RSP: 0018:ffffc900073275f0 EFLAGS: 00010282
[ 246.126466] RAX: 000000000000002c RBX: 0000000000000006 RCX: 0000000000000000
[ 246.129685] RDX: 0000000000000000 RSI: ffff88801354d700 RDI: fffff52000e64eb0
[ 246.131400] RBP: ffffc900073276a0 R08: 000000000000002c R09: ffffed1002b46045
[ 246.133241] R10: ffff888015a30227 R11: ffffed1002b46044 R12: ffff8880303eb028
[ 246.135124] R13: 0000000000000006 R14: ffffc90007327820 R15: 0000000000018008
[ 246.136931] FS: 00007f8695724800(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[ 246.139247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 246.141604] CR2: 000055fdf3c11008 CR3: 000000002e9dd000 CR4: 0000000000350ef0
[ 246.143437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 246.145337] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 246.147114] Call Trace:
[ 246.147910] ? write_comp_data+0x2a/0x90
[ 246.149010] path_openat+0x192/0x2790
[ 246.150062] ? path_lookupat.isra.0+0x530/0x530
[ 246.151295] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 246.152561] ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 246.154071] do_filp_open+0x197/0x270
[ 246.155157] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 246.156392] ? may_open_dev+0xf0/0xf0
[ 246.157969] ? do_raw_spin_lock+0x125/0x2e0
[ 246.159167] ? write_comp_data+0x2a/0x90
[ 246.160319] ? __sanitizer_cov_trace_pc+0x1d/0x50
[ 246.161649] ? _raw_spin_unlock+0x29/0x40
[ 246.162823] ? alloc_fd+0x499/0x640
[ 246.164092] io_openat2+0x1d1/0x8f0
[ 246.165403] ? io_req_complete_post+0xa90/0xa90
[ 246.166974] ? __lock_acquire+0x1847/0x5850
[ 246.168455] ? write_comp_data+0x2a/0x90
[ 246.169877] io_issue_sqe+0x2a2/0x5ac0
[ 246.171226] ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 246.173079] ? io_poll_complete.constprop.0+0x100/0x100
[ 246.174960] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 246.176468] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 246.187303] ? find_held_lock+0x2d/0x110
[ 246.197951] ? __might_fault+0xd8/0x180
[ 246.208458] __io_queue_sqe+0x19f/0xcf0
[ 246.218694] ? __check_object_size+0x1b4/0x4e0
[ 246.228802] ? __ia32_sys_io_uring_setup+0x70/0x70
[ 246.239099] ? write_comp_data+0x2a/0x90
[ 246.249152] io_queue_sqe+0x612/0xb70
[ 246.258967] io_submit_sqes+0x517d/0x6650
[ 246.268445] ? __x64_sys_io_uring_enter+0xb15/0xdd0
[ 246.282682] __x64_sys_io_uring_enter+0xb15/0xdd0
[ 246.292110] ? __ia32_sys_io_uring_enter+0xdd0/0xdd0
[ 246.301285] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 246.310116] ? syscall_enter_from_user_mode+0x27/0x70
[ 246.318845] do_syscall_64+0x2d/0x70
[ 246.327328] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 246.336188] RIP: 0033:0x7f869583a67d
[ 246.344145] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bb f7 0c 00 f7 d8 64 89 01 48
[ 246.367649] RSP: 002b:000055fdf2a89e98 EFLAGS: 00000212 ORIG_RAX: 00000000000001aa
[ 246.377100] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f869583a67d
[ 246.386119] RDX: 0000000000000000 RSI: 00000000000045f5 RDI: 0000000000000003
[ 246.395095] RBP: 000055fdf2a89f70 R08: 0000000000000000 R09: 0000000000000000
[ 246.403925] R10: 0000000000000000 R11: 0000000000000212 R12: 000055fdf295c640
[ 246.412977] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 246.425046] Modules linked in: efi_pstore(E) efivarfs(E)
[ 246.434681] invalid opcode: 0000 [#2] PREEMPT SMP KASAN
[ 246.435099] ---[ end trace b331351bc5a092fa ]---
Powered by blists - more mailing lists