Message-ID: <CAHk-=wg4iJsMHBzK52WzP+5_92HbwvX_vh_s4mMUuN0FJGdM5A@mail.gmail.com>
Date: Mon, 25 Mar 2019 11:36:01 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: syzbot <syzbot+7a8ba368b47fdefca61e@...kaller.appspotmail.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: KASAN: use-after-free Read in path_lookupat
On Sun, Mar 24, 2019 at 9:57 PM Al Viro <viro@...iv.linux.org.uk> wrote:
>
> So we have 5 broken cases, all with the same kind of fix: move freeing
> into the RCU-delayed part of ->destroy_inode(); for debugfs and bpf
> that requires adding ->alloc_inode()/->destroy_inode(), rather than
> relying upon the defaults from fs/inode.c
The fact that we had five different broken filesystems does imply that
we should just make this much easier at the VFS layer instead of
forcing filesystems to cope with a bad interface.
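For concreteness, each of those per-fs fixes ends up being some variant
of the below (a rough, untested sketch of the bpf case, not the actual
patch - names and details approximate):

static void bpf_i_callback(struct rcu_head *head)
{
	struct inode *inode = container_of(head, struct inode, i_rcu);

	if (S_ISLNK(inode->i_mode))
		kfree(inode->i_link);	/* safe now: RCU grace period has passed */
	free_inode_nonrcu(inode);	/* inode itself came from the generic inode_cachep */
}

static void bpf_destroy_inode(struct inode *inode)
{
	/* defer all freeing until after the RCU grace period */
	call_rcu(&inode->i_rcu, bpf_i_callback);
}

which is exactly the kind of thing the VFS could be doing for everybody.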
> > Or maybe we could add a final inode callback function for "rcu_free"
> > that is called as the RCU-delayed freeing of the inode itself happens?
> > And then people could hook into that for freeing the inode->i_link
> > data.
>
> You mean, split ->destroy_inode() into immediate and RCU-delayed parts?
> There are filesystems where both parts are non-empty - we can't just
> switch all ->destroy_inode() work to call_rcu().
Right. Not just move the existing destroy_inode() - because as you
say, people may not be able to do that in RCU context - but split it
up, and add a "final_free_inode()" callback or something for the RCU
phase.
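Something roughly like this on the VFS side (entirely untested,
"final_free_inode" is just the strawman name from above, and it assumes
->destroy_inode() stops freeing the inode itself):

static void i_callback(struct rcu_head *head)
{
	struct inode *inode = container_of(head, struct inode, i_rcu);
	const struct super_operations *ops = inode->i_sb->s_op;

	if (ops->final_free_inode)	/* new: RCU-delayed part (i_link etc) */
		ops->final_free_inode(inode);
	else
		free_inode_nonrcu(inode);
}

static void destroy_inode(struct inode *inode)
{
	const struct super_operations *ops = inode->i_sb->s_op;

	BUG_ON(!list_empty(&inode->i_lru));
	__destroy_inode(inode);
	if (ops->destroy_inode)
		ops->destroy_inode(inode);	/* immediate part only */
	call_rcu(&inode->i_rcu, i_callback);	/* the VFS owns the RCU delay */
}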
In fact, I suspect that *every* user of destroy_inode() already has to
have its own RCU callback to another final freeing function anyway.
Because they really shouldn't free the inode itself early. Maybe we
can just make that be a generic thing?
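(What I mean by "its own RCU callback" is the boilerplate that just
about every filesystem with a private inode cache carries today -
"foofs" here is made up, not any particular filesystem:)

static void foofs_i_callback(struct rcu_head *head)
{
	struct inode *inode = container_of(head, struct inode, i_rcu);

	kmem_cache_free(foofs_inode_cachep, FOOFS_I(inode));
}

static void foofs_destroy_inode(struct inode *inode)
{
	/* ... immediate cleanup that can't run in RCU callback context ... */
	call_rcu(&inode->i_rcu, foofs_i_callback);
}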
And no, it's not like the patch to the bpf filesystem looks
_horrible_, but I think the fact that the low-level filesystem needed
to do its own RCU-deferred thing (even if it wasn't doing any RCU on
its own, although bpf obviously has that in other areas) is kind of
wrong, and shows that our vfs interfaces are a bit awkward. Plus it's
not like it was just bpf that got this wrong.
It's also somewhat silly to add _another_ RCU callback when we already
have one for the inode.
Linus