[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190325230211.GR2217@ZenIV.linux.org.uk>
Date: Mon, 25 Mar 2019 23:02:11 +0000
From: Al Viro <viro@...iv.linux.org.uk>
To: Dave Chinner <david@...morbit.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
syzbot <syzbot+7a8ba368b47fdefca61e@...kaller.appspotmail.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
syzkaller-bugs <syzkaller-bugs@...glegroups.com>,
Jan Kara <jack@...e.cz>, Jaegeuk Kim <jaegeuk@...nel.org>,
Joel Becker <jlbec@...lplan.org>, Mark Fasheh <mark@...heh.com>
Subject: Re: KASAN: use-after-free Read in path_lookupat
On Tue, Mar 26, 2019 at 09:48:23AM +1100, Dave Chinner wrote:
> And when it comes to VFS inode reclaim, XFS does not implement
> ->evict_inode because there is nothing at the VFS level to do.
> And ->destroy_inode ends up doing cleanup work (e.g. freeing on-disk
> inodes) which is non-trivial, blocking work, but then still requires
> the struct xfs_inode to be written back to disk before it can bei
> freed. So it just gets marked "reclaimable" and background reclaim
> then takes care of it from there so we avoid synchronous IO in inode
> reclaim...
>
> This works because don't track dirty inode metadata in the VFS
> writeback code (it's tracked with much more precision in the XFS log
> infrastructure) and we don't write back inodes from the VFS
> infrastructure, either. It's all done based on internal state
> outside the VFS.
>
> And, because of this, the VFS cannot assume that it can free
> the struct inode after calling ->destroy_inode or even use
> call_rcu() to run a filesystem destructor because the filesystem
> may need to do work that needs to block and that's not allowed in an
> RCU callback...
In Linus' patch that's what you get with non-NULL ->destroy_inode
+ NULL ->destroy_inode_rcu, so XFS won't be screwed by that.
Said that, yes, XFS adds another fun twist there (AFAICS, it's
the only in-tree filesystem that pulls that off).
I would really like some comments from f2fs and ocfs2 folks, as well
as Jan - he's had much more recent contact with writeback code than
I have... Could somebody explain what's going on in f2fs and ocfs2
->drop_inode()? It _should_ be just a predicate; looks like both
are playing very odd games to work around writeback problems and
I wonder if there's a cleaner solution for that. I can try and dig
through maillist(s) archives, but I would really appreciate it
if somebody could give a braindump on the issues dealt with in there...
Powered by blists - more mailing lists