[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230915-elstern-etatplanung-906c6780af19@brauner>
Date: Fri, 15 Sep 2023 11:44:55 +0200
From: Christian Brauner <brauner@...nel.org>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Christoph Hellwig <hch@....de>, Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Reinette Chatre <reinette.chatre@...el.com>,
Miquel Raynal <miquel.raynal@...tlin.com>,
Richard Weinberger <richard@....at>,
Vignesh Raghavendra <vigneshr@...com>,
Dennis Dalessandro <dennis.dalessandro@...nelisnetworks.com>,
Tejun Heo <tj@...nel.org>,
Trond Myklebust <trond.myklebust@...merspace.com>,
Anna Schumaker <anna@...nel.org>,
Kees Cook <keescook@...omium.org>,
Damien Le Moal <dlemoal@...nel.org>,
Naohiro Aota <naohiro.aota@....com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-s390@...r.kernel.org, linux-rdma@...r.kernel.org,
linux-nfs@...r.kernel.org, linux-hardening@...r.kernel.org,
cgroups@...r.kernel.org, Jan Kara <jack@...e.cz>
Subject: Re: [PATCH 03/19] fs: release anon dev_t in deactivate_locked_super
On Thu, Sep 14, 2023 at 05:58:05PM +0100, Al Viro wrote:
> On Thu, Sep 14, 2023 at 04:02:25PM +0200, Christian Brauner wrote:
>
> > Yes, you're right that making the superblock and not the filesytem type
> > the bd_holder changes the logic and we are aware of that of course. And
> > it requires changes such as moving additional block device closing from
> > where some callers currently do it.
>
> Details, please?
Filesystems like xfs and ext4 that closed additional block devices (For
example, the logdev= mount option for xfs.) in put_super() could go
through stuff like:
blkdev_put()
-> bdev->bd_disk->fops->release() == lo_release()
-> __loop_clr_fd()
-> disk_force_media_change()
-> __invalidate_device()
-> get_super()
which wouldn't have been a problem before because get_super() matched on
sb->s_bdev which obviously doesn't work because a log device or rt
device or whatever isn't the main block device. So we couldn't have
deadlocked.
But the fact that it is called in that manner from that place in the
first place is wildly adventurous especially considering that there
isn't __a single comment__ in that code why that is safe. So good luck
figuring this all out.
Now that we don't have to do that s_bdev matching thing anymore because
we directly associate the superblock with the block device we can go
straight from block device to superblock. But now calling blkdev_put()
under put_super() which holds s_umount could deadlock. So it's moved to
kill_sb where it should've always been called. Even without the
potential deadlock in the new scheme that's cleaner and easier to
understand imho and it just works for any block device.
> Note that Christoph's series has mashed (2) and (3) together, resulting
> in UAF in a bunch of places. And I'm dead serious about
Yes, that I did fix as far as I'm aware. If the rules would've been
written down where when something was freed we would've had an easier
time figuring this out though. But they weren't so we missed it.
> Documentation/filesystems/porting being the right place; any development
Yes, agreed. I'll write a document for Christoph's next version.
I know that what you're saying is roughly that we shouldn't make the
same mistake as were done before but the fact that the old lifetime
rules weren't documented in any meaningful way and now we get grumbled
at in turn makes me grumble a bit. :) But overall point duly taken.
> tree of any filesystem (in-tree one or not) will have to go through the
> changes and figure out WTF to do with their existing code. We are
> going to play whack-a-mole for at least several years as development
> branches get rebased and merged.
Let me write something up.
>
> Incidentally, I'm going to add a (belated by 10 years) chunk in porting.rst
> re making sure that anything in superblock that might be needed by methods
> called in RCU mode should *not* be freed without an RCU delay... Should've
> done that back in 3.12 merge window when RCU'd vfsmounts went in; as it
> is, today we have several filesystems with exact same kind of breakage.
> hfsplus and affs breakage had been there in 3.13 (missed those two), exfat
> and ntfs3 - introduced later, by initial merges of filesystems in question.
> Missed on review...
Cool, thanks for adding that.
Powered by blists - more mailing lists