[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87k2xwv6y8.fsf@x220.int.ebiederm.org>
Date: Tue, 31 Mar 2015 22:54:55 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Omar Sandoval <osandov@...ndov.com>
Cc: dsterba@...e.cz, Chris Mason <clm@...com>,
Josef Bacik <jbacik@...com>,
Timo Kokkonen <timo.kokkonen@...code.fi>,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] Btrfs: prevent deletion of mounted subvolumes
Omar Sandoval <osandov@...ndov.com> writes:
> On Mon, Mar 30, 2015 at 02:30:34PM +0200, David Sterba wrote:
>> On Mon, Mar 30, 2015 at 02:02:17AM -0700, Omar Sandoval wrote:
>> > Before commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
>> > d_invalidate() could return -EBUSY when a dentry for a directory had
>> > more than one reference to it. This is what prevented a mounted
>> > subvolume from being deleted, as struct vfsmount holds a reference to
>> > the subvolume dentry. However, that commit removed that case, and later
>> > commits in that patch series removed the return code from d_invalidate()
>> > completely, so we don't get that check for free anymore. So, reintroduce
>> > it in btrfs_ioctl_snap_destroy().
>>
>> > This applies to 4.0-rc6. To be honest, I'm not sure that this is the most
>> > correct fix for this bug, but it's equivalent to the pre-3.18 behavior and it's
>> > the best that I could come up with. Thoughts?
>>
>> > + spin_lock(&dentry->d_lock);
>> > + err = dentry->d_lockref.count > 1 ? -EBUSY : 0;
>> > + spin_unlock(&dentry->d_lock);
>>
>> The fix restores the original behaviour, but I don't think opencoding and
>> using internals is fine. Either there should be a vfs api for that or
>> there's an existing one that can be used instead.
I have a problem with restoring the original behavior as is.
In some sense it re-introduces the security issue that the d_invalidate
changes were built to fix.
Any user in the system can create a user namespace, create a mount
namespace and keep any subvolume pinned forever. Which at the very
least could make a very nice DOS attack. I am not familiar enough with
how people use subvolumes and
So let me ask. How can userspace not know that a subvolume that they
want to delete is already mounted?
I can see having something like is_local_mount_root and denying the
subvolume destruction if the mount that is pinning it is in your local
mount namespace.
>> The bug here seems defined up to the point that we're trying to delete a
>> subvolume that's a mountpoint. My next guess is that a check
>>
>> if (d_mountpoint(&dentry)) { ... }
>>
>> could work.
>
> That was my first instinct as well, but d_mountpoint() is true for
> dentries that have a filesystem mounted on them (e.g., after mount
> /dev/sda1 /mnt, the dentry for "/mnt"), not the dentry that is mounted.
>
> I poked around the mount code for awhile and couldn't come up with
> anything using the existing interface. Mounting subvolumes bubbles down
> to mount_subtree(), which doesn't really leave any traces of which
> subvolume is mounted except for the dentry in struct vfsmount.
>
> (As far as I can tell, under the covers subvolume deletion is more or
> less equivalent to an rm -rf, and we obviously don't do anything to stop
> users from doing that on the root of their mounted filesystem, but it
> appears that users expect the original behavior.)
>
> Here's an idea: mark mount root dentries as such in the VFS and check it
> in the Btrfs code. Adding fsdevel ML for comments
> (https://lkml.org/lkml/2015/3/30/125 is the original message).
Marking root dentries is needed to fix the bug that you can escape
the limitations of loopback mounts with a carefully placed rename.
I have a patch cooking that marks mountpoints and tracks all of the
mounts on a dentry. So except for the possibility of stepping on each
others toes I have no objections.
Eric
> ----
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 74609b9..8a0933d 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2384,6 +2384,11 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> goto out_dput;
> }
>
> + if (d_is_mount_root(dentry)) {
> + err = -EBUSY;
> + goto out_dput;
> + }
> +
> mutex_lock(&inode->i_mutex);
>
> /*
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 82ef140..a28ca15 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -920,6 +920,10 @@ vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void
> return ERR_CAST(root);
> }
>
> + spin_lock(&root->d_lock);
> + root->d_flags |= DCACHE_MOUNT_ROOT;
> + spin_unlock(&root->d_lock);
> +
> mnt->mnt.mnt_root = root;
> mnt->mnt.mnt_sb = root->d_sb;
> mnt->mnt_mountpoint = mnt->mnt.mnt_root;
> @@ -1017,6 +1021,8 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
>
> static void cleanup_mnt(struct mount *mnt)
> {
> + struct dentry *root = mnt->mnt.mnt_root;
> +
> /*
> * This probably indicates that somebody messed
> * up a mnt_want/drop_write() pair. If this
> @@ -1031,7 +1037,10 @@ static void cleanup_mnt(struct mount *mnt)
> if (unlikely(mnt->mnt_pins.first))
> mnt_pin_kill(mnt);
> fsnotify_vfsmount_delete(&mnt->mnt);
> - dput(mnt->mnt.mnt_root);
> + spin_lock(&root->d_lock);
> + root->d_flags &= ~DCACHE_MOUNT_ROOT;
> + spin_unlock(&root->d_lock);
> + dput(root);
> deactivate_super(mnt->mnt.mnt_sb);
> mnt_free_id(mnt);
> call_rcu(&mnt->mnt_rcu, delayed_free_vfsmnt);
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index d835879..d974ab8 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -225,6 +225,7 @@ struct dentry_operations {
>
> #define DCACHE_MAY_FREE 0x00800000
> #define DCACHE_FALLTHRU 0x01000000 /* Fall through to lower layer */
> +#define DCACHE_MOUNT_ROOT 0x02000000 /* is the root of a mount */
>
> extern seqlock_t rename_lock;
>
> @@ -401,6 +402,16 @@ static inline bool d_mountpoint(const struct dentry *dentry)
> return dentry->d_flags & DCACHE_MOUNTED;
> }
>
> +static inline bool d_is_mount_root(const struct dentry *dentry)
> +{
> + bool ret;
> +
> + spin_lock(&dentry->d_lock);
> + ret = dentry->d_flags & DCACHE_MOUNT_ROOT;
> + spin_unlock(&dentry->d_lock);
> + return ret;
> +}
> +
> /*
> * Directory cache entry type accessor functions.
> */
> ----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists