Message-ID: <87bo1u8vmf.fsf@xmission.com>
Date: Fri, 08 Nov 2013 12:51:52 -0800
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Al Viro <viro@...IV.linux.org.uk>
Cc: Miklos Szeredi <miklos@...redi.hu>,
Andy Lutomirski <luto@...capital.net>,
"Serge E. Hallyn" <serge@...lyn.com>,
Linux-Fsdevel <linux-fsdevel@...r.kernel.org>,
Kernel Mailing List <linux-kernel@...r.kernel.org>,
Rob Landley <rob@...dley.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Matthias Schniedermeyer <ms@...d.de>,
Linux Containers <containers@...ts.linux-foundation.org>
Subject: Re: [REVIEW][PATCH 1/4] vfs: Don't allow overwriting mounts in the current mount namespace
Al Viro <viro@...IV.linux.org.uk> writes:
> On Tue, Oct 15, 2013 at 01:16:48PM -0700, Eric W. Biederman wrote:
>
>> int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>> {
>> int error = may_delete(dir, dentry, 1);
>> @@ -3622,6 +3636,9 @@ retry:
>> error = -ENOENT;
>> goto exit3;
>> }
>> + error = -EBUSY;
>> + if (covered(nd.path.mnt, dentry))
>> + goto exit3;
>
> Ugh... And it's not racy because of...? IOW, what's to keep the return
> value of covered() from getting obsolete just as it's being calculated,
> let alone returned?
The return value of d_mountpoint can be obsolete as soon as it returns
as well, so I don't see this as being significantly different.
I would like to say that any changes introduced here do not matter
because all of this is just to keep a semblance of the old semantics.
Unfortunately for me, part of keeping that semblance is preserving the
existing race guarantees as far as is reasonable.
In 3.12 we create a mount with:
- The dentry->d_inode mutex held.
- The namespace_sem held.
In 3.12 we remove a mount with just the namespace_sem held.
I call covered in: do_rmdir, do_unlinkat, and renameat.
In 3.12 vfs_rmdir checks d_mountpoint with the
dentry->d_inode->i_mutex and
dentry->d_parent->d_inode->i_mutex held.
In 3.12 vfs_unlink checks d_mountpoint with the
dentry->d_inode->i_mutex and
dentry->d_parent->d_inode->i_mutex held.
In 3.12 vfs_rename_dir and vfs_rename_other check d_mountpoint with the
target->i_mutex, new_dir->i_mutex, and old_dir->i_mutex held.
Therefore the guarantees in 3.12 are:
- unlink versus mount races are prevented by the
dentry->d_inode->i_mutex of the dentry being removed.
- unlink versus umount races are uninteresting.
- mount versus rename races in testing of d_mountpoint are ignored.
- umount versus rename races in testing of d_mountpoint are ignored.
Comparing this to how I have implemented covered, the test is at a
slightly different location in the call path, so there may be a slightly
larger race in rename.
For unlink there is a race where the mount could happen after testing
covered. Then the unlink happens. Then we remove the mount with
detach_mounts.
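The ordering described above can be modeled in userspace. This is only a sketch: struct model_dentry and the model_* helpers are hypothetical names invented for the illustration, not kernel APIs, and the "race" is simulated by a flag rather than by real concurrency.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_dentry {
	bool linked;	/* name is still present in the parent directory */
	bool mounted;	/* something is mounted on this dentry */
};

static bool model_covered(const struct model_dentry *d)
{
	return d->mounted;
}

static void model_mount(struct model_dentry *d)
{
	d->mounted = true;
}

static void model_detach_mounts(struct model_dentry *d)
{
	d->mounted = false;
}

/*
 * Returns 0 on success or -EBUSY if the dentry is covered at check time.
 * mount_in_window simulates a mount that lands after the covered test
 * but before the unlink itself.
 */
static int model_unlink(struct model_dentry *d, bool mount_in_window)
{
	if (model_covered(d))
		return -EBUSY;

	if (mount_in_window)
		model_mount(d);		/* the check above is now stale */

	d->linked = false;		/* the unlink wins */
	model_detach_mounts(d);		/* and the late mount is removed */
	return 0;
}
```

In this model a mount that slips into the window does not survive the unlink, which corresponds to the "unlink wins and the filesystem is unmounted" outcome.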
In the context of the symlink attacks against umounting of fuse I don't
see a difference.
In the only case where there is a new race (unlink versus mount) I see
a narrow window where new behavior will happen: the unlink will win and
we unmount the filesystem. So there is a very narrow window in which
we might have a stale entry in /etc/mtab.
So after all of that analysis I don't think we care. If we do care, with
a little more work we can pass the mountpoint down and test covered with
dentry->d_inode->i_mutex held, where we test d_mountpoint in 3.12 today.
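As a rough illustration of why doing the test under the mutex closes the window, here is a userspace sketch. It assumes, as in 3.12, that mount needs the same per-inode mutex; struct locked_dentry and the locked_* helpers are hypothetical names for the model, and the lock is a plain flag rather than a real mutex.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct locked_dentry {
	bool i_mutex_held;	/* stands in for dentry->d_inode->i_mutex */
	bool mounted;		/* something is mounted on this dentry */
	bool linked;		/* name is still present in the parent directory */
};

/*
 * A mount needs the mutex. In this model a mount attempted while the
 * mutex is held simply does not happen yet; a real mount would block
 * and later find the dentry already unlinked.
 */
static bool locked_try_mount(struct locked_dentry *d)
{
	if (d->i_mutex_held)
		return false;
	d->mounted = true;
	return true;
}

/* The covered test and the unlink both run with the mutex held. */
static int locked_unlink(struct locked_dentry *d, bool mount_attempt_in_window)
{
	int error = 0;

	d->i_mutex_held = true;
	if (d->mounted) {
		error = -EBUSY;
	} else {
		if (mount_attempt_in_window)
			(void)locked_try_mount(d);	/* excluded by the mutex */
		d->linked = false;
	}
	d->i_mutex_held = false;
	return error;
}
```

Here the mount attempt between the test and the unlink cannot succeed, so the result of the covered test stays valid for as long as it is used.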
Eric