linux-kernel - Re: [git pull] vfs fixes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87inmmvxlz.fsf@xmission.com>
Date:   Mon, 03 Apr 2017 01:00:56 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Al Viro <viro@...IV.linux.org.uk>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Ian Kent <raven@...maw.net>
Subject: Re: [git pull] vfs fixes

Al Viro <viro@...IV.linux.org.uk> writes:

> On Sun, Apr 02, 2017 at 05:58:41PM -0700, Linus Torvalds wrote:
>
>> I had to go and double-check that "DCACHE_DIRECTORY_TYPE" is what
>> d_can_lookup() actually checks, so _that_ part is perhaps a bit
>> subtle, and might be worth noting in that comment that you edited.
>> 
>> So the real "rule" ends up being that we only ever look up things from
>> dentries of type DCACHE_DIRECTORY_TYPE set, and those had better have
>> DCACHE_RCUACCESS bit set.
>> 
>> And the only reason path_init() only checks it for that case is that
>> nd->root and nd->pwd both have to be of type d_can_lookup().
>> 
>> Do we check that when we set it? I hope/assume we do.
>
> For chdir()/chroot()/pivot_root() it's done by LOOKUP_DIRECTORY in lookup
> flags; fchdir() is slightly different - there we check S_ISDIR of inode
> of opened file.  Which is almost the same thing, except for
> kinda-sorta directories that have no ->lookup() - we have them for
> NFS referral points.  It should be impossible to end up with
> one of those opened - not even with O_PATH; follow_managed() will be called
> and we'll either fail or cross into whatever ends up overmounting them.
> Still, it might be cleaner to turn that check into
> 	d_can_lookup(f.file->f_path.dentry)
> simply for consistency sake.
>
> The thing I really don't like is mntns_install().  With sufficiently
> nasty nfsroot setup it might be possible to end up with referral point
> as one's root/pwd; getting out of such state would be interesting...
> Smells like that place should be a solitary follow_down(), not a loop
> of follow_down_one(), but I want Eric's opinion on that one; userns stuff
> is weird.

If I read the conversation correctly the concern is that we might
initialize a pwd or root with something that is almost but not quite a
directory in mntns_install.

Refereshing my memory.  d_automount mounts things and is what
is used for nfs referrals.  d_manage blocks waiting for
an automounts to complete or expire.  follow_down just calls d_manage,
follow_manage calls both d_manage and d_automount as appropriate.

If the concern is nfs referral points calling follow_down is wrong and
what is wanted is follow_managed.

The only thing that follow_down prevents is changing onto directories
that are only half mounted, and not really directories yet.  Which
is certainly part of the invarient we want to preserve.



The intent of the logic in mntns_install is to just pick a reasonable
looking place somewhere in that mount namespace to use as a root
directory.  I arbitrarily picked the top of the mount stack on "/".  Which
is typically used as the root directory.  If people really care where
their root is they save a directory file descriptor off somewhere and
call chroot.  So there is a little wiggle room exactly what the code
does.

There is a secondary use of mntns_install which is to give you a way to
access what is under "/" if you are so foolish as to umount "/".  I keep
thinking setns to your own mount namespace would be a handy way to get
back to the rootfs and to use it for something during system shutdown.
I don't know if anyone has actually used setns to your own mount
namespace for that.

The worst case I can see from the proposed change is we would
not be able to umount all of the way down to rootfs.  That
would be a self inflicted wound so I don't care.

I can't imagine anyone mounting an automount point deliberately on /
except as way to confuse the vfs.  Though I can almost imagine getting
there by accident if an automount expires.

So yes please let's change the follow_down_one loop to follow_managed
to preserve the invariant that we always have a directory that
supports d_can_lookup to pass to set_fs_pwd and set_fs_root.

Eric

> diff --git a/fs/dcache.c b/fs/dcache.c
> index 95d71eda8142..05550139a8a6 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1757,7 +1757,13 @@ static unsigned d_flags_for_inode(struct inode *inode)
>  		return DCACHE_MISS_TYPE;
>  
>  	if (S_ISDIR(inode->i_mode)) {
> -		add_flags = DCACHE_DIRECTORY_TYPE;
> +		/*
> +		 * Any potential starting point of lookup should have
> +		 * DCACHE_RCUACCESS; currently directory dentries
> +		 * come from d_alloc() anyway, but it costs us nothing
> +		 * to enforce it here.
> +		 */
> +		add_flags = DCACHE_DIRECTORY_TYPE | DCACHE_RCUACCESS;
>  		if (unlikely(!(inode->i_opflags & IOP_LOOKUP))) {
>  			if (unlikely(!inode->i_op->lookup))
>  				add_flags = DCACHE_AUTODIR_TYPE;
> diff --git a/fs/namei.c b/fs/namei.c
> index d41fab78798b..19dcf62133cc 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2145,6 +2145,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
>  	int retval = 0;
>  	const char *s = nd->name->name;
>  
> +	if (!*s)
> +		flags &= ~LOOKUP_RCU;
> +
>  	nd->last_type = LAST_ROOT; /* if there are only slashes... */
>  	nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
>  	nd->depth = 0;
> diff --git a/fs/namespace.c b/fs/namespace.c
> index cc1375eff88c..31ec9a79d2d4 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3467,6 +3467,7 @@ static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>  	struct fs_struct *fs = current->fs;
>  	struct mnt_namespace *mnt_ns = to_mnt_ns(ns);
>  	struct path root;
> +	int err;
>  
>  	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
>  	    !ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
> @@ -3484,15 +3485,14 @@ static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
>  	root.mnt    = &mnt_ns->root->mnt;
>  	root.dentry = mnt_ns->root->mnt.mnt_root;
>  	path_get(&root);
> -	while(d_mountpoint(root.dentry) && follow_down_one(&root))
> -		;
> -
> -	/* Update the pwd and root */
> -	set_fs_pwd(fs, &root);
> -	set_fs_root(fs, &root);
> -
> +	err = follow_down(&root);
> +	if (likely(!err)) {
> +		/* Update the pwd and root */
> +		set_fs_pwd(fs, &root);
> +		set_fs_root(fs, &root);
> +	}
>  	path_put(&root);
> -	return 0;
> +	return err;
>  }
>  
>  static struct user_namespace *mntns_owner(struct ns_common *ns)
> diff --git a/fs/open.c b/fs/open.c
> index 949cef29c3bb..217b5db588c8 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -459,20 +459,17 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
>  SYSCALL_DEFINE1(fchdir, unsigned int, fd)
>  {
>  	struct fd f = fdget_raw(fd);
> -	struct inode *inode;
>  	int error = -EBADF;
>  
>  	error = -EBADF;
>  	if (!f.file)
>  		goto out;
>  
> -	inode = file_inode(f.file);
> -
>  	error = -ENOTDIR;
> -	if (!S_ISDIR(inode->i_mode))
> +	if (!d_can_lookup(f.file->f_path.dentry))
>  		goto out_putf;
>  
> -	error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
> +	error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR);
>  	if (!error)
>  		set_fs_pwd(current->fs, &f.file->f_path);
>  out_putf: