linux-kernel - [PATCH][RFC] race in exportfs_decode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20191109031333.GA8566@ZenIV.linux.org.uk>
Date:   Sat, 9 Nov 2019 03:13:33 +0000
From:   Al Viro <viro@...iv.linux.org.uk>
To:     "J. Bruce Fields" <bfields@...ldses.org>
Cc:     Ritesh Harjani <riteshh@...ux.ibm.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        wugyuan@...ibm.com, jlayton@...nel.org, hsiangkao@....com,
        Jan Kara <jack@...e.cz>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Christoph Hellwig <hch@....de>
Subject: [PATCH][RFC] race in exportfs_decode_fh()

On Sat, Nov 02, 2019 at 06:08:42PM +0000, Al Viro wrote:

> It is converging to a reasonably small and understandable surface, actually,
> most of that being in core pathname resolution.  Two big piles of nightmares
> left to review - overlayfs and (somewhat surprisingly) setxattr call chains,
> the latter due to IMA/EVM/LSM insanity...

Oh, lovely - in exportfs_decode_fh() we have this:
                err = exportfs_get_name(mnt, target_dir, nbuf, result);
                if (!err) {
                        inode_lock(target_dir->d_inode);
                        nresult = lookup_one_len(nbuf, target_dir,
                                                 strlen(nbuf));
                        inode_unlock(target_dir->d_inode);
                        if (!IS_ERR(nresult)) {
                                if (nresult->d_inode) {
                                        dput(result);
                                        result = nresult;
                                } else
                                        dput(nresult);
                        }
                }
We have derived the parent from fhandle, we have a disconnected dentry for child,
we go look for the name.  We even find it.  Now, we want to look it up.  And
some bastard goes and unlinks it, just as we are trying to lock the parent.
We do a lookup, and get a negative dentry.  Then we unlock the parent... and
some other bastard does e.g. mkdir with the same name.  OK, nresult->d_inode
is not NULL (anymore).  It has fuck-all to do with the original fhandle
(different inumber, etc.) but we happily accept it.

Even better, we have no barriers between our check and nresult becoming positive.
IOW, having observed non-NULL ->d_inode doesn't give us enough - e.g. we might
still see the old ->d_flags value, from back when ->d_inode used to be NULL.
On something like alpha we also have no promises that we'll observe anything
about the fields of nresult->d_inode, but ->d_flags alone is enough for fun.
The callers can't e.g. expect d_is_reg() et.al. to match the reality.

This is obviously bogus.  And the fix is obvious: check that nresult->d_inode is
equal to result->d_inode before unlocking the parent.  Note that we'd *already* had
the original result and all of its aliases rejected by the 'acceptable' predicate,
so if nresult doesn't supply us a better alias, we are SOL.

Does anyone see objections to the following patch?  Christoph, that seems to
be your code; am I missing something subtle here?  AFAICS, that goes back to
2007 or so...

Signed-off-by: Al Viro <viro@...iv.linux.org.uk>
---
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 09bc68708d28..2dd55b172d57 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -519,26 +519,33 @@ struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
 		 * inode is actually connected to the parent.
 		 */
 		err = exportfs_get_name(mnt, target_dir, nbuf, result);
-		if (!err) {
-			inode_lock(target_dir->d_inode);
-			nresult = lookup_one_len(nbuf, target_dir,
-						 strlen(nbuf));
-			inode_unlock(target_dir->d_inode);
-			if (!IS_ERR(nresult)) {
-				if (nresult->d_inode) {
-					dput(result);
-					result = nresult;
-				} else
-					dput(nresult);
-			}
+		if (err) {
+			dput(target_dir);
+			goto err_result;
 		}
 
+		inode_lock(target_dir->d_inode);
+		nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+		if (!IS_ERR(nresult)) {
+			if (unlikely(nresult->d_inode != result->d_inode)) {
+				dput(nresult);
+				nresult = ERR_PTR(-ESTALE);
+			}
+		}
+		inode_unlock(target_dir->d_inode);
 		/*
 		 * At this point we are done with the parent, but it's pinned
 		 * by the child dentry anyway.
 		 */
 		dput(target_dir);
 
+		if (IS_ERR(nresult)) {
+			err = PTR_ERR(nresult);
+			goto err_result;
+		}
+		dput(result);
+		result = nresult;
+
 		/*
 		 * And finally make sure the dentry is actually acceptable
 		 * to NFSD.