linux-kernel - Re: Q: NFSD readdir in linux-2.6.28

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1240006620.19059.41.camel@macbook.infradead.org>
Date:	Fri, 17 Apr 2009 23:17:00 +0100
From:	David Woodhouse <dwmw2@...radead.org>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	hooanon05@...oo.co.jp, hch@...radead.org,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: Q: NFSD readdir in linux-2.6.28

On Fri, 2009-04-17 at 20:32 +0100, Al Viro wrote:
> Ow...  Sorry, I've missed that one.  I really don't like flags-based
> solution ;-/  It's not just filesystem method that want i_mutex there -
> we have fs/namei.c code suddenly called in different locking conditions
> now.

Well, we have that already.

When nfsd readdirplus code ends up calling back into lookup_one_len(),
it does so without i_mutex held. XFS had been doing it that way for
years, so that detail escaped our notice when we shifted the XFS 'hack'
into nfsd code.

> What were the details of xfs and jffs2 locking problems, just in case
> if something can be done in that direction short-term?

JFFS2 has its own internal mutex protecting the directory. It needs an
internal mutex instead of i_mutex because of ordering issues with
garbage collection. We are _in_ jffs2_readdir(), with the lock held,
when we call back into nfsd's filldir function... which calls right back
into jffs2_lookup() and deadlocks when it tries to take the same lock
again.

The situation in btrfs and xfs is broadly similar. They have their own
locks, and we end up deadlocking. GFS2 has some lovely "is this process
the owner of the lock" magic to avoid a similar deadlock.

It sounds like the better answer is to just make sure i_mutex is held
when nfsd_buffered_readdir() calls back into the provided filldir
function (we could do it in the various filldir functions themselves,
_if_ they call lookup_one_len(), but I think I prefer it this way --
it's simpler). Patch below for comment.

(While I'm staring at it, it looks like nfsd_buffered_readdir() should
be returning a __be32 not an int, and its 'return -ENOMEM' should be
'return nfserrno(-ENOMEM)'. The first bug I inherited from the existing
nfsd_do_readdir() when I replaced it, but the second is all my own. I'll
send a patch to fix those shortly.)

diff --git a/fs/namei.c b/fs/namei.c
index b8433eb..78f253c 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1248,6 +1248,8 @@ struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
 	int err;
 	struct qstr this;

+	WARN_ON_ONCE(!mutex_is_locked(&base->d_inode->i_mutex));
+
 	err = __lookup_one_len(name, &this, base, len);
 	if (err)
 		return ERR_PTR(err);
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index ab93fcf..fb2965d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1920,13 +1920,27 @@ static int nfsd_buffered_readdir(struct file *file, filldir_t func,
 			break;

 		de = (struct buffered_dirent *)buf.dirent;
+
 		while (size > 0) {
+			struct inode *dir_inode = file->f_path.dentry->d_inode;
+			int finished;
+
 			offset = de->offset;

-			if (func(cdp, de->name, de->namlen, de->offset,
-				 de->ino, de->d_type))
+			/*
+			 * Various filldir functions may end up calling back
+			 * into lookup_one_len() and the file system's 
+			 * ->lookup() method. These expect i_mutex to be held.
+			 */
+			host_err = mutex_lock_killable(&dir_inode->i_mutex);
+			if (host_err)
 				goto done;

+			finished = func(cdp, de->name, de->namlen, de->offset,
+					de->ino, de->d_type);
+
+			mutex_unlock(&dir_inode->i_mutex);
+
 			if (cdp->err != nfs_ok)
 				goto done;

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@...el.com                              Intel Corporation

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/