lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140430154958.GC3113@tucsk.piliscsaba.szeredi.hu>
Date:	Wed, 30 Apr 2014 17:49:58 +0200
From:	Miklos Szeredi <miklos@...redi.hu>
To:	Al Viro <viro@...IV.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Dave Chinner <david@...morbit.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: dcache shrink list corruption?

On Wed, Apr 30, 2014 at 05:04:36AM +0100, Al Viro wrote:
> On Tue, Apr 29, 2014 at 07:56:13PM -0700, Linus Torvalds wrote:
> > On Tue, Apr 29, 2014 at 7:31 PM, Al Viro <viro@...iv.linux.org.uk> wrote:
> > >
> > > OK, aggregate diff follows, more readable splitup (3 commits) attached.
> > > It seems to survive beating here; testing, review and comments are
> > > welcome.
> > 
> > Miklos, did you have some particular load that triggered this, or was
> > it just some reports? It would be really good to get this patch some
> > stress-testing.
> > 
> > I like how the patch removes more lines than it adds, but apart from
> > that it's hard to read the patch (even the split-out ones) and say
> > anything more about it. I think this needs a *lot* of testing.
> 
> FWIW, the first two are really straightforward expanding the function
> into its only callsite.  The last needs more splitup.  Not sure if the
> following is good enough, but it ought to be at least somewhat cleaner.
> Combined change is identical to the original, so it doesn't invalidate
> the testing so far...

Hmm, patches look okay, but I'm wondering if we really need the morgue list and
the waiting.  Why not just skip dentries that are presently being handled by
dentry_kill()?

Thanks,
Miklos
---


From: Al Viro <viro@...iv.linux.org.uk>
Date: Tue, 29 Apr 2014 23:52:05 -0400
Subject: Don't try to remove from shrink list we don't own

So far it's just been local equivalent transformations.  Now the meat of
that thing: dentry_kill() has two kinds of call sites - ones that had just
dropped the last reference (dput(), killing ancestors in
shrink_dentry_list()) and one where the victim had sat on shrink list with
zero refcount.  We already have a flag telling which kind it is
(unlock_on_failure).  What we want to do is to avoid d_shrink_del() in the
first kind of dentry_kill().  We do, however, want everything except the
actual freeing still done as we would in mainline.  So we need to deal with
two new situations - the first kind of dentry_kill() finding a dentry on
shrink list (result of laziness in dget(); we had it on shrink list with
refcount 1) and the second kind finding a dentry that is currently being
killed by the first kind.  What we do is this:

 - in the first kind of dentry_kill() we don't remove the dentry from the
   shrink list, this makes the shrink list really private to the shrinker
   functions

 - we proceed with the normal killing but only actually free the dentry if
   it's definitely not on the shrink list at this point.  If it's still on
   the shrink list set DCACHE_MAY_FREE and defer the freeing to
   shrink_dentry_list()

 - shrink_dentry_list() will detect if the dentry is killed with
   DCACHE_DENTRY_KILLED.  If DCACHE_MAY_FREE isn't yet set, just take the
   dentry off the shrink list and let the still running dentry_kill()
   finish it off.  If DCACHE_MAY_FREE is set, then free the dentry.

Signed-off-by: Al Viro <viro@...iv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@...e.cz>
---
 fs/dcache.c            |   44 ++++++++++++++++++++++++++++++++++++++++++--
 include/linux/dcache.h |    2 ++
 2 files changed, 44 insertions(+), 2 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -469,6 +469,7 @@ dentry_kill(struct dentry *dentry, int u
 {
 	struct inode *inode;
 	struct dentry *parent;
+	bool can_free = true;
 
 	inode = dentry->d_inode;
 	if (inode && !spin_trylock(&inode->i_lock)) {
@@ -504,8 +505,10 @@ dentry_kill(struct dentry *dentry, int u
 	if (dentry->d_flags & DCACHE_LRU_LIST) {
 		if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
 			d_lru_del(dentry);
-		else
+		else if (!unlock_on_failure)
 			d_shrink_del(dentry);
+		else
+			can_free = false;
 	}
 	/* if it was on the hash then remove it */
 	__d_drop(dentry);
@@ -527,7 +530,25 @@ dentry_kill(struct dentry *dentry, int u
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);
 
-	dentry_free(dentry);
+	if (unlikely(!can_free)) {
+		spin_lock(&dentry->d_lock);
+		/*
+		 * Could have gotten off the shrink list while not holding the
+		 * dcache lock.
+		 *
+		 * If still on the shrink list shrink_dentry_list will take care
+		 * of freeing it for us.  Signal this by setting the
+		 * DCACHE_MAY_FREE flag.
+		 */
+		if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
+			can_free = true;
+		else
+			dentry->d_flags |= DCACHE_MAY_FREE;
+		spin_unlock(&dentry->d_lock);
+	}
+	if (likely(can_free))
+		dentry_free(dentry);
+
 	return parent;
 }
 
@@ -835,6 +856,25 @@ static void shrink_dentry_list(struct li
 		}
 		rcu_read_unlock();
 
+		if (unlikely(dentry->d_flags & DCACHE_DENTRY_KILLED)) {
+			bool free_it;
+			/*
+			 * This dentry has already been killed, but was stuck on
+			 * the shrink list.  It may be undergoing cleanup, in
+			 * which case let dput() finish it off.
+			 *
+			 * If it's already clean, dput() deferred freeing to us.
+			 */
+			free_it = (dentry->d_flags & DCACHE_MAY_FREE) != 0;
+			spin_unlock(&dentry->d_lock);
+
+			if (free_it)
+				dentry_free(dentry);
+
+			rcu_read_lock();
+			continue;
+		}
+
 		parent = dentry_kill(dentry, 0);
 		/*
 		 * If dentry_kill returns NULL, we have nothing more to do.
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -221,6 +221,8 @@ struct dentry_operations {
 #define DCACHE_SYMLINK_TYPE		0x00300000 /* Symlink */
 #define DCACHE_FILE_TYPE		0x00400000 /* Other file type */
 
+#define DCACHE_MAY_FREE			0x00800000
+
 extern seqlock_t rename_lock;
 
 static inline int dname_external(const struct dentry *dentry)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ