lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 11 Sep 2021 10:13:42 +0800
From:   Hou Tao <houtao1@...wei.com>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Tejun Heo <tj@...nel.org>, Ian Kent <raven@...maw.net>,
        Miklos Szeredi <mszeredi@...hat.com>
CC:     <viro@...IV.linux.org.uk>, <linux-fsdevel@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <houtao1@...wei.com>
Subject: [PATCH] kernfs: fix the race in the creation of negative dentry

When doing stress test for module insertion and removal,
the following phenomenon was found:

  $ lsmod
  Module                  Size  Used by
  libkmod: kmod_module_get_holders: could not open \
           '/sys/module/nbd/holders': No such file or directory
  nbd                       -2  -2
  $ cat /proc/modules
  nbd 110592 0 - Live 0xffffffffc0298000
  $ ls -1 /sys/module |grep nbd
  ls: cannot access 'nbd': No such file or directory
  nbd

It seems the kernfs node of module has been activated and is returned to
ls command through kernfs_fop_readdir(), but the sysfs dentry is negative.
Further investigation found that there is race between kernfs dir creation
and dentry lookup as shown below:

CPU 0                          CPU 1

                        kernfs_add_one

                        down_write(&kernfs_rwsem)
                        // insert nbd into rbtree
                        // update the parent's revision
                        kernfs_link_sibling()
                        up_write(&kernfs_rwsem)

kernfs_iop_lookup

down_read(&kernfs_rwsem)
// find nbd in rbtree, but it is deactivated
kn = kernfs_find_ns()
  // return false
  kernfs_active()
  // a negative is created
  d_splice_alias(NULL, dentry)
up_read(&kernfs_rwsem)

                        // activate after negative dentry is created
                        kernfs_activate()

// return 0 because parent's
// revision is stable now
kernfs_dop_revalidate()

The race will create a negative dentry for a kernfs node which
is newly-added and activated. To fix it, there are two cases
to be handled:

(1) kernfs root without KERNFS_ROOT_CREATE_DEACTIVATED
kernfs_rwsem can be always hold during kernfs_link_sibling()
and kernfs_activate() in kernfs_add_one(), so kernfs_iop_lookup()
will find an active kernfs node.

(2) kernfs root with KERNFS_ROOT_CREATE_DEACTIVATED
kernfs_activate() is called separatedly, and we can invalidate
the dentry subtree with kn as root by increasing the revision of
its parent. But we can invalidate in a finer granularity by
only invalidating the negative dentry of the newly-activated
kn node.

So factor out a helper kernfs_activate_locked() to activate
kernfs subtree lockless and invalidate the negative dentries
if requested. Creation under kernfs root with CREATED_DEACTIVATED
doesn't need invalidation because kernfs_rwsem is always hold,
and kernfs root w/o CREATED_DEACTIVATED needs to invalidate
the maybe-created negative dentries.

kernfs_inc_rev() in kernfs_link_sibling() is kept because
kernfs_rename_ns() needs it to invalidate the negative dentry
of the target kernfs which is newly created by rename.

Fixes: c7e7c04274b1 ("kernfs: use VFS negative dentry caching")
Signed-off-by: Hou Tao <houtao1@...wei.com>
---
 fs/kernfs/dir.c | 52 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
index ba581429bf7b..2f1ab8bad575 100644
--- a/fs/kernfs/dir.c
+++ b/fs/kernfs/dir.c
@@ -17,6 +17,8 @@
 
 #include "kernfs-internal.h"
 
+static void kernfs_activate_locked(struct kernfs_node *kn, bool invalidate);
+
 DECLARE_RWSEM(kernfs_rwsem);
 static DEFINE_SPINLOCK(kernfs_rename_lock);	/* kn->parent and ->name */
 static char kernfs_pr_cont_buf[PATH_MAX];	/* protected by rename_lock */
@@ -753,8 +755,6 @@ int kernfs_add_one(struct kernfs_node *kn)
 		ps_iattr->ia_mtime = ps_iattr->ia_ctime;
 	}
 
-	up_write(&kernfs_rwsem);
-
 	/*
 	 * Activate the new node unless CREATE_DEACTIVATED is requested.
 	 * If not activated here, the kernfs user is responsible for
@@ -763,8 +763,7 @@ int kernfs_add_one(struct kernfs_node *kn)
 	 * trigger deactivation.
 	 */
 	if (!(kernfs_root(kn)->flags & KERNFS_ROOT_CREATE_DEACTIVATED))
-		kernfs_activate(kn);
-	return 0;
+		kernfs_activate_locked(kn, false);
 
 out_unlock:
 	up_write(&kernfs_rwsem);
@@ -942,8 +941,11 @@ struct kernfs_root *kernfs_create_root(struct kernfs_syscall_ops *scops,
 	root->kn = kn;
 	init_waitqueue_head(&root->deactivate_waitq);
 
-	if (!(root->flags & KERNFS_ROOT_CREATE_DEACTIVATED))
-		kernfs_activate(kn);
+	if (!(root->flags & KERNFS_ROOT_CREATE_DEACTIVATED)) {
+		down_write(&kernfs_rwsem);
+		kernfs_activate_locked(kn, false);
+		up_write(&kernfs_rwsem);
+	}
 
 	return root;
 }
@@ -1262,8 +1264,11 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
 }
 
 /**
- * kernfs_activate - activate a node which started deactivated
+ * kernfs_activate_locked - activate a node which started deactivated
  * @kn: kernfs_node whose subtree is to be activated
+ * @invalidate: whether or not to increase the revision of parent node
+ *              for each newly-activated child node. The increase will
+ *              invalidate negative dentries created under the parent node.
  *
  * If the root has KERNFS_ROOT_CREATE_DEACTIVATED set, a newly created node
  * needs to be explicitly activated.  A node which hasn't been activated
@@ -1271,15 +1276,15 @@ static struct kernfs_node *kernfs_next_descendant_post(struct kernfs_node *pos,
  * removal.  This is useful to construct atomic init sequences where
  * creation of multiple nodes should either succeed or fail atomically.
  *
+ * The caller must have acquired kernfs_rwsem.
+ *
  * The caller is responsible for ensuring that this function is not called
  * after kernfs_remove*() is invoked on @kn.
  */
-void kernfs_activate(struct kernfs_node *kn)
+static void kernfs_activate_locked(struct kernfs_node *kn, bool invalidate)
 {
 	struct kernfs_node *pos;
 
-	down_write(&kernfs_rwsem);
-
 	pos = NULL;
 	while ((pos = kernfs_next_descendant_post(pos, kn))) {
 		if (pos->flags & KERNFS_ACTIVATED)
@@ -1290,8 +1295,35 @@ void kernfs_activate(struct kernfs_node *kn)
 
 		atomic_sub(KN_DEACTIVATED_BIAS, &pos->active);
 		pos->flags |= KERNFS_ACTIVATED;
+
+		/*
+		 * Invalidate the negative dentry created after pos is
+		 * inserted into sibling rbtree but before it is
+		 * activated.
+		 */
+		if (invalidate && pos->parent)
+			kernfs_inc_rev(pos->parent);
 	}
+}
 
+/**
+ * kernfs_activate - activate a node which started deactivated
+ * @kn: kernfs_node whose subtree is to be activated
+ *
+ * Currently it is only used by kernfs root which has
+ * FS_ROOT_CREATE_DEACTIVATED set. Because the addition and the activation
+ * of children nodes are not atomic (not always hold kernfs_rwsem),
+ * negative dentry may be created for one child node after its addition
+ * but before its activation, so passing invalidate as true to
+ * @kernfs_activate_locked() to invalidate these negative dentries.
+ *
+ * The caller is responsible for ensuring that this function is not called
+ * after kernfs_remove*() is invoked on @kn.
+ */
+void kernfs_activate(struct kernfs_node *kn)
+{
+	down_write(&kernfs_rwsem);
+	kernfs_activate_locked(kn, true);
 	up_write(&kernfs_rwsem);
 }
 
-- 
2.29.2

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ