lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240807-openfast-v3-1-040d132d2559@kernel.org>
Date: Wed, 07 Aug 2024 08:10:27 -0400
From: Jeff Layton <jlayton@...nel.org>
To: Alexander Viro <viro@...iv.linux.org.uk>, 
 Christian Brauner <brauner@...nel.org>, Jan Kara <jack@...e.cz>, 
 Andrew Morton <akpm@...ux-foundation.org>
Cc: Andi Kleen <ak@...ux.intel.com>, Mateusz Guzik <mjguzik@...il.com>, 
 Josef Bacik <josef@...icpanda.com>, linux-fsdevel@...r.kernel.org, 
 linux-kernel@...r.kernel.org, Jeff Layton <jlayton@...nel.org>
Subject: [PATCH v3] fs: try an opportunistic lookup for O_CREAT opens too

Today, when opening a file we'll typically do a fast lookup, but if
O_CREAT is set, the kernel always takes the exclusive inode lock. I
assume this was done with the expectation that O_CREAT means that we
always expect to do the create, but that's often not the case. Many
programs set O_CREAT even in scenarios where the file already exists.

This patch rearranges the pathwalk-for-open code to also attempt a
fast_lookup in certain O_CREAT cases. If a positive dentry is found, the
inode_lock can be avoided altogether, and if auditing isn't enabled, it
can stay in rcuwalk mode for the last step_into.

One notable exception that is hopefully temporary: if we're doing an
rcuwalk and auditing is enabled, skip the lookup_fast. Legitimizing the
dentry in that case is more expensive than taking the i_rwsem for now.

Signed-off-by: Jeff Layton <jlayton@...nel.org>
---
Here's a revised patch that does a fast_lookup in the O_CREAT codepath
too. The main difference here is that if a positive dentry is found and
audit_dummy_context is true, then we keep the walk lazy for the last
component, which avoids having to take any locks on the parent (just
like with non-O_CREAT opens).

Mateusz wrote a will-it-scale test that does an O_CREAT open and close in
the same directory repeatedly. Running that in 70 different processes:

    v6.10:		  754565
    v6.10+patch:	25747851

...which is roughly a 34x speedup. I also ran the unlink1 test in single
process mode to try and gauge how bad the performance impact would be in
the case where we always have to search, not find anything and do the
create:

    v6.10:		200106
    v6.10+patch:	199188

~0.4% performance hit in that test. I'm not sure that's statistically
significant, but we should keep an eye out for slowdowns in these sorts
of workloads if we decide to take this.
---
Changes in v3:
- Check for IS_ERR in lookup_fast result
- Future-proof open_last_lookups to handle case where lookup_fast_for_open
  returns a positive dentry while auditing is enabled
- Link to v2: https://lore.kernel.org/r/20240806-openfast-v2-1-42da45981811@kernel.org

Changes in v2:
- drop the lockref patch since Mateusz is working on a better approach
- add trailing_slashes helper function
- add a lookup_fast_for_open helper function
- make lookup_fast_for_open skip the lookup if auditing is enabled
- if we find a positive dentry and auditing is disabled, don't unlazy
- Link to v1: https://lore.kernel.org/r/20240802-openfast-v1-0-a1cff2a33063@kernel.org
---
 fs/namei.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 10 deletions(-)

diff --git a/fs/namei.c b/fs/namei.c
index 1e05a0f3f04d..7894fafa8e71 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3518,6 +3518,49 @@ static struct dentry *lookup_open(struct nameidata *nd, struct file *file,
 	return ERR_PTR(error);
 }
 
+static inline bool trailing_slashes(struct nameidata *nd)
+{
+	return (bool)nd->last.name[nd->last.len];
+}
+
+static struct dentry *lookup_fast_for_open(struct nameidata *nd, int open_flag)
+{
+	struct dentry *dentry;
+
+	if (open_flag & O_CREAT) {
+		/* Don't bother on an O_EXCL create */
+		if (open_flag & O_EXCL)
+			return NULL;
+
+		/*
+		 * FIXME: If auditing is enabled, then we'll have to unlazy to
+		 * use the dentry. For now, don't do this, since it shifts
+		 * contention from parent's i_rwsem to its d_lockref spinlock.
+		 * Reconsider this once dentry refcounting handles heavy
+		 * contention better.
+		 */
+		if ((nd->flags & LOOKUP_RCU) && !audit_dummy_context())
+			return NULL;
+	}
+
+	if (trailing_slashes(nd))
+		nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
+
+	dentry = lookup_fast(nd);
+	if (IS_ERR_OR_NULL(dentry))
+		return dentry;
+
+	if (open_flag & O_CREAT) {
+		/* Discard negative dentries. Need inode_lock to do the create */
+		if (!dentry->d_inode) {
+			if (!(nd->flags & LOOKUP_RCU))
+				dput(dentry);
+			dentry = NULL;
+		}
+	}
+	return dentry;
+}
+
 static const char *open_last_lookups(struct nameidata *nd,
 		   struct file *file, const struct open_flags *op)
 {
@@ -3535,28 +3578,39 @@ static const char *open_last_lookups(struct nameidata *nd,
 		return handle_dots(nd, nd->last_type);
 	}
 
+	/* We _can_ be in RCU mode here */
+	dentry = lookup_fast_for_open(nd, open_flag);
+	if (IS_ERR(dentry))
+		return ERR_CAST(dentry);
+
 	if (!(open_flag & O_CREAT)) {
-		if (nd->last.name[nd->last.len])
-			nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
-		/* we _can_ be in RCU mode here */
-		dentry = lookup_fast(nd);
-		if (IS_ERR(dentry))
-			return ERR_CAST(dentry);
 		if (likely(dentry))
 			goto finish_lookup;
 
 		if (WARN_ON_ONCE(nd->flags & LOOKUP_RCU))
 			return ERR_PTR(-ECHILD);
 	} else {
-		/* create side of things */
 		if (nd->flags & LOOKUP_RCU) {
-			if (!try_to_unlazy(nd))
+			bool unlazied;
+
+			/* can stay in rcuwalk if not auditing */
+			if (dentry && audit_dummy_context()) {
+				if (trailing_slashes(nd))
+					return ERR_PTR(-EISDIR);
+				goto finish_lookup;
+			}
+			unlazied = dentry ? try_to_unlazy_next(nd, dentry) :
+					    try_to_unlazy(nd);
+			if (!unlazied)
 				return ERR_PTR(-ECHILD);
 		}
 		audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
-		/* trailing slashes? */
-		if (unlikely(nd->last.name[nd->last.len]))
+		if (trailing_slashes(nd)) {
+			dput(dentry);
 			return ERR_PTR(-EISDIR);
+		}
+		if (dentry)
+			goto finish_lookup;
 	}
 
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {

---
base-commit: 0c3836482481200ead7b416ca80c68a29cfdaabd
change-id: 20240723-openfast-ac49a7b6ade2

Best regards,
-- 
Jeff Layton <jlayton@...nel.org>


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ