lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20251121072237.3230021-1-mjguzik@gmail.com>
Date: Fri, 21 Nov 2025 08:22:37 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: brauner@...nel.org
Cc: viro@...iv.linux.org.uk,
	jack@...e.cz,
	linux-kernel@...r.kernel.org,
	linux-fsdevel@...r.kernel.org,
	Mateusz Guzik <mjguzik@...il.com>
Subject: [PATCH] fs: scale opening of character devices

chrdev_open() always takes cdev_lock, which is only needed to synchronize
against cd_forget(). But the latter is only ever called by inode evict(),
meaning these two can never legally race. Solidify this with asserts.

More cleanups are needed here but this is enough to get the thing out of
the way.

Rationale is funny-sounding at first: opening of /dev/zero happens to be
a contention point in large-scale package building (think 100+ packages
at the same with a thread count to support it). Such a workload is not
only very fork+exec heavy, but frequently involves scripts which use the
idiom of silencing output by redirecting it to /dev/null.

A non-large-scale microbenchmark of opening /dev/null in a loop in 16
processes:
before:	2865472
after:	4011960 (+40%)

Code goes from being bottlenecked on the spinlock to being bottlenecked
on lockref.

Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
---

I'll note for interested my experience with the workload at hand comes
from FreeBSD and was surprised to find /dev/null on the profile. Given
that Linux is globally serializing on it, it has to be a factor as well
in this case.

 fs/char_dev.c        | 20 +++++++++++---------
 fs/inode.c           |  2 +-
 include/linux/cdev.h |  2 +-
 3 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/char_dev.c b/fs/char_dev.c
index c2ddb998f3c9..dfde57cb5eed 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -374,15 +374,15 @@ static int chrdev_open(struct inode *inode, struct file *filp)
 {
 	const struct file_operations *fops;
 	struct cdev *p;
-	struct cdev *new = NULL;
 	int ret = 0;
 
-	spin_lock(&cdev_lock);
-	p = inode->i_cdev;
+	VFS_BUG_ON_INODE(icount_read(inode) < 1, inode);
+
+	p = READ_ONCE(inode->i_cdev);
 	if (!p) {
 		struct kobject *kobj;
+		struct cdev *new;
 		int idx;
-		spin_unlock(&cdev_lock);
 		kobj = kobj_lookup(cdev_map, inode->i_rdev, &idx);
 		if (!kobj)
 			return -ENXIO;
@@ -392,19 +392,19 @@ static int chrdev_open(struct inode *inode, struct file *filp)
 		   we dropped the lock. */
 		p = inode->i_cdev;
 		if (!p) {
-			inode->i_cdev = p = new;
+			p = new;
+			WRITE_ONCE(inode->i_cdev, p);
 			list_add(&inode->i_devices, &p->list);
 			new = NULL;
 		} else if (!cdev_get(p))
 			ret = -ENXIO;
+		spin_unlock(&cdev_lock);
+		cdev_put(new);
 	} else if (!cdev_get(p))
 		ret = -ENXIO;
-	spin_unlock(&cdev_lock);
-	cdev_put(new);
 	if (ret)
 		return ret;
 
-	ret = -ENXIO;
 	fops = fops_get(p->ops);
 	if (!fops)
 		goto out_cdev_put;
@@ -423,8 +423,10 @@ static int chrdev_open(struct inode *inode, struct file *filp)
 	return ret;
 }
 
-void cd_forget(struct inode *inode)
+void inode_cdev_forget(struct inode *inode)
 {
+	VFS_BUG_ON_INODE(!(inode_state_read_once(inode) & I_FREEING), inode);
+
 	spin_lock(&cdev_lock);
 	list_del_init(&inode->i_devices);
 	inode->i_cdev = NULL;
diff --git a/fs/inode.c b/fs/inode.c
index a62032864ddf..88be1f20782d 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -840,7 +840,7 @@ static void evict(struct inode *inode)
 		clear_inode(inode);
 	}
 	if (S_ISCHR(inode->i_mode) && inode->i_cdev)
-		cd_forget(inode);
+		inode_cdev_forget(inode);
 
 	remove_inode_hash(inode);
 
diff --git a/include/linux/cdev.h b/include/linux/cdev.h
index 0e8cd6293deb..bed99967ad90 100644
--- a/include/linux/cdev.h
+++ b/include/linux/cdev.h
@@ -34,6 +34,6 @@ void cdev_device_del(struct cdev *cdev, struct device *dev);
 
 void cdev_del(struct cdev *);
 
-void cd_forget(struct inode *);
+void inode_cdev_forget(struct inode *);
 
 #endif
-- 
2.48.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ