[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-id: <175002843966.608730.14640390628578526912@noble.neil.brown.name>
Date: Mon, 16 Jun 2025 09:00:39 +1000
From: "NeilBrown" <neil@...wn.name>
To: Al Viro <viro@...iv.linux.org.uk>, Kees Cook <kees@...nel.org>,
Joel Granados <joel.granados@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject:
[PATCH v3?] proc_sysctl: remove rcu_dereference() for accessing ->sysctl
The rcu_dereference() call in proc_sys_compare() is problematic as
->d_compare is not guaranteed to be called with rcu_read_lock() held and
rcu_dereference() can cause a warning when used without that lock.
Specifically d_alloc_parallel() will call ->d_compare() without
rcu_read_lock(), but with ->d_lock to ensure stability. In this case
->d_inode is usually NULL so the rcu_dereference() will normally not be
reached, but it is possible that ->d_inode was set while waiting for
->d_lock which could lead to the warning.
The rcu_dereference() isn't really needed - ->sysctl isn't used in a
pattern which normally requires RCU protection. In particular it is
never updated. It is assigned a value in proc_sys_make_inode() before
the inode is generally visible, and the value is freed (using
rcu_free()) only after any possible access to the inode must have
completed.
Even though the value stored at ->sysctl is not freed, the ->sysctl
pointer itself is set to NULL in proc_evict_inode(). This necessitates
proc_sys_compare() taking care, reading the pointer with READ_ONCE()
(currently via rcu_dereference()) and checking for NULL. If we drop the
assignment to NULL, this care becomes unnecessary.
This patch removes the assignment of NULL in proc_evict_inode() so that
for the entire (public) life of a proc_sysctl inode, the ->sysctl
pointer is stable and points to a valid value. It then changes
proc_sys_compare() to simply use ->sysctl without any concern for it
changing or being NULL.
Signed-off-by: NeilBrown <neil@...wn.name>
---
fs/proc/inode.c | 4 +---
fs/proc/proc_sysctl.c | 4 +---
2 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a3eb3b740f76..e0f984c44523 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -41,10 +41,8 @@ static void proc_evict_inode(struct inode *inode)
proc_pid_evict_inode(ei);
head = ei->sysctl;
- if (head) {
- RCU_INIT_POINTER(ei->sysctl, NULL);
+ if (head)
proc_sys_evict_inode(inode, head);
- }
}
static struct kmem_cache *proc_inode_cachep __ro_after_init;
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index cc9d74a06ff0..5358327ee640 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -915,7 +915,6 @@ static int sysctl_is_seen(struct ctl_table_header *p)
static int proc_sys_compare(const struct dentry *dentry,
unsigned int len, const char *str, const struct qstr *name)
{
- struct ctl_table_header *head;
struct inode *inode;
/* Although proc doesn't have negative dentries, rcu-walk means
@@ -928,8 +927,7 @@ static int proc_sys_compare(const struct dentry *dentry,
return 1;
if (memcmp(name->name, str, len))
return 1;
- head = rcu_dereference(PROC_I(inode)->sysctl);
- return !head || !sysctl_is_seen(head);
+ return !sysctl_is_seen(PROC_I(inode)->sysctl);
}
static const struct dentry_operations proc_sys_dentry_operations = {
--
2.49.0
Powered by blists - more mailing lists