lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250116132745.dU941oor@linutronix.de>
Date: Thu, 16 Jan 2025 14:27:45 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Tejun Heo <tj@...nel.org>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
	Michal Koutný <mkoutny@...e.com>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Hillf Danton <hdanton@...a.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Marco Elver <elver@...gle.com>, Zefan Li <lizefan.x@...edance.com>,
	tglx@...utronix.de,
	syzbot+6ea37e2e6ffccf41a7e6@...kaller.appspotmail.com
Subject: Re: [PATCH v3] kernfs: Use RCU for kernfs_node::name and ::parent
 lookup.

On 2024-12-03 12:21:55 [-1000], Tejun Heo wrote:
> Hello, sorry about the delay.
Hi,

> Generally looks good to me but I have some rcu deref accessor related
> comments.

no worries, I also fell behind.

> On Thu, Nov 21, 2024 at 06:52:50PM +0100, Sebastian Andrzej Siewior wrote:
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> ...
> > @@ -1312,6 +1314,11 @@ int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp)
> >  		ret = -EINVAL;
> >  		goto out_region;
> >  	}
> > +	kn_name = kstrdup(rdt_kn_get_name(rdtgrp->kn), GFP_KERNEL);
> 
> Shouldn't this be freed somewhere?

There is
         char *kn_name __free(kfree) = NULL;

at the top. This will kfree(kn_name) once it is out of scope (on return
from rdtgroup_pseudo_lock_create()).

> > +	if (!kn_name) {
> > +		ret = -ENOMEM;
> > +		goto out_cstates;
> > +	}
> >  
> >  	plr->thread_done = 0;
> >  
> ...
> > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> ...
> > @@ -533,7 +543,8 @@ static void kernfs_free_rcu(struct rcu_head *rcu)
> >  {
> >  	struct kernfs_node *kn = container_of(rcu, struct kernfs_node, rcu);
> >  
> > -	kfree_const(kn->name);
> > +	/* If the whole node goes away, the name can't be used outside */
> > +	kfree_const(rcu_dereference_check(kn->name, true));
> 
> rcu_access_pointer()?
oh, thanks.

> > @@ -557,16 +568,18 @@ void kernfs_put(struct kernfs_node *kn)
> >  	if (!kn || !atomic_dec_and_test(&kn->count))
> >  		return;
> >  	root = kernfs_root(kn);
> > +	guard(rcu)();
> >   repeat:
> >  	/*
> >  	 * Moving/renaming is always done while holding reference.
> >  	 * kn->parent won't change beneath us.
> >  	 */
> > -	parent = kn->parent;
> > +	parent = rcu_dereference(kn->parent);
> 
> I wonder whether it'd be better to encode the reference count rule (ie. add
> the condition kn->count == 0 to deref_check) in the kn->parent deref
> accessor. This function doesn't need RCU read lock and holding it makes it
> more confusing.

You are saying that we don't need RCU here because if we drop the last
reference then nobody can rename the node anymore and so parent can't
change. That sounds right.
What about using rcu_dereference_protected() instead? Using
rcu_dereference(x, !atomic_read(&kn->count)) looks odd given that we
established that the counter is 0. Therefore I would suggest
rcu_access_pointer() but the reference drop might qualify as "locked".

> > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > index 8502ef68459b9..05f7b30283150 100644
> > --- a/fs/kernfs/file.c
> > +++ b/fs/kernfs/file.c
> > @@ -911,9 +911,11 @@ static void kernfs_notify_workfn(struct work_struct *work)
> >  	/* kick fsnotify */
> >  
> >  	down_read(&root->kernfs_supers_rwsem);
> > +	down_read(&root->kernfs_rwsem);
> 
> Why is this addition necessary? Hmm... was the code previously broken w.r.t.
> renaming? Can this be RCU?

I *think* it was broken unless you unsure somehow that this can't be
invoked on nodes which can be renamed.
The ensures that the later obtained kn_name does not freed after a
rename.
This can not be RCU because ilookup() has wait_on_inode() (might sleep).

> > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > index 1358c21837f1a..db71faba3bb53 100644
> > --- a/fs/kernfs/mount.c
> > +++ b/fs/kernfs/mount.c
> > @@ -145,8 +145,10 @@ static struct dentry *kernfs_fh_to_parent(struct super_block *sb,
> >  static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
> >  {
> >  	struct kernfs_node *kn = kernfs_dentry_node(child);
> > +	struct kernfs_root *root = kernfs_root(kn);
> >  
> > -	return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent));
> > +	guard(rwsem_read)(&root->kernfs_rwsem);
> > +	return d_obtain_alias(kernfs_get_inode(child->d_sb, kernfs_rcu_get_parent(kn)));
> 
> Ditto.

kernfs_rcu_get_parent() gets you name from the kn. Can you ensure that
it won't go away during a rename? If so, I would add the matching
comment then.
There is d_obtain_alias() -> __d_obtain_alias() -> d_alloc_anon() which
makes not possible to use RCU.

> > @@ -186,10 +188,10 @@ static struct kernfs_node *find_next_ancestor(struct kernfs_node *child,
> >  		return NULL;
> >  	}
> >  
> > -	while (child->parent != parent) {
> > -		if (!child->parent)
> > +	while (kernfs_rcu_get_parent(child) != parent) {
> > +		child = kernfs_rcu_get_parent(child);
> > +		if (!child)
> 
> I think kernfs_rcu_get_parent() name is a bit confusing given that it allows
> derefing without RCU if the rwsem is locked. Maybe just kernfs_get_parent()
> or kernfs_parent()?

kernfs_get_parent() exists, switching to kernfs_parent() then.

> > @@ -216,6 +219,9 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
> >  	if (!kn->parent)
> >  		return dentry;
> >  
> > +	root = kernfs_root(kn);
> > +	guard(rwsem_read)(&root->kernfs_rwsem);
> 
> Here too, it's a bit confusing that it's adding new locking. Was the code
> broken before? If so, it'd be clearer if the fixes were in their own patch.

It dereferences name (later in lookup_positive_unlocked()). I don't see
how it is safe against a rename without the lock.

If you agree that all three are bugs, that existed before, then I will
extract it out of this patch.

> > diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> > index d1995e2d6c943..e9bfe3e80809d 100644
> > --- a/fs/sysfs/file.c
> > +++ b/fs/sysfs/file.c
> > @@ -19,13 +19,19 @@
> >  
> >  #include "sysfs.h"
> >  
> > +static struct kobject *sysfs_file_kobj(struct kernfs_node *kn)
> > +{
> > +	guard(rcu)();
> > +	return rcu_dereference(kn->parent)->priv;
> > +}
> 
> I wonder whether it'd be better to rename kn->parent to something like
> kn->__parent (or maybe some other suffix) to clarify that the field is not
> to be deref'ed directly and kernfs_parent() helper is made available to the
> users. That way, users can benefit from the additional conditions in
> rcu_dereference_check().

sparse should yell at people if they deference directly. I have no
problem to rename it to __parent if you say so.

> > diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
> > index e28d5f0d20ed0..202e329759b12 100644
> > --- a/kernel/cgroup/cgroup-v1.c
> > +++ b/kernel/cgroup/cgroup-v1.c
> > @@ -844,7 +844,7 @@ static int cgroup1_rename(struct kernfs_node *kn, struct kernfs_node *new_parent
> >  
> >  	if (kernfs_type(kn) != KERNFS_DIR)
> >  		return -ENOTDIR;
> > -	if (kn->parent != new_parent)
> > +	if (rcu_dereference_check(kn->parent, true) != new_parent)
> >  		return -EIO;
> 
> This isn't being derefed, rcu_access_pointer()?

Ok.

> >  	/*
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index 044c7ba1cc482..d11d05a53783c 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -633,9 +633,15 @@ int cgroup_task_count(const struct cgroup *cgrp)
> >  	return count;
> >  }
> >  
> > +static struct cgroup *cg_get_parent_priv(struct kernfs_node *kn)
> > +{
> > +	/* The parent can not be changed */
> > +	return rcu_dereference_check(kn->parent, true)->priv;
> > +}
> 
> e.g. Here, it'd be a lot better if kernfs provided helper can be used so
> that deref condition check can be preserved.

Something like
| static struct cgroup *cg_get_parent_priv(struct kernfs_node *kn)
| {
|       return rcu_dereference_check(kn->parent, kn->flags & KERNFS_ROOT_INVARIANT_PARENT)->priv;
| }

then?

> Thanks.

Sebastian

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ