lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z4lCmIB_7cPm0Ebv@slm.duckdns.org>
Date: Thu, 16 Jan 2025 07:32:08 -1000
From: Tejun Heo <tj@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
	Michal Koutný <mkoutny@...e.com>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Hillf Danton <hdanton@...a.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Marco Elver <elver@...gle.com>, Zefan Li <lizefan.x@...edance.com>,
	tglx@...utronix.de,
	syzbot+6ea37e2e6ffccf41a7e6@...kaller.appspotmail.com
Subject: Re: [PATCH v3] kernfs: Use RCU for kernfs_node::name and ::parent
 lookup.

Hello,

On Thu, Jan 16, 2025 at 02:27:45PM +0100, Sebastian Andrzej Siewior wrote:
> > Shouldn't this be freed somewhere?
> 
> There is
>          char *kn_name __free(kfree) = NULL;
> 
> at the top. This will kfree(kn_name) once it is out of scope (on return
> from rdtgroup_pseudo_lock_create()).

Ah, makes sense.

...
> > > @@ -557,16 +568,18 @@ void kernfs_put(struct kernfs_node *kn)
> > >  	if (!kn || !atomic_dec_and_test(&kn->count))
> > >  		return;
> > >  	root = kernfs_root(kn);
> > > +	guard(rcu)();
> > >   repeat:
> > >  	/*
> > >  	 * Moving/renaming is always done while holding reference.
> > >  	 * kn->parent won't change beneath us.
> > >  	 */
> > > -	parent = kn->parent;
> > > +	parent = rcu_dereference(kn->parent);
> > 
> > I wonder whether it'd be better to encode the reference count rule (ie. add
> > the condition kn->count == 0 to deref_check) in the kn->parent deref
> > accessor. This function doesn't need RCU read lock and holding it makes it
> > more confusing.
> 
> You are saying that we don't need RCU here because if we drop the last
> reference then nobody can rename the node anymore and so parent can't
> change. That sounds right.
> What about using rcu_dereference_protected() instead? Using
> rcu_dereference(x, !atomic_read(&kn->count)) looks odd given that we
> established that the counter is 0. Therefore I would suggest
> rcu_access_pointer() but the reference drop might qualify as "locked".

I think it's usually a better form to encode the whole access rule in a
shared accessor for the field so that the deref rules for the field can be
understood and enforced from the shared accessor and the shared accessor
would use rcu_dereference_protected() internally.

> > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > > index 8502ef68459b9..05f7b30283150 100644
> > > --- a/fs/kernfs/file.c
> > > +++ b/fs/kernfs/file.c
> > > @@ -911,9 +911,11 @@ static void kernfs_notify_workfn(struct work_struct *work)
> > >  	/* kick fsnotify */
> > >  
> > >  	down_read(&root->kernfs_supers_rwsem);
> > > +	down_read(&root->kernfs_rwsem);
> > 
> > Why is this addition necessary? Hmm... was the code previously broken w.r.t.
> > renaming? Can this be RCU?
> 
> I *think* it was broken unless you unsure somehow that this can't be
> invoked on nodes which can be renamed.
> The ensures that the later obtained kn_name does not freed after a
> rename.
> This can not be RCU because ilookup() has wait_on_inode() (might sleep).

If it was broken, let's separate it out to its own patch.

> > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > > index 1358c21837f1a..db71faba3bb53 100644
> > > --- a/fs/kernfs/mount.c
> > > +++ b/fs/kernfs/mount.c
> > > @@ -145,8 +145,10 @@ static struct dentry *kernfs_fh_to_parent(struct super_block *sb,
> > >  static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
> > >  {
> > >  	struct kernfs_node *kn = kernfs_dentry_node(child);
> > > +	struct kernfs_root *root = kernfs_root(kn);
> > >  
> > > -	return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent));
> > > +	guard(rwsem_read)(&root->kernfs_rwsem);
> > > +	return d_obtain_alias(kernfs_get_inode(child->d_sb, kernfs_rcu_get_parent(kn)));
> > 
> > Ditto.
> 
> kernfs_rcu_get_parent() gets you name from the kn. Can you ensure that
> it won't go away during a rename? If so, I would add the matching
> comment then.
> There is d_obtain_alias() -> __d_obtain_alias() -> d_alloc_anon() which
> makes not possible to use RCU.

If true, better to put it in its own prep patch, I think.

...
> > > @@ -216,6 +219,9 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
> > >  	if (!kn->parent)
> > >  		return dentry;
> > >  
> > > +	root = kernfs_root(kn);
> > > +	guard(rwsem_read)(&root->kernfs_rwsem);
> > 
> > Here too, it's a bit confusing that it's adding new locking. Was the code
> > broken before? If so, it'd be clearer if the fixes were in their own patch.
> 
> It dereferences name (later in lookup_positive_unlocked()). I don't see
> how it is safe against a rename without the lock.
> 
> If you agree that all three are bugs, that existed before, then I will
> extract it out of this patch.

Not sure whether they're all correct but let's separate them out.

...
> > I wonder whether it'd be better to rename kn->parent to something like
> > kn->__parent (or maybe some other suffix) to clarify that the field is not
> > to be deref'ed directly and kernfs_parent() helper is made available to the
> > users. That way, users can benefit from the additional conditions in
> > rcu_dereference_check().
> 
> sparse should yell at people if they deference directly. I have no
> problem to rename it to __parent if you say so.

Sparse isn't useless but also often ignored. Probably better to be explicit.

...
> > e.g. Here, it'd be a lot better if kernfs provided helper can be used so
> > that deref condition check can be preserved.
> 
> Something like
> | static struct cgroup *cg_get_parent_priv(struct kernfs_node *kn)
> | {
> |       return rcu_dereference_check(kn->parent, kn->flags & KERNFS_ROOT_INVARIANT_PARENT)->priv;
> | }

Maybe a shorter name?

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ