[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z4lCmIB_7cPm0Ebv@slm.duckdns.org>
Date: Thu, 16 Jan 2025 07:32:08 -1000
From: Tejun Heo <tj@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: cgroups@...r.kernel.org, linux-kernel@...r.kernel.org,
Michal Koutný <mkoutny@...e.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Hillf Danton <hdanton@...a.com>,
Johannes Weiner <hannes@...xchg.org>,
Marco Elver <elver@...gle.com>, Zefan Li <lizefan.x@...edance.com>,
tglx@...utronix.de,
syzbot+6ea37e2e6ffccf41a7e6@...kaller.appspotmail.com
Subject: Re: [PATCH v3] kernfs: Use RCU for kernfs_node::name and ::parent
lookup.
Hello,
On Thu, Jan 16, 2025 at 02:27:45PM +0100, Sebastian Andrzej Siewior wrote:
> > Shouldn't this be freed somewhere?
>
> There is
> char *kn_name __free(kfree) = NULL;
>
> at the top. This will kfree(kn_name) once it is out of scope (on return
> from rdtgroup_pseudo_lock_create()).
Ah, makes sense.
...
> > > @@ -557,16 +568,18 @@ void kernfs_put(struct kernfs_node *kn)
> > > if (!kn || !atomic_dec_and_test(&kn->count))
> > > return;
> > > root = kernfs_root(kn);
> > > + guard(rcu)();
> > > repeat:
> > > /*
> > > * Moving/renaming is always done while holding reference.
> > > * kn->parent won't change beneath us.
> > > */
> > > - parent = kn->parent;
> > > + parent = rcu_dereference(kn->parent);
> >
> > I wonder whether it'd be better to encode the reference count rule (ie. add
> > the condition kn->count == 0 to deref_check) in the kn->parent deref
> > accessor. This function doesn't need RCU read lock and holding it makes it
> > more confusing.
>
> You are saying that we don't need RCU here because if we drop the last
> reference then nobody can rename the node anymore and so parent can't
> change. That sounds right.
> What about using rcu_dereference_protected() instead? Using
> rcu_dereference(x, !atomic_read(&kn->count)) looks odd given that we
> established that the counter is 0. Therefore I would suggest
> rcu_access_pointer() but the reference drop might qualify as "locked".
I think it's usually a better form to encode the whole access rule in a
shared accessor for the field so that the deref rules for the field can be
understood and enforced from the shared accessor and the shared accessor
would use rcu_dereference_protected() internally.
> > > diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> > > index 8502ef68459b9..05f7b30283150 100644
> > > --- a/fs/kernfs/file.c
> > > +++ b/fs/kernfs/file.c
> > > @@ -911,9 +911,11 @@ static void kernfs_notify_workfn(struct work_struct *work)
> > > /* kick fsnotify */
> > >
> > > down_read(&root->kernfs_supers_rwsem);
> > > + down_read(&root->kernfs_rwsem);
> >
> > Why is this addition necessary? Hmm... was the code previously broken w.r.t.
> > renaming? Can this be RCU?
>
> I *think* it was broken unless you unsure somehow that this can't be
> invoked on nodes which can be renamed.
> The ensures that the later obtained kn_name does not freed after a
> rename.
> This can not be RCU because ilookup() has wait_on_inode() (might sleep).
If it was broken, let's separate it out to its own patch.
> > > diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
> > > index 1358c21837f1a..db71faba3bb53 100644
> > > --- a/fs/kernfs/mount.c
> > > +++ b/fs/kernfs/mount.c
> > > @@ -145,8 +145,10 @@ static struct dentry *kernfs_fh_to_parent(struct super_block *sb,
> > > static struct dentry *kernfs_get_parent_dentry(struct dentry *child)
> > > {
> > > struct kernfs_node *kn = kernfs_dentry_node(child);
> > > + struct kernfs_root *root = kernfs_root(kn);
> > >
> > > - return d_obtain_alias(kernfs_get_inode(child->d_sb, kn->parent));
> > > + guard(rwsem_read)(&root->kernfs_rwsem);
> > > + return d_obtain_alias(kernfs_get_inode(child->d_sb, kernfs_rcu_get_parent(kn)));
> >
> > Ditto.
>
> kernfs_rcu_get_parent() gets you name from the kn. Can you ensure that
> it won't go away during a rename? If so, I would add the matching
> comment then.
> There is d_obtain_alias() -> __d_obtain_alias() -> d_alloc_anon() which
> makes not possible to use RCU.
If true, better to put it in its own prep patch, I think.
...
> > > @@ -216,6 +219,9 @@ struct dentry *kernfs_node_dentry(struct kernfs_node *kn,
> > > if (!kn->parent)
> > > return dentry;
> > >
> > > + root = kernfs_root(kn);
> > > + guard(rwsem_read)(&root->kernfs_rwsem);
> >
> > Here too, it's a bit confusing that it's adding new locking. Was the code
> > broken before? If so, it'd be clearer if the fixes were in their own patch.
>
> It dereferences name (later in lookup_positive_unlocked()). I don't see
> how it is safe against a rename without the lock.
>
> If you agree that all three are bugs, that existed before, then I will
> extract it out of this patch.
Not sure whether they're all correct but let's separate them out.
...
> > I wonder whether it'd be better to rename kn->parent to something like
> > kn->__parent (or maybe some other suffix) to clarify that the field is not
> > to be deref'ed directly and kernfs_parent() helper is made available to the
> > users. That way, users can benefit from the additional conditions in
> > rcu_dereference_check().
>
> sparse should yell at people if they deference directly. I have no
> problem to rename it to __parent if you say so.
Sparse isn't useless but also often ignored. Probably better to be explicit.
...
> > e.g. Here, it'd be a lot better if kernfs provided helper can be used so
> > that deref condition check can be preserved.
>
> Something like
> | static struct cgroup *cg_get_parent_priv(struct kernfs_node *kn)
> | {
> | return rcu_dereference_check(kn->parent, kn->flags & KERNFS_ROOT_INVARIANT_PARENT)->priv;
> | }
Maybe a shorter name?
Thanks.
--
tejun
Powered by blists - more mailing lists