[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALm+0cX+hZSO8S+kZ1K1E4mhXuWaFvZrXKzrbb7-jt-Q73mCCQ@mail.gmail.com>
Date: Fri, 16 May 2025 16:15:25 +0800
From: Z qiang <qiang.zhang1211@...il.com>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: paulmck@...nel.org, frederic@...nel.org, neeraj.upadhyay@...nel.org,
joel@...lfernandes.org, urezki@...il.com, boqun.feng@...il.com,
rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu/nocb: Fix possible invalid rdp's->nocb_cb_kthread
pointer access
>
> On 5/9/2025 3:07 PM, Joel Fernandes wrote:
> >
> >
> > On 5/7/2025 7:26 AM, Zqiang wrote:
> >> In the preparation stage of CPU online, if the corresponding
> >> the rdp's->nocb_cb_kthread does not exist, will be created,
> >> there is a situation where the rdp's rcuop kthreads creation fails,
> >> and then de-offload this CPU's rdp, does not assign this CPU's
> >> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
> >> rdp's->rdp_gp->nocb_gp_kthread is still valid.
>
> Maybe I mixed up what you're doing but this commit message is a bit hard to
> parse. You're saying in the problematic scenario, "rdp's->nocb_gp_rdp" is valid,
> but in the patch below you're checking for "rdp's->nocb_gp_rdp". So I am a bit
> confused, why you would not run into the same problematic scenario. I think I
> agree with your patch but it would be good to refine and clarify the problematic
> condition, the commit message, and also please add some comments to the code :).
Hello, Joel
Maybe my description is not clear enough.
The scenario I want to describe is:
This is a small probability event,the rdp->nocb_cb_kthread creation
may fail in rcu_spawn_cpu_nocb_kthread(), this rdp->nocb_cb_kthread
will not set,is null pointer. but this rdp->rdp_gp->nocb_gp_kthread
and rdp->nocb_gp_rdp pointer is valid.
but in rcu_nocb_rdp_offload(), we only check rdp->nocb_gp_rdp and
rdp->rdp_gp->nocb_gp_kthread, this will cause kthread_unpark() access
null rdp->nocb_cb_kthread pointer.
so I use rdp->nocb_gp_kthread condition to make a judgment,
if the rdp->nocb_gp_kthread is valid, this means that rdp->nocb_cb_kthread
is also valid and there is no kthreads creation failure in
rcu_spawn_cpu_nocb_kthread()
Thanks
Zqiang
>
> Do you have a reproducer for this? If not, maybe we can work on some test cases
> Both CPU hotplug and offload is tested by rcutorture, so I'd expect we run into
> it. But perhaps not, because it requires kthread creation to fail?
>
> thanks,
>
> - Joel
>
>
> >>
> >> This will cause the subsequent re-offload operation of this offline
> >> CPU, which will pass the conditional check and the kthread_unpark()
> >> will access invalid rdp's->nocb_cb_kthread pointer.
> >>
> >> This commit therefore use rdp's->nocb_gp_kthread instead of
> >> rdp_gp's->nocb_gp_kthread for safety check.
> >>
> >> Signed-off-by: Zqiang <qiang.zhang1211@...il.com>
> >
> > Is it possible that rdp_gp->nocb_gp_kthread is valid, but rdp->nocb_cb_kthread
> > is still invalid (because its creation failed?). This seems a bit fragile to me
> > to assume that if rdp_gp->nocb_gp_kthread then rdp->nocb_cb_kthread is valid. Or
> > is there a path that makes sure this invariant is always satisfied? If so, can
> > we add additional warnings for checking this invariant?
> >
> > Also from the other thread, it sounds like there is more work to do here
> > (related patches so I'd like to defer this to 6.17 - feel free to keep posting
> > patches for this work though).
> >
> > Thanks!
> >
> > - Joel
> >
> >> ---
> >> kernel/rcu/tree_nocb.h | 5 ++---
> >> 1 file changed, 2 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> >> index 1596812f7f12..6679140bb0b5 100644
> >> --- a/kernel/rcu/tree_nocb.h
> >> +++ b/kernel/rcu/tree_nocb.h
> >> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
> >> static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >> {
> >> int wake_gp;
> >> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> >>
> >> WARN_ON_ONCE(cpu_online(rdp->cpu));
> >> /*
> >> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >> if (!rdp->nocb_gp_rdp)
> >> return -EINVAL;
> >>
> >> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> >> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
> >> return -EINVAL;
> >>
> >> pr_info("Offloading %d\n", rdp->cpu);
> >> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >>
> >> wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> >> if (wake_gp)
> >> - wake_up_process(rdp_gp->nocb_gp_kthread);
> >> + wake_up_process(rdp->nocb_gp_kthread);
> >>
> >> swait_event_exclusive(rdp->nocb_state_wq,
> >> rcu_nocb_rdp_offload_wait_cond(rdp));
> >
>
Powered by blists - more mailing lists