[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7581b542-e591-422b-acc3-0c154161ab34@nvidia.com>
Date: Fri, 9 May 2025 15:11:50 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: Zqiang <qiang.zhang1211@...il.com>, paulmck@...nel.org,
frederic@...nel.org, neeraj.upadhyay@...nel.org, joel@...lfernandes.org,
urezki@...il.com, boqun.feng@...il.com
Cc: rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu/nocb: Fix possible invalid rdp's->nocb_cb_kthread
pointer access
On 5/9/2025 3:07 PM, Joel Fernandes wrote:
>
>
> On 5/7/2025 7:26 AM, Zqiang wrote:
>> In the preparation stage of CPU online, if the corresponding
>> the rdp's->nocb_cb_kthread does not exist, will be created,
>> there is a situation where the rdp's rcuop kthreads creation fails,
>> and then de-offload this CPU's rdp, does not assign this CPU's
>> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
>> rdp's->rdp_gp->nocb_gp_kthread is still valid.
Maybe I mixed up what you're doing but this commit message is a bit hard to
parse. You're saying in the problematic scenario, "rdp's->nocb_gp_rdp" is valid,
but in the patch below you're checking for "rdp's->nocb_gp_rdp". So I am a bit
confused, why you would not run into the same problematic scenario. I think I
agree with your patch but it would be good to refine and clarify the problematic
condition, the commit message, and also please add some comments to the code :).
Do you have a reproducer for this? If not, maybe we can work on some test cases
Both CPU hotplug and offload is tested by rcutorture, so I'd expect we run into
it. But perhaps not, because it requires kthread creation to fail?
thanks,
- Joel
>>
>> This will cause the subsequent re-offload operation of this offline
>> CPU, which will pass the conditional check and the kthread_unpark()
>> will access invalid rdp's->nocb_cb_kthread pointer.
>>
>> This commit therefore use rdp's->nocb_gp_kthread instead of
>> rdp_gp's->nocb_gp_kthread for safety check.
>>
>> Signed-off-by: Zqiang <qiang.zhang1211@...il.com>
>
> Is it possible that rdp_gp->nocb_gp_kthread is valid, but rdp->nocb_cb_kthread
> is still invalid (because its creation failed?). This seems a bit fragile to me
> to assume that if rdp_gp->nocb_gp_kthread then rdp->nocb_cb_kthread is valid. Or
> is there a path that makes sure this invariant is always satisfied? If so, can
> we add additional warnings for checking this invariant?
>
> Also from the other thread, it sounds like there is more work to do here
> (related patches so I'd like to defer this to 6.17 - feel free to keep posting
> patches for this work though).
>
> Thanks!
>
> - Joel
>
>> ---
>> kernel/rcu/tree_nocb.h | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>> index 1596812f7f12..6679140bb0b5 100644
>> --- a/kernel/rcu/tree_nocb.h
>> +++ b/kernel/rcu/tree_nocb.h
>> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
>> static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>> {
>> int wake_gp;
>> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>>
>> WARN_ON_ONCE(cpu_online(rdp->cpu));
>> /*
>> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>> if (!rdp->nocb_gp_rdp)
>> return -EINVAL;
>>
>> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
>> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
>> return -EINVAL;
>>
>> pr_info("Offloading %d\n", rdp->cpu);
>> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>>
>> wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
>> if (wake_gp)
>> - wake_up_process(rdp_gp->nocb_gp_kthread);
>> + wake_up_process(rdp->nocb_gp_kthread);
>>
>> swait_event_exclusive(rdp->nocb_state_wq,
>> rcu_nocb_rdp_offload_wait_cond(rdp));
>
Powered by blists - more mailing lists