Message-ID: <44dce4fb-6198-4ca3-9535-566655fa8e35@joelfernandes.org>
Date: Wed, 28 Feb 2024 11:44:19 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: "Uladzislau Rezki (Sony)" <urezki@...il.com>,
"Paul E . McKenney" <paulmck@...nel.org>
Cc: RCU <rcu@...r.kernel.org>, Neeraj upadhyay <Neeraj.Upadhyay@....com>,
Boqun Feng <boqun.feng@...il.com>, Hillf Danton <hdanton@...a.com>,
LKML <linux-kernel@...r.kernel.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>,
Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH v5 2/4] rcu: Reduce synchronize_rcu() latency
On 2/28/2024 9:32 AM, Joel Fernandes wrote:
>
>
> On 2/20/2024 1:31 PM, Uladzislau Rezki (Sony) wrote:
[...]
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index c8980d76f402..1328da63c3cd 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -75,6 +75,7 @@
>> #define MODULE_PARAM_PREFIX "rcutree."
>>
>> /* Data structures. */
>> +static void rcu_sr_normal_gp_cleanup_work(struct work_struct *);
>>
>> static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>> .gpwrap = true,
>> @@ -93,6 +94,8 @@ static struct rcu_state rcu_state = {
>> .exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
>> .exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
>> .ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
>> + .srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
>> + rcu_sr_normal_gp_cleanup_work),
>> };
>>
>> /* Dump rcu_node combining tree at boot to verify correct setup. */
>> @@ -1422,6 +1425,282 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
>> raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>> }
> [..]
>> +static void rcu_sr_normal_add_req(struct rcu_synchronize *rs)
>> +{
>> + llist_add((struct llist_node *) &rs->head, &rcu_state.srs_next);
>> +}
>> +
>
> I'm a bit concerned from a memory order PoV about this llist_add() happening
> possibly on a different CPU than the GP thread, and different than the kworker
> thread. Basically we can have 3 CPUs simultaneously modifying and reading the
> list, but only 2 CPUs have the acq-rel pair AFAICS.
>
> Consider the following situation:
>
> synchronize_rcu() user
> ----------------------
> llist_add the user U - update srs_next list
>
> rcu_gp_init() and rcu_gp_cleanup (SAME THREAD)
> --------------------
> insert dummy node in front of U, call it S
> update wait_tail to U
>
> and then cleanup:
> read wait_tail to W
> set wait_tail to NULL
> set done_tail to W (RELEASE) -- this release ensures U and S are seen by worker.
>
> workqueue handler
> -----------------
> read done_tail (ACQUIRE)
> disconnect rest of list -- disconnected list guaranteed to have U and S,
> if done_tail read was W.
> ---------------------------------
>
> So llist_add() does this (assume new_first and new_last are same):
>
> struct llist_node *first = READ_ONCE(head->first);
>
> do {
> new_last->next = first;
> } while (!try_cmpxchg(&head->first, &first, new_first));
>
> return !first;
> ---
>
> It reads head->first, then sets new_last->next (call it new_first->next)
> to the old first, then sets head->first to new_first if head->first did not
> change in the meantime.
>
> The problem I guess happens if the update to head->first is seen *before* the
> update to new_first->next.
>
> This potentially means a corrupted list is seen in the workqueue handler,
> because the "U" node is not yet seen pointing to the rest of the list
> (previously added nodes), but is already seen as the head of the list.
>
> I am not sure if this can happen, but AFAIK try_cmpxchg() doesn't imply ordering
> per-se. Maybe that try_cmpxchg() should be a try_cmpxchg_release() in llist_add() ?
The internal RCU crew corrected me offline: try_cmpxchg() is fully ordered
when the cmpxchg succeeds, so the store to new_last->next is ordered before
the update of head->first.
So I don't think the issue I mentioned can occur, and we can park this.
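
For illustration only, here is a minimal userspace sketch of the same
publication pattern using C11 atomics. It is not the kernel code: the model_*
names are hypothetical, memory_order_seq_cst on the successful CAS only
approximates the kernel's fully ordered try_cmpxchg(), and the acquire exchange
stands in for the xchg() in llist_del_all(). The point it shows is that a
successful CAS with at least release semantics publishes the earlier plain
store to ->next, so a reader that observes the new head also observes its link
to the rest of the list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct llist_node / llist_head. */
struct model_node {
	struct model_node *next;
};

struct model_head {
	_Atomic(struct model_node *) first;
};

/*
 * Model of llist_add() for a single node (new_first == new_last).
 * Assumption: seq_cst on success approximates a fully ordered cmpxchg;
 * it includes release semantics, which is what orders the ->next store
 * before the head update. A failed CAS needs no ordering here.
 * Returns true if the list was empty before the add.
 */
static bool model_llist_add(struct model_node *new, struct model_head *head)
{
	struct model_node *first =
		atomic_load_explicit(&head->first, memory_order_relaxed);

	do {
		new->next = first;	/* plain store, published by the CAS */
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &first,
							new,
							memory_order_seq_cst,
							memory_order_relaxed));
	return !first;
}

/* Model of llist_del_all(): detach the whole list; acquire pairs with the CAS. */
static struct model_node *model_llist_del_all(struct model_head *head)
{
	return atomic_exchange_explicit(&head->first,
					(struct model_node *)NULL,
					memory_order_acquire);
}

/* Count the nodes reachable from a detached list. */
static size_t model_llist_count(const struct model_node *n)
{
	size_t cnt = 0;

	for (; n; n = n->next)
		cnt++;
	return cnt;
}
```

Adding nodes a, b, c in that order and then calling model_llist_del_all()
should hand back c -> b -> a, with every ->next link intact.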
Thanks!
- Joel