linux-kernel - Re: [PATCH v5 2/4] rcu: Reduce synchronize

Open Source and information security mailing list archives



Message-ID: <44dce4fb-6198-4ca3-9535-566655fa8e35@joelfernandes.org>
Date: Wed, 28 Feb 2024 11:44:19 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: "Uladzislau Rezki (Sony)" <urezki@...il.com>,
 "Paul E . McKenney" <paulmck@...nel.org>
Cc: RCU <rcu@...r.kernel.org>, Neeraj upadhyay <Neeraj.Upadhyay@....com>,
 Boqun Feng <boqun.feng@...il.com>, Hillf Danton <hdanton@...a.com>,
 LKML <linux-kernel@...r.kernel.org>,
 Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>,
 Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH v5 2/4] rcu: Reduce synchronize_rcu() latency

On 2/28/2024 9:32 AM, Joel Fernandes wrote:
> 
> 
> On 2/20/2024 1:31 PM, Uladzislau Rezki (Sony) wrote:
[...]
>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> index c8980d76f402..1328da63c3cd 100644
>> --- a/kernel/rcu/tree.c
>> +++ b/kernel/rcu/tree.c
>> @@ -75,6 +75,7 @@
>>  #define MODULE_PARAM_PREFIX "rcutree."
>>  
>>  /* Data structures. */
>> +static void rcu_sr_normal_gp_cleanup_work(struct work_struct *);
>>  
>>  static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = {
>>  	.gpwrap = true,
>> @@ -93,6 +94,8 @@ static struct rcu_state rcu_state = {
>>  	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
>>  	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
>>  	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
>> +	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
>> +		rcu_sr_normal_gp_cleanup_work),
>>  };
>>  
>>  /* Dump rcu_node combining tree at boot to verify correct setup. */
>> @@ -1422,6 +1425,282 @@ static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
>>  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>>  }
> [..]
>> +static void rcu_sr_normal_add_req(struct rcu_synchronize *rs)
>> +{
>> +	llist_add((struct llist_node *) &rs->head, &rcu_state.srs_next);
>> +}
>> +
> 
> I'm a bit concerned, from a memory-ordering PoV, about this llist_add()
> possibly happening on a different CPU from both the GP thread and the kworker
> thread. Basically, we can have 3 CPUs simultaneously modifying and reading the
> list, but only 2 of those CPUs have the acq-rel pair AFAICS.
> 
> Consider the following situation:
> 
> synchronize_rcu() user
> ----------------------
> llist_add the user U - update srs_next list
> 
> rcu_gp_init() and rcu_gp_cleanup() (SAME THREAD)
> ------------------------------------------------
> insert dummy node in front of U, call it S
> update wait_tail to U
> 
> and then cleanup:
> read wait_tail to W
> set wait_tail to NULL
> set done_tail to W (RELEASE) -- this release ensures U and S are seen by worker.
> 
> workqueue handler
> -----------------
> read done_tail (ACQUIRE)
> disconnect rest of list -- disconnected list guaranteed to have U and S,
>                            if done_tail read was W.
> ---------------------------------
> 
> So llist_add() does this (assume new_first and new_last are the same):
> 
> 	struct llist_node *first = READ_ONCE(head->first);
> 
> 	do {
> 		new_last->next = first;
> 	} while (!try_cmpxchg(&head->first, &first, new_first));
> 
> 	return !first;
> ---
> 
> It reads head->first, then sets new_last->next (i.e. new_first->next) to point
> at the old first node, then sets head->first to new_first if head->first did
> not change in the meantime.
> 
> The problem, I guess, happens if the update to head->first is seen *before*
> the update to new_first->next.
> 
> This potentially means the workqueue handler sees a corrupted list, because
> the "U" node is already seen as the head of the list but is not yet seen
> pointing to the rest of the list (previously added nodes).
> 
> I am not sure if this can happen, but AFAIK try_cmpxchg() doesn't imply
> ordering per se. Maybe that try_cmpxchg() should be a try_cmpxchg_release() in
> llist_add()?
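
For reference, the quoted loop maps onto a user-space C11 analogue roughly as follows. This is a sketch under assumptions, not kernel code: <stdatomic.h> stands in for the kernel's primitives (whose memory model is not identical to C11), and struct node, struct llist_head_c, push() and take_all() are hypothetical stand-ins for llist_node, llist_head, llist_add() and llist_del_all(). The CAS is release-ordered on success, as suggested above, and the consumer detaches the list with an acquire exchange, so a consumer that sees a node as the head also sees that node's ->next link.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical user-space stand-ins for struct llist_node / struct llist_head. */
struct node {
	struct node *next;
	int val;
};

struct llist_head_c {
	_Atomic(struct node *) first;
};

/*
 * Push one node, mirroring the quoted llist_add() loop. The plain store to
 * n->next happens before the CAS; a release (or stronger) CAS on success
 * orders that store before n becomes visible as the new head.
 * Returns 1 if the list was empty before the push, 0 otherwise.
 */
static int push(struct llist_head_c *head, struct node *n)
{
	struct node *first = atomic_load_explicit(&head->first, memory_order_relaxed);

	do {
		n->next = first;	/* on CAS failure, first is reloaded and we retry */
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &first, n,
						       memory_order_release,
						       memory_order_relaxed));
	return first == NULL;
}

/* Detach the whole list (like llist_del_all()); the acquire pairs with push(). */
static struct node *take_all(struct llist_head_c *head)
{
	return atomic_exchange_explicit(&head->first, NULL, memory_order_acquire);
}
```

Because every successful CAS is a read-modify-write, later pushes extend the release sequence of earlier ones, so the acquire in take_all() also covers the ->next links of nodes pushed before the current head.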

Everyone in the internal RCU crew corrected me offline: try_cmpxchg() is fully
ordered when the cmpxchg succeeds.

So I don't think the issue I mentioned can occur, and we can park this.
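
In C11 terms, happens-before is transitive, so a fully ordered CAS on the caller side, an acquire on the GP side, and the release/acquire pair on done_tail together let the worker see the callers' ->next links. A deliberately simplified model of that three-CPU chain is sketched below, with seq_cst on success standing in for try_cmpxchg()'s full ordering (an approximation, since the kernel memory model is not C11). srs_next and done_tail echo names from the patch, but gp_publish() just detaches the callers' nodes behind a dummy node rather than doing what rcu_gp_init()/rcu_gp_cleanup() actually do; sr_push(), gp_publish() and worker_count() are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical simplified model of the caller -> GP thread -> worker chain. */
struct node {
	struct node *next;
	int id;
};

static _Atomic(struct node *) srs_next;		/* callers push onto this list */
static _Atomic(struct node *) done_tail;	/* GP thread -> worker handoff */

/* Caller side: seq_cst on success approximates try_cmpxchg()'s full ordering. */
static void sr_push(struct node *n)
{
	struct node *first = atomic_load_explicit(&srs_next, memory_order_relaxed);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak_explicit(&srs_next, &first, n,
						       memory_order_seq_cst,
						       memory_order_relaxed));
}

/*
 * GP side (simplified): detach the callers' nodes behind dummy node s and
 * publish s. The acquire exchange pairs with sr_push(); the release store
 * then carries s->next and, transitively, the callers' ->next links.
 */
static void gp_publish(struct node *s)
{
	s->next = atomic_exchange_explicit(&srs_next, NULL, memory_order_acquire);
	atomic_store_explicit(&done_tail, s, memory_order_release);
}

/* Worker side: the acquire load pairs with gp_publish(); count real nodes. */
static int worker_count(void)
{
	struct node *n = atomic_load_explicit(&done_tail, memory_order_acquire);
	int cnt = 0;

	if (!n)
		return 0;
	for (n = n->next; n; n = n->next)	/* skip the dummy node */
		cnt++;
	return cnt;
}
```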

Thanks!

 - Joel