Message-ID: <ac5d37e5-c209-465b-8f2c-b09a2ff6fb07@nvidia.com>
Date: Wed, 5 Mar 2025 10:37:39 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Boqun Feng <boqun.feng@...il.com>, Uladzislau Rezki <urezki@...il.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>, RCU <rcu@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Cheung Wall <zzqq0103.hey@...il.com>,
Neeraj upadhyay <Neeraj.Upadhyay@....com>,
Joel Fernandes <joel@...lfernandes.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>
Subject: Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On 3/4/2025 9:54 PM, Boqun Feng wrote:
> On Tue, Mar 04, 2025 at 11:56:18AM +0100, Uladzislau Rezki wrote:
>> On Tue, Mar 04, 2025 at 11:52:26AM +0100, Uladzislau Rezki wrote:
>>>>> Did I get that right?
>>>>>
>>>>
>>>> Other than being unable to follow what you mean by "WH has not been
>>>> injected, so nothing to wait on", maybe because I am missing some
>>>> terminology from you ;-) I think it's a good analysis, thank you!
>>>>
>>>>> I think this is a real bug AFAICS, hoping all the memory barriers are in
>>>>> place to make sure the code reordering also correctly orders the accesses.
>>>>> I'll double check that.
>>>>>
>>>>> I also feel it's 'theoretical', because as long as rcu_gp_init() and
>>>>> rcu_gp_cleanup() are properly ordered WRT pre-existing readers, then
>>>>> synchronize_rcu_normal() still waits for pre-existing readers even though it's
>>>>> a bit confused about the value of the cookies.
>>>>>
>>>>> For the fix,
>>>>> Reviewed-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>>>>>
>>>>> (If possible, include a Link: to my (this) post so that the sequence of
>>>>> events is further clarified.)
>>>>>
>>>>
>>>> Will add the tag (with the email you really want ;-)) and a link to this
>>>> email to the patch. Thanks!
>>>>
>>>
>>> CPU_1:                                | CPU_2:
>>>   # Increase a seq-number             |
>>>   rcu_seq_start(&rcu_state.gp_seq);   |
>>>                                       | add_client() {
>>>                                       |   # Record a gp-seq state
>>>                                       |   get_state_synchronize_rcu_full(&rs.oldstate);
>>>                                       | }
>>>                                       |
>>>                                       | rcu_sr_normal_gp_init() {
>>>                                       |   add a dummy-wait-head;
>>>                                       | }
>>>
>>>
>>> A client has been added with an already-updated gp-seq number, i.e.
>>> "oldstate" refers to this GP, not to the previous one. A
>>> poll_state_synchronize_rcu_full() call will complain because this GP has
>>> not completed yet; it will pass on the next iteration.
>>>
>>> This is how I see it.
>>>
>> Updated the plain-text, removed tabs:
>>
>> CPU_1:                                | CPU_2:
>>   # Increase a seq-number             |
>>   rcu_seq_start(&rcu_state.gp_seq);   |
>>                                       | add_client() {
>>                                       |   # Record a gp-seq state
>>                                       |   get_state_synchronize_rcu_full(&rs.oldstate);
>>                                       | }
>>                                       |
>>                                       | rcu_sr_normal_gp_init() {
>>                                       |   add a dummy-wait-head;
>>                                       | }
>>
>
> Thank you. I added links from you and Joel as the detailed explanation
> to the commit log, and the comment I proposed[1].
>
> [1]: https://lore.kernel.org/rcu/Z8SnhS_LnzN_wvxr@tardis/
>
Yep, I agree with Vlad's explanation as well, and adding links to both
explanations sounds perfect, thanks!
- Joel
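
For readers following along, the interleaving in the quoted diagram can be
approximated with a single-threaded userspace model. This is only a rough
sketch: the model_* helpers are hypothetical names that mimic the
sequence-counter arithmetic in kernel/rcu/rcu.h (two low state bits, counter
in the upper bits), and it assumes the cookie is computed the same way
rcu_seq_snap() does. It ignores memory ordering, the expedited half of the
_full() cookie, and the wait-head list entirely.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed layout: low two bits are GP state, upper bits count GPs. */
#define MODEL_SEQ_CTR_SHIFT  2
#define MODEL_SEQ_STATE_MASK ((1UL << MODEL_SEQ_CTR_SHIFT) - 1)

/* Mark a grace period as in progress (models rcu_seq_start()). */
static void model_seq_start(unsigned long *sp)
{
	*sp += 1;
}

/* Mark the grace period as complete (models rcu_seq_end()). */
static void model_seq_end(unsigned long *sp)
{
	*sp = (*sp | MODEL_SEQ_STATE_MASK) + 1;
}

/* Cookie: the counter value at which a full GP has elapsed from now. */
static unsigned long model_seq_snap(const unsigned long *sp)
{
	return (*sp + 2 * MODEL_SEQ_STATE_MASK + 1) & ~MODEL_SEQ_STATE_MASK;
}

/* Wrap-safe "has the counter reached the cookie?" check. */
static bool model_seq_done(const unsigned long *sp, unsigned long s)
{
	return (ULONG_MAX / 2) >= (*sp - s);
}

/* Cookie recorded before the GP starts: that one GP satisfies it. */
static bool model_cookie_before_start(void)
{
	unsigned long gp_seq = 0;
	unsigned long cookie = model_seq_snap(&gp_seq);

	model_seq_start(&gp_seq);
	model_seq_end(&gp_seq);
	return model_seq_done(&gp_seq, cookie);
}

/*
 * Cookie recorded after the GP has started (the diagram's interleaving).
 * Returns how many GP completions are needed before the cookie is done.
 */
static int model_gps_needed_after_start(void)
{
	unsigned long gp_seq = 0;
	unsigned long cookie;
	int gps = 0;

	model_seq_start(&gp_seq);		/* CPU_1 */
	cookie = model_seq_snap(&gp_seq);	/* CPU_2: add_client() */

	model_seq_end(&gp_seq);			/* this GP ends */
	gps++;
	while (!model_seq_done(&gp_seq, cookie)) {
		model_seq_start(&gp_seq);	/* next iteration */
		model_seq_end(&gp_seq);
		gps++;
	}
	return gps;
}
```

In this model, model_cookie_before_start() reports the cookie as done after
one grace period, while model_gps_needed_after_start() needs a second one,
which matches the "it will on a next iteration" behaviour described in the
quoted explanation.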