Message-ID: <tencent_2E076343DA2D43D9B05F9DE3A01E0ACF2407@qq.com>
Date: Tue, 5 Mar 2024 11:30:18 +0800
From: linke <lilinke99@...com>
To: paulmck@...nel.org
Cc: Joel Fernandes <joel@...lfernandes.org>,
Davidlohr Bueso <dave@...olabs.net>,
Josh Triplett <josh@...htriplett.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
Boqun Feng <boqun.feng@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>,
linux-kernel@...r.kernel.org,
rcu@...r.kernel.org
Subject: Re: [PATCH] rcutorture: Fix
rcu_torture_pipe_update_one()/rcu_torture_writer() data race and concurrency
bug

Thank you both. This looks good to me. I will send a new patch.
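Something like the following, perhaps? (An untested sketch only, just Paul's
suggested version quoted below expressed as a diff against the current code;
the hunk header is approximate.)

diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
 	i = READ_ONCE(rp->rtort_pipe_count);
 	if (i > RCU_TORTURE_PIPE_LEN)
 		i = RCU_TORTURE_PIPE_LEN;
 	atomic_inc(&rcu_torture_wcount[i]);
 	WRITE_ONCE(rp->rtort_pipe_count, i + 1);
-	if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
+	if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
 		rp->rtort_mbtest = 0;
 		return true;
 	}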
> On Mar 5, 2024, at 04:47, Paul E. McKenney <paulmck@...nel.org> wrote:
>
> On Mon, Mar 04, 2024 at 03:13:10PM -0500, Joel Fernandes wrote:
>>
>>
>> On 3/4/2024 2:44 PM, Paul E. McKenney wrote:
>>> On Mon, Mar 04, 2024 at 02:10:09PM -0500, Joel Fernandes wrote:
>>>>
>>>>
>>>> On 3/4/2024 12:14 PM, Paul E. McKenney wrote:
>>>>> On Mon, Mar 04, 2024 at 11:19:21AM -0500, Joel Fernandes wrote:
>>>>>>
>>>>>>
>>>>>> On 3/4/2024 5:54 AM, linke li wrote:
>>>>>>> Some changes were made to fix a data race in commit 202489101f2e
>>>>>>> ("rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race"):
>>>>>>>
>>>>>>> {
>>>>>>> int i;
>>>>>>>
>>>>>>> - i = rp->rtort_pipe_count;
>>>>>>> + i = READ_ONCE(rp->rtort_pipe_count);
>>>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>>>> - if (++rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>>>> + WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>>>>>> + if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>>>> rp->rtort_mbtest = 0;
>>>>>>> return true;
>>>>>>> }
>>>>>>>
>>>>>>> But ++rp->rtort_pipe_count is meant to increment rp->rtort_pipe_count by
>>>>>>> 1, not to assign i + 1 to it, because rp->rtort_pipe_count may be written
>>>>>>> concurrently by rcu_torture_writer().
>>>>>>>
>>>>>>> Also, rp->rtort_pipe_count on the next line should be read using
>>>>>>> READ_ONCE() because of the data race.
>>>>>>>
>>>>>>> Signed-off-by: linke li <lilinke99@...com>
>>>>>>> ---
>>>>>>> kernel/rcu/rcutorture.c | 4 ++--
>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
>>>>>>> index 7567ca8e743c..00059ace4fd5 100644
>>>>>>> --- a/kernel/rcu/rcutorture.c
>>>>>>> +++ b/kernel/rcu/rcutorture.c
>>>>>>> @@ -465,8 +465,8 @@ rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>>>> - WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>>>>>> - if (rp->rtort_pipe_count >= RCU_TORTURE_PIPE_LEN) {
>>>>>>> + WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>>>>>> + if (READ_ONCE(rp->rtort_pipe_count) >= RCU_TORTURE_PIPE_LEN) {
>>>>>>
>>>>>> I want to say, I am not convinced by the patch, because what is wrong with
>>>>>> writing to an old index?
>>>>>>
>>>>>> You win or lose the race anyway, say the CPU executed the WRITE_ONCE() a
>>>>>> bit too early or late and another WRITE_ONCE() lost or won, regardless of
>>>>>> whether you wrote the "incremented i" or "the increment from the latest
>>>>>> value of pipe_count".
>>>>>>
>>>>>> Anyway, a slightly related/different question:
>>>>>>
>>>>>> Should that:
>>>>>> WRITE_ONCE(rp->rtort_pipe_count, rp->rtort_pipe_count + 1);
>>>>>>
>>>>>> Be:
>>>>>> WRITE_ONCE(rp->rtort_pipe_count, READ_ONCE(rp->rtort_pipe_count) + 1);
>>>>>>
>>>>>> ?
>>>>>
>>>>> Thank you both!
>>>>>
>>>>> At first glance, I would argue for something like this:
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> static bool
>>>>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>>>> {
>>>>> int i;
>>>>> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>>>>>
>>>>> if (rtrcp) {
>>>>> WRITE_ONCE(rp->rtort_chkp, NULL);
>>>>> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
>>>>> }
>>>>> i = READ_ONCE(rp->rtort_pipe_count) + 1;
>>>>> if (i > RCU_TORTURE_PIPE_LEN)
>>>>> i = RCU_TORTURE_PIPE_LEN;
>>>>> atomic_inc(&rcu_torture_wcount[i]);
>>>>> WRITE_ONCE(rp->rtort_pipe_count, i);
>>>>> if (i >= RCU_TORTURE_PIPE_LEN) {
>>>>> rp->rtort_mbtest = 0;
>>>>> return true;
>>>>> }
>>>>> return false;
>>>>> }
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> That is, move the increment to the read and replace the re-read with
>>>>> the value "i" that was just written.
>>>>
>>>> But that changes the original logic as well? It looks like with the above
>>>> change, you're now writing to rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)
>>>> + 1] instead of rcu_torture_wcount[READ_ONCE(rp->rtort_pipe_count)].
>>>>
>>>> I think that might break rcutorture, because there is an increment outside of
>>>> the first 2 entries in rcu_torture_wcount, but I am not sure (need to look more).
>>>
>>> Good point on never incrementing the zeroth entry! Clearly I should
>>> have waited before replying.
>>>
>>> How about the following?
>>>
>>> ------------------------------------------------------------------------
>>>
>>> static bool
>>> rcu_torture_pipe_update_one(struct rcu_torture *rp)
>>> {
>>> int i;
>>> struct rcu_torture_reader_check *rtrcp = READ_ONCE(rp->rtort_chkp);
>>>
>>> if (rtrcp) {
>>> WRITE_ONCE(rp->rtort_chkp, NULL);
>>> smp_store_release(&rtrcp->rtc_ready, 1); // Pair with smp_load_acquire().
>>> }
>>> i = READ_ONCE(rp->rtort_pipe_count);
>>> if (i > RCU_TORTURE_PIPE_LEN)
>>> i = RCU_TORTURE_PIPE_LEN;
>>> atomic_inc(&rcu_torture_wcount[i]);
>>> WRITE_ONCE(rp->rtort_pipe_count, i + 1);
>>> if (i + 1 >= RCU_TORTURE_PIPE_LEN) {
>>> rp->rtort_mbtest = 0;
>>> return true;
>>> }
>>> return false;
>>> }
>>
>> Yes, this looks good to me. Thanks,
>> Reviewed-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>
> Again, thank you.
>
> linke li, does this approach work for you? If so, would you be willing to
> send a new patch along these lines? If it does not work, what additional
> problems do you see?
>
> Thanx, Paul