[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a2b9d79f-5406-422c-aa79-f0fd1862d4ab@nvidia.com>
Date: Sat, 22 Mar 2025 15:20:39 +0100
From: Joel Fernandes <joelagnelf@...dia.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, Boqun Feng <boqun.feng@...il.com>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on
rcu_seq_done_exact()
On 3/22/2025 11:25 AM, Frederic Weisbecker wrote:
> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
>>
>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
>>>>>> their magic. Especially after the commit:
>>>>>>
>>>>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>>>>>>
>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken
>>>>>> on the root node state while a grace period had already been started and
>>>>>> reflected on the global state sequence but not yet on the root node
>>>>>> sequence, making a polling user waiting on a wrong already started grace
>>>>>> period that would ignore freshly online CPUs.
>>>>>>
>>>>>> The fix involved taking the snaphot on the global state sequence and
>>>>>> waiting on the root node sequence. And since a grace period is first
>>>>>> started on the global state and only afterward reflected on the root
>>>>>> node, a snapshot taken on the global state sequence might be two full
>>>>>> grace periods ahead of the root node as in the following example:
>>>>>>
>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0
>>>>>>
>>>>>> CPU 0 CPU 1
>>>>>> ----- -----
>>>>>> // rcu_state.gp_seq = 1
>>>>>> rcu_seq_start(&rcu_state.gp_seq)
>>>>>> // snap = 8
>>>>>> snap = rcu_seq_snap(&rcu_state.gp_seq)
>>>>>> // Two full GP differences
>>>>>> rcu_seq_done_exact(&rnp->gp_seq, snap)
>>>>>> // rnp->gp_seq = 1
>>>>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
>>>>>>
>>>>>> Add a comment about those expectations and to clarify the magic within
>>>>>> the relevant function.
>>>>>>
>>>>>> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
>>>>> Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
>>>>>
>>>>> But it would of course be good to get reviews from the others.
>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
>>>> because the small lag can just as well survive with the rcu_seq_done()
>>>> function in the above sequence right?
>>>>
>>>> The rcu_seq_done_exact() function on the other hand is more about not being
>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
>>>> least ULONG_MAX/2 AFAIU.
>>>>
>>>> So the only time this magic will matter is if you have a huge delta between
>>>> what is being compared, not just 2 GPs.
>>> You're right, and perhaps I should have made it more specific that my comment
>>> only explains the magic "3" number here, in that if it were "2" instead, there
>>> could be accidents with 2 full GPs difference (which is possible) spuriously
>>> accounted as a wrap around.
>>
>> Ahh, so I guess I get it now and we are both right. The complete picture is - We
>> are trying to handle the case of "very large wrap" around but as a part of that,
>> we don't want to create false-positives for this "snap" case.
>>
>> A "snap" can be atmost (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
>>
>> That's within "2 GPs" worth of counts (about 8 counts)
>>
>> Taking some numbers:
>>
>> cur_s s delta (s - cur_s)
>> 0 4 4
>> 1 8 7
>> 2 8 6
>> 3 8 5
>> 4 8 4
>> 5 12 7
>>
>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
>> 1) which in this case is 7.
>>
>> So we adjust the comparison by adding the ULONG_CMP_LT(cur_s, s - (2 *
>> RCU_SEQ_STATE_MASK + 1)). i.e.
>
> 3, right?
Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ?
That is indeed 3 (RCU_SEQ_STATE_MASK).
But if we're talking about number of GPs, my understanding is a count of 4 is
one GP worth. Per the above table, the delta between gp_seq and is snap is
always a count of 7 (hence less than 2 GPs).
Agreed?
If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count
of 2 instead of 4, for a GP), then the table would be:
cur_s s (snap) delta (s - cur_s)
0 2 2
1 4 3
2 4 2
3 6 3
4 6 2
5 8 3
So delta is always <= 3, Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1
>> Unless you beat me to it, I may modify your patch for v6.16 augmented with this
>> reasoning ;) (Also since I am also working on adding that forced wrap around
>> test to rcutorture).
>
> Please do, I appreciate!
Great, will do.
>> Also it is still not fully clear to me what the root node has to do with all
>> this in your example, because the rcu_seq_done_exact() needs to be what it is
>> (that is having that 2 GP adjustment) even if the rnp->gp_seq and
>> rcu_state.gp_seq were in sync?
>
> Yes, this is only to explain that the maximum drift between the snap on rsp
> and the current state on root rnp can be at most 2 full GP. And that explain the "3"
> magic in the function. But if they were in sync it's all fine.
Right. But I would argue that, even if things were in sync, its not fine to drop
the magic number. Because you cannot take a gp_seq and its corresponding "snap"
and compare them instantly without the "ULONG_CMP_LT". If you do, you'd get a
false positive. But as you did, we could say that the lag between rnp and rsp's
gp_seq values is also covered by this "ULONG_CMP_LT" expression.
Glad we are discussing this in detail so that once we forget all this in 24
hours, we can go back to the archives ;-) ;-)
thanks,
- Joel
Powered by blists - more mailing lists