lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3c7f1032-f2ba-4fc6-91c0-a07fce1b9c3c@nvidia.com>
Date: Sat, 22 Mar 2025 03:06:08 +0100
From: Joel Fernandes <joelagnelf@...dia.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
 LKML <linux-kernel@...r.kernel.org>, Boqun Feng <boqun.feng@...il.com>,
 Neeraj Upadhyay <neeraj.upadhyay@....com>,
 Uladzislau Rezki <urezki@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
 rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on
 rcu_seq_done_exact()

Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):

On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
>>>> their magic. Especially after the commit:
>>>>
>>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
>>>>
>>>> which reported a subtle issue where a new GP sequence snapshot was taken
>>>> on the root node state while a grace period had already been started and
>>>> reflected on the global state sequence but not yet on the root node
>>>> sequence, making a polling user waiting on a wrong already started grace
>>>> period that would ignore freshly online CPUs.
>>>>
>>>> The fix involved taking the snaphot on the global state sequence and
>>>> waiting on the root node sequence. And since a grace period is first
>>>> started on the global state and only afterward reflected on the root
>>>> node, a snapshot taken on the global state sequence might be two full
>>>> grace periods ahead of the root node as in the following example:
>>>>
>>>> rnp->gp_seq = rcu_state.gp_seq = 0
>>>>
>>>>     CPU 0                                           CPU 1
>>>>     -----                                           -----
>>>>     // rcu_state.gp_seq = 1
>>>>     rcu_seq_start(&rcu_state.gp_seq)
>>>>                                                     // snap = 8
>>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
>>>>                                                     // Two full GP differences
>>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
>>>>     // rnp->gp_seq = 1
>>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
>>>>
>>>> Add a comment about those expectations and to clarify the magic within
>>>> the relevant function.
>>>>
>>>> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
>>> Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
>>>
>>> But it would of course be good to get reviews from the others.
>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
>> because the small lag can just as well survive with the rcu_seq_done()
>> function in the above sequence right?
>>
>> The rcu_seq_done_exact() function on the other hand is more about not being
>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
>> least ULONG_MAX/2 AFAIU.
>>
>> So the only time this magic will matter is if you have a huge delta between
>> what is being compared, not just 2 GPs.
> You're right, and perhaps I should have made it more specific that my comment
> only explains the magic "3" number here, in that if it were "2" instead, there
> could be accidents with 2 full GPs difference (which is possible) spuriously
> accounted as a wrap around.

Ahh, so I guess I get it now and we are both right. The complete picture is - We
are trying to handle the case of "very large wrap" around but as a part of that,
we don't want to create false-positives for this "snap" case.

A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.

That's within "2 GPs" worth of counts (about 8 counts)

Taking some numbers:

cur_s	s	delta (s - cur_s)
0	4	4
1	8	7
2	8	6
3	8	5
4	8	4
5	12	7

The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
1) which in this case is 7.

So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
RCU_SEQ_STATE_MASK + 1)). i.e.

after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely
conclude that the GP has completed thinking it was due to wrap around, where as
it is possible we just snapped and got a false positive.

So I think your comment is mostly correct then. But I think it may be better to
clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT is
because we want handle very large wrap around not being stuck in "false
negative" territory as we would with rcu_seq_done(). But that also means we
can't break the "snap" usecase to the introduction of ULONG_CMP_LT.

Unless you beat me to it, I may modify your patch for v6.16 augmented with this
reasoning ;) (Also since I am also working on adding that forced wrap around
test to rcutorture).

Also it is still not fully clear to me what the root node has to do with all
this in your example, because the rcu_seq_done_exact() needs to be what it is
(that is having that 2 GP adjustment) even if the rnp->gp_seq and
rcu_state.gp_seq were in sync?

thanks,

 - Joel


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ