linux-kernel - Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on rcu_seq_done

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z96P_K0kt-FumSjz@pavilion.home>
Date: Sat, 22 Mar 2025 11:25:00 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Neeraj Upadhyay <neeraj.upadhyay@....com>,
	Uladzislau Rezki <urezki@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on
 rcu_seq_done_exact()

Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
> 
> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> > Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
> >> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> >>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> >>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
> >>>> their magic. Especially after the commit:
> >>>>
> >>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> >>>>
> >>>> which reported a subtle issue where a new GP sequence snapshot was taken
> >>>> on the root node state while a grace period had already been started and
> >>>> reflected on the global state sequence but not yet on the root node
> >>>> sequence, making a polling user waiting on a wrong already started grace
> >>>> period that would ignore freshly online CPUs.
> >>>>
> >>>> The fix involved taking the snaphot on the global state sequence and
> >>>> waiting on the root node sequence. And since a grace period is first
> >>>> started on the global state and only afterward reflected on the root
> >>>> node, a snapshot taken on the global state sequence might be two full
> >>>> grace periods ahead of the root node as in the following example:
> >>>>
> >>>> rnp->gp_seq = rcu_state.gp_seq = 0
> >>>>
> >>>>     CPU 0                                           CPU 1
> >>>>     -----                                           -----
> >>>>     // rcu_state.gp_seq = 1
> >>>>     rcu_seq_start(&rcu_state.gp_seq)
> >>>>                                                     // snap = 8
> >>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
> >>>>                                                     // Two full GP differences
> >>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
> >>>>     // rnp->gp_seq = 1
> >>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> >>>>
> >>>> Add a comment about those expectations and to clarify the magic within
> >>>> the relevant function.
> >>>>
> >>>> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> >>> Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
> >>>
> >>> But it would of course be good to get reviews from the others.
> >> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
> >> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
> >> because the small lag can just as well survive with the rcu_seq_done()
> >> function in the above sequence right?
> >>
> >> The rcu_seq_done_exact() function on the other hand is more about not being
> >> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
> >> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
> >> least ULONG_MAX/2 AFAIU.
> >>
> >> So the only time this magic will matter is if you have a huge delta between
> >> what is being compared, not just 2 GPs.
> > You're right, and perhaps I should have made it more specific that my comment
> > only explains the magic "3" number here, in that if it were "2" instead, there
> > could be accidents with 2 full GPs difference (which is possible) spuriously
> > accounted as a wrap around.
> 
> Ahh, so I guess I get it now and we are both right. The complete picture is - We
> are trying to handle the case of "very large wrap" around but as a part of that,
> we don't want to create false-positives for this "snap" case.
> 
> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
> 
> That's within "2 GPs" worth of counts (about 8 counts)
> 
> Taking some numbers:
> 
> cur_s	s	delta (s - cur_s)
> 0	4	4
> 1	8	7
> 2	8	6
> 3	8	5
> 4	8	4
> 5	12	7
> 
> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
> 1) which in this case is 7.
> 
> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
> RCU_SEQ_STATE_MASK + 1)). i.e.

3, right?

> 
> after a snap, if we blindly do ULONG_CMP_LT without adjustment, we'll falsely
> conclude that the GP has completed thinking it was due to wrap around, where as
> it is possible we just snapped and got a false positive.
> 
> So I think your comment is mostly correct then. But I think it may be better to
> clarify that the reason we need rcu_seq_done_exact() and that ULONG_CMP_LT is
> because we want handle very large wrap around not being stuck in "false
> negative" territory as we would with rcu_seq_done(). But that also means we
> can't break the "snap" usecase to the introduction of ULONG_CMP_LT.

Indeed.

> 
> Unless you beat me to it, I may modify your patch for v6.16 augmented with this
> reasoning ;) (Also since I am also working on adding that forced wrap around
> test to rcutorture).

Please do, I appreciate!

> 
> Also it is still not fully clear to me what the root node has to do with all
> this in your example, because the rcu_seq_done_exact() needs to be what it is
> (that is having that 2 GP adjustment) even if the rnp->gp_seq and
> rcu_state.gp_seq were in sync?

Yes, this is only to explain that the maximum drift between the snap on rsp
and the current state on root rnp can be at most 2 full GP. And that explain the "3"
magic in the function. But if they were in sync it's all fine.

Thanks.


> 
> thanks,
> 
>  - Joel
>