Message-ID: <Z96P_K0kt-FumSjz@pavilion.home>
Date: Sat, 22 Mar 2025 11:25:00 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Boqun Feng <boqun.feng@...il.com>,
Neeraj Upadhyay <neeraj.upadhyay@....com>,
Uladzislau Rezki <urezki@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on
rcu_seq_done_exact()
On Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes wrote:
> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
>
> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> > On Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes wrote:
> >> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> >>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> >>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
> >>>> their magic. Especially after the commit:
> >>>>
> >>>> 85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> >>>>
> >>>> which reported a subtle issue where a new GP sequence snapshot was taken
> >>>> on the root node state while a grace period had already been started and
> >>>> reflected on the global state sequence but not yet on the root node
> >>>> sequence, causing a polling user to wait on an already-started grace
> >>>> period that would ignore freshly onlined CPUs.
> >>>>
> >>>> The fix involved taking the snapshot on the global state sequence and
> >>>> waiting on the root node sequence. And since a grace period is first
> >>>> started on the global state and only afterward reflected on the root
> >>>> node, a snapshot taken on the global state sequence might be two full
> >>>> grace periods ahead of the root node as in the following example:
> >>>>
> >>>> rnp->gp_seq = rcu_state.gp_seq = 0
> >>>>
> >>>> CPU 0                                    CPU 1
> >>>> -----                                    -----
> >>>> // rcu_state.gp_seq = 1
> >>>> rcu_seq_start(&rcu_state.gp_seq)
> >>>>                                          // snap = 8
> >>>>                                          snap = rcu_seq_snap(&rcu_state.gp_seq)
> >>>>                                          // Two full GP differences
> >>>>                                          rcu_seq_done_exact(&rnp->gp_seq, snap)
> >>>> // rnp->gp_seq = 1
> >>>> WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> >>>>
> >>>> Add a comment about those expectations to clarify the magic within
> >>>> the relevant function.
> >>>>
> >>>> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> >>> Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
> >>>
> >>> But it would of course be good to get reviews from the others.
> >> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
> >> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
> >> because rcu_seq_done() would handle that small lag just as well in the
> >> above sequence, right?
> >>
> >> The rcu_seq_done_exact() function, on the other hand, is more about not getting
> >> stuck in the ULONG_MAX/2 guard band, but to actually get there, you need a
> >> wraparound to happen and the delta between "rnp->gp_seq" and "snap" to be at
> >> least ULONG_MAX/2 AFAIU.
> >>
> >> So the only time this magic will matter is if you have a huge delta between
> >> what is being compared, not just 2 GPs.
> > You're right, and perhaps I should have made it clearer that my comment
> > only explains the magic "3" number here: if it were "2" instead, a
> > difference of 2 full GPs (which is possible) could be spuriously
> > accounted as a wraparound.
>
> Ahh, so I guess I get it now and we are both right. The complete picture is: we
> are trying to handle the "very large wraparound" case, but as part of that,
> we don't want to create false positives for this "snap" case.
>
> A "snap" can be at most (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
>
> That's within "2 GPs" worth of counts (2 * 4 = 8 counts, since each GP
> advances gp_seq by 4).
>
> Taking some numbers:
>
>   cur_s    s    delta (s - cur_s)
>     0      4      4
>     1      8      7
>     2      8      6
>     3      8      5
>     4      8      4
>     5     12      7
>
> The maximum delta of a snap from the actual gp_seq is
> (2 * RCU_SEQ_STATE_MASK + 1), which in this case is 7.
>
> So we adjust the comparison by adding ULONG_CMP_LT(cur_s, s - (2 *
> RCU_SEQ_STATE_MASK + 1)), i.e.
3 * RCU_SEQ_STATE_MASK + 1, right?
>
> after a snap, if we blindly did ULONG_CMP_LT without the adjustment, we would
> falsely conclude that the GP has completed, attributing it to a wraparound,
> whereas we may just have snapped and got a false positive.
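To double-check the table above, here is a small user-space sketch of the snap arithmetic. It is only a model under stated assumptions: the constants mirror kernel/rcu/rcu.h, READ_ONCE() and smp_mb() are dropped, and the helper names seq_snap() and max_snap_delta() are hypothetical.

```c
/* Constants mirroring kernel/rcu/rcu.h: the low two bits hold the GP phase. */
#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Hypothetical model of rcu_seq_snap()'s arithmetic (no READ_ONCE(), no smp_mb()). */
static inline unsigned long seq_snap(unsigned long cur_s)
{
	/* Skip the GP in progress (if any), then require one more full GP. */
	return (cur_s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Hypothetical helper: largest s - cur_s over count values starting at start. */
static inline unsigned long max_snap_delta(unsigned long start,
					   unsigned long count)
{
	unsigned long max = 0;

	for (unsigned long i = 0; i < count; i++) {
		unsigned long cur_s = start + i;	/* may wrap, on purpose */
		unsigned long delta = seq_snap(cur_s) - cur_s;

		if (delta > max)
			max = delta;
	}
	return max;
}
```

With these helpers, seq_snap(1) through seq_snap(4) all give 8 and seq_snap(5) gives 12, as in the table, and max_snap_delta() stays at 7 (2 * RCU_SEQ_STATE_MASK + 1) even over a range that wraps past ULONG_MAX.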
>
> So I think your comment is mostly correct then. But it may be better to
> clarify that the reason we need rcu_seq_done_exact() and its ULONG_CMP_LT
> is that we want to handle very large wraparounds without getting stuck in
> "false negative" territory, as we would with rcu_seq_done(). But that also
> means the introduction of ULONG_CMP_LT must not break the "snap" use case.
Indeed.
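A minimal user-space sketch of the wraparound case, assuming a snapshot that has fallen more than ULONG_MAX/2 behind the current sequence. seq_done() and seq_done_exact() below are hypothetical models of rcu_seq_done() and rcu_seq_done_exact() that take the current value instead of a pointer; the ULONG_CMP_*() macros follow include/linux/rcupdate.h.

```c
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe comparisons, following include/linux/rcupdate.h. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Hypothetical model of rcu_seq_done(): "done" iff cur_s is at or after s. */
static inline bool seq_done(unsigned long cur_s, unsigned long s)
{
	return ULONG_CMP_GE(cur_s, s);
}

/*
 * Hypothetical model of rcu_seq_done_exact(): also "done" when cur_s is
 * further behind s than the guard band, which can only happen after a wrap.
 */
static inline bool seq_done_exact(unsigned long cur_s, unsigned long s)
{
	return ULONG_CMP_GE(cur_s, s) ||
	       ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
}
```

For s = 8 and cur_s = 8 + ULONG_MAX / 2 + 4, seq_done() reports "not done" (the false negative) while seq_done_exact() reports "done". With the factor of 3, seq_done_exact() only reports "not done" when cur_s lies in the 10 values just below s.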
>
> Unless you beat me to it, I may rework your patch for v6.16, augmented with this
> reasoning ;) (especially since I am also working on adding that forced
> wraparound test to rcutorture).
Please do, I'd appreciate it!
>
> Also, it is still not fully clear to me what the root node has to do with all
> this in your example, because rcu_seq_done_exact() would need to be what it is
> (that is, have that 2-GP adjustment) even if rnp->gp_seq and
> rcu_state.gp_seq were in sync?
Yes, this is only to explain that the drift between the snapshot taken on rsp
and the current state of the root rnp can be at most 2 full GPs, and that
explains the magic "3" in the function. But if they were in sync, it would all be fine.
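The race from the patch description can be replayed with a tunable guard factor to see why "2" is not enough. This is a sketch with illustrative names (seq_done_exact_factor() and lagging_root_poll() are hypothetical), not kernel code: the global and root sequences are plain variables and only the arithmetic of rcu_seq_start(), rcu_seq_snap() and rcu_seq_done_exact() is modeled.

```c
#include <limits.h>
#include <stdbool.h>

#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

#define RCU_SEQ_CTR_SHIFT  2
#define RCU_SEQ_STATE_MASK ((1UL << RCU_SEQ_CTR_SHIFT) - 1)

/* Model of rcu_seq_snap()'s arithmetic. */
static inline unsigned long seq_snap(unsigned long cur_s)
{
	return (cur_s + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Hypothetical variant of rcu_seq_done_exact() with a tunable guard factor. */
static inline bool seq_done_exact_factor(unsigned long cur_s, unsigned long s,
					 unsigned long factor)
{
	return ULONG_CMP_GE(cur_s, s) ||
	       ULONG_CMP_LT(cur_s, s - (factor * RCU_SEQ_STATE_MASK + 1));
}

/*
 * Replay: the GP has been started on the global state (0 -> 1) but the
 * root node still reads 0 when the snapshot is polled against it.
 */
static inline bool lagging_root_poll(unsigned long factor)
{
	unsigned long state_gp_seq = 0, root_gp_seq = 0;

	state_gp_seq++;					/* rcu_seq_start() on rcu_state */
	unsigned long snap = seq_snap(state_gp_seq);	/* snap = 8 */

	/* Polled against the root node, which is still at 0. */
	return seq_done_exact_factor(root_gp_seq, snap, factor);
}
```

With a factor of 2, root gp_seq = 0 against snap = 8 falls outside the 7-value window and is spuriously reported as done through ULONG_CMP_LT; with a factor of 3 the window is 10 values wide and the poll correctly reports not done. If the root node were in sync (cur_s = 1), even a factor of 2 would report not done.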
Thanks.
>
> thanks,
>
> - Joel
>