lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z-CFnKELSqFotela@pavilion.home>
Date: Sun, 23 Mar 2025 23:05:16 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Neeraj Upadhyay <neeraj.upadhyay@....com>,
	Uladzislau Rezki <urezki@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>
Subject: Re: [PATCH 1/2] rcu: Comment on the extraneous delta test on
 rcu_seq_done_exact()

Le Sat, Mar 22, 2025 at 03:40:53PM +0100, Joel Fernandes a écrit :
> 
> 
> On 3/22/2025 3:20 PM, Joel Fernandes wrote:
> > 
> > On 3/22/2025 11:25 AM, Frederic Weisbecker wrote:
> >> Le Sat, Mar 22, 2025 at 03:06:08AM +0100, Joel Fernandes a écrit :
> >>> Insomnia kicked in, so 3 am reply here (Zurich local time) ;-):
> >>>
> >>> On 3/20/2025 3:15 PM, Frederic Weisbecker wrote:
> >>>> Le Wed, Mar 19, 2025 at 03:38:31PM -0400, Joel Fernandes a écrit :
> >>>>> On Tue, Mar 18, 2025 at 11:37:38AM -0700, Paul E. McKenney wrote:
> >>>>>> On Tue, Mar 18, 2025 at 02:56:18PM +0100, Frederic Weisbecker wrote:
> >>>>>>> The numbers used in rcu_seq_done_exact() lack some explanation behind
> >>>>>>> their magic. Especially after the commit:
> >>>>>>>
> >>>>>>>     85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")
> >>>>>>>
> >>>>>>> which reported a subtle issue where a new GP sequence snapshot was taken
> >>>>>>> on the root node state while a grace period had already been started and
> >>>>>>> reflected on the global state sequence but not yet on the root node
> >>>>>>> sequence, making a polling user waiting on a wrong already started grace
> >>>>>>> period that would ignore freshly online CPUs.
> >>>>>>>
> >>>>>>> The fix involved taking the snaphot on the global state sequence and
> >>>>>>> waiting on the root node sequence. And since a grace period is first
> >>>>>>> started on the global state and only afterward reflected on the root
> >>>>>>> node, a snapshot taken on the global state sequence might be two full
> >>>>>>> grace periods ahead of the root node as in the following example:
> >>>>>>>
> >>>>>>> rnp->gp_seq = rcu_state.gp_seq = 0
> >>>>>>>
> >>>>>>>     CPU 0                                           CPU 1
> >>>>>>>     -----                                           -----
> >>>>>>>     // rcu_state.gp_seq = 1
> >>>>>>>     rcu_seq_start(&rcu_state.gp_seq)
> >>>>>>>                                                     // snap = 8
> >>>>>>>                                                     snap = rcu_seq_snap(&rcu_state.gp_seq)
> >>>>>>>                                                     // Two full GP differences
> >>>>>>>                                                     rcu_seq_done_exact(&rnp->gp_seq, snap)
> >>>>>>>     // rnp->gp_seq = 1
> >>>>>>>     WRITE_ONCE(rnp->gp_seq, rcu_state.gp_seq);
> >>>>>>>
> >>>>>>> Add a comment about those expectations and to clarify the magic within
> >>>>>>> the relevant function.
> >>>>>>>
> >>>>>>> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> >>>>>> Reviewed-by: Paul E. McKenney <paulmck@...nel.org>
> >>>>>>
> >>>>>> But it would of course be good to get reviews from the others.
> >>>>> I actually don't agree that the magic in the rcu_seq_done_exact() function about the
> >>>>> ~2 GPs is related to the lag between rcu_state.gp_seq and root rnp->gp_seq,
> >>>>> because the small lag can just as well survive with the rcu_seq_done()
> >>>>> function in the above sequence right?
> >>>>>
> >>>>> The rcu_seq_done_exact() function on the other hand is more about not being
> >>>>> stuck in the ULONG_MAX/2 guard band, but to actually get to that, you need a
> >>>>> wrap around to happen and the delta between "rnp->gp_seq" and "snap" to be at
> >>>>> least ULONG_MAX/2 AFAIU.
> >>>>>
> >>>>> So the only time this magic will matter is if you have a huge delta between
> >>>>> what is being compared, not just 2 GPs.
> >>>> You're right, and perhaps I should have made it more specific that my comment
> >>>> only explains the magic "3" number here, in that if it were "2" instead, there
> >>>> could be accidents with 2 full GPs difference (which is possible) spuriously
> >>>> accounted as a wrap around.
> >>> Ahh, so I guess I get it now and we are both right. The complete picture is - We
> >>> are trying to handle the case of "very large wrap" around but as a part of that,
> >>> we don't want to create false-positives for this "snap" case.
> >>>
> >>> A "snap" can be atmost  (2 * RCU_SEQ_STATE_MASK + 1) away from a gp_seq.
> >>>
> >>> That's within "2 GPs" worth of counts (about 8 counts)
> >>>
> >>> Taking some numbers:
> >>>
> >>> cur_s	s	delta (s - cur_s)
> >>> 0	4	4
> >>> 1	8	7
> >>> 2	8	6
> >>> 3	8	5
> >>> 4	8	4
> >>> 5	12	7
> >>>
> >>> The maximum delta of a snap from actual gp_seq can be (2 * RCU_SEQ_STATE_MASK +
> >>> 1) which in this case is 7.
> >>>
> >>> So we adjust the comparison by adding the  ULONG_CMP_LT(cur_s, s - (2 *
> >>> RCU_SEQ_STATE_MASK + 1)). i.e.
> >> 3, right?
> > Just to be absolutely sure, are you talking about the value of RCU_SEQ_STATE_MASK ?
> > 
> > That is indeed 3 (RCU_SEQ_STATE_MASK).
> > 
> > But if we're talking about number of GPs, my understanding is a count of 4 is
> > one GP worth. Per the above table, the delta between gp_seq and is snap is
> > always a count of 7 (hence less than 2 GPs).
> > 
> > Agreed?
> > 
> > If RCU_SEQ_STATE_MASK was 0x1 instead of 0x11, that is a single bit (or a count
> > of 2 instead of 4, for a GP), then the table would be:
> > 
> >  cur_s	s (snap)	delta (s - cur_s)
> >  0	2		2
> >  1	4		3
> >  2	4		2
> >  3	6		3
> >  4	6		2
> >  5	8		3
> > 
> > So delta is always <= 3,  Or more generally: <= (RCU_SEQ_STATE_MASK * 2) + 1
> 
> Oh man, I am wondering if we are on to a bug here:
> 
> From your example:
> 
>     CPU 0                                           CPU 1
>     -----                                           -----
>     // rcu_state.gp_seq = 1
>     rcu_seq_start(&rcu_state.gp_seq)
>                                       // snap = 8
>                                       snap = rcu_seq_snap(&rcu_state.gp_seq)
>                                       // Two full GP
>                                       rcu_seq_done_exact(&rnp->gp_seq, snap)
> 
> 
> Here, the
> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 1));
> 
> Will be
> ULONG_CMP_LT(0, 8 - (2 * RCU_SEQ_STATE_MASK + 1));
> 
>  = ULONG_CMP_LT(0, 8 - 7)
> 
>  = TRUE.
> 
> Which means rcu_seq_done_exact() will return a false positive saying the GP has
> completed even though it has not.
> 
> I think rcu_seq_done_exact() is off by one and should be doing:
> 
> ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_STATE_MASK + 2));
> 
> ?

But it's ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1) now since:

    85aad7cc4178 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")

That's 10 so we are good.

However that magic value is arbitrary and doesn't mean much. It should be
like you said. Or rather for clarity:

diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 7acf1f36dd6c..e53f0b687a83 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -57,6 +57,10 @@
 /* Low-order bit definition for polled grace-period APIs. */
 #define RCU_GET_STATE_COMPLETED	0x1
 
+/* A complete grace period count */
+#define RCU_SEQ_GP (RCU_SEQ_STATE_MASK + 1)
+
+
 extern int sysctl_sched_rt_runtime;
 
 /*
@@ -169,7 +173,7 @@ static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
 {
 	unsigned long cur_s = READ_ONCE(*sp);
 
-	return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
+	return ULONG_CMP_GE(cur_s, s) || ULONG_CMP_LT(cur_s, s - (2 * RCU_SEQ_GP));
 }
 
 /*


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ