linux-kernel - Re: [PATCH rcu 04/12] rcu: Switch polled grace-period APIs to ->gp_seq

Message-ID: <20220721015643.GA3791281@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 20 Jul 2022 18:56:43 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...com, rostedt@...dmis.org,
        Brian Foster <bfoster@...hat.com>,
        Dave Chinner <david@...morbit.com>,
        Al Viro <viro@...iv.linux.org.uk>, Ian Kent <raven@...maw.net>
Subject: Re: [PATCH rcu 04/12] rcu: Switch polled grace-period APIs to
 ->gp_seq_polled

On Wed, Jul 20, 2022 at 06:04:55PM -0700, Paul E. McKenney wrote:
> On Wed, Jul 20, 2022 at 05:53:38PM -0700, Boqun Feng wrote:
> > Hi Paul,
> > 
> > On Mon, Jun 20, 2022 at 03:51:20PM -0700, Paul E. McKenney wrote:
> > > This commit switches the existing polled grace-period APIs to use a
> > > new ->gp_seq_polled counter in the rcu_state structure.  An additional
> > > ->gp_seq_polled_snap counter in that same structure allows the normal
> > > grace period kthread to interact properly with the !SMP !PREEMPT fastpath
> > > through synchronize_rcu().  The first of the two to note the end of a
> > > given grace period will make knowledge of this transition available to
> > > the polled API.
> > > 
> > > This commit is in preparation for polled expedited grace periods.
> > > 
> > > Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/
> > > Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing
> > > Cc: Brian Foster <bfoster@...hat.com>
> > > Cc: Dave Chinner <david@...morbit.com>
> > > Cc: Al Viro <viro@...iv.linux.org.uk>
> > > Cc: Ian Kent <raven@...maw.net>
> > > Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> > > ---
> > >  kernel/rcu/tree.c | 90 +++++++++++++++++++++++++++++++++++++++++++++--
> > >  kernel/rcu/tree.h |  2 ++
> > >  2 files changed, 89 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 46cfceea87847..637e8f9454573 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1775,6 +1775,78 @@ static void rcu_strict_gp_boundary(void *unused)
> > >  	invoke_rcu_core();
> > >  }
> > >  
> > > +// Has rcu_init() been invoked?  This is used (for example) to determine
> > > +// whether spinlocks may be acquired safely.
> > > +static bool rcu_init_invoked(void)
> > > +{
> > > +	return !!rcu_state.n_online_cpus;
> > > +}
> > > +
> > > +// Make the polled API aware of the beginning of a grace period.
> > > +static void rcu_poll_gp_seq_start(unsigned long *snap)
> > > +{
> > > +	struct rcu_node *rnp = rcu_get_root();
> > > +
> > > +	if (rcu_init_invoked())
> > > +		raw_lockdep_assert_held_rcu_node(rnp);
> > > +
> > > +	// If RCU was idle, note beginning of GP.
> > > +	if (!rcu_seq_state(rcu_state.gp_seq_polled))
> > > +		rcu_seq_start(&rcu_state.gp_seq_polled);
> > > +
> > > +	// Either way, record current state.
> > > +	*snap = rcu_state.gp_seq_polled;
> > > +}
> > > +
> > > +// Make the polled API aware of the end of a grace period.
> > > +static void rcu_poll_gp_seq_end(unsigned long *snap)
> > > +{
> > > +	struct rcu_node *rnp = rcu_get_root();
> > > +
> > > +	if (rcu_init_invoked())
> > > +		raw_lockdep_assert_held_rcu_node(rnp);
> > > +
> > > +	// If the the previously noted GP is still in effect, record the
> > > +	// end of that GP.  Either way, zero counter to avoid counter-wrap
> > > +	// problems.
> > > +	if (*snap && *snap == rcu_state.gp_seq_polled) {
> > > +		rcu_seq_end(&rcu_state.gp_seq_polled);
> > > +		rcu_state.gp_seq_polled_snap = 0;
> > > +	} else {
> > > +		*snap = 0;
> > > +	}
> > > +}
> > > +
> > > +// Make the polled API aware of the beginning of a grace period, but
> > > +// where caller does not hold the root rcu_node structure's lock.
> > > +static void rcu_poll_gp_seq_start_unlocked(unsigned long *snap)
> > > +{
> > > +	struct rcu_node *rnp = rcu_get_root();
> > > +
> > > +	if (rcu_init_invoked()) {
> > > +		lockdep_assert_irqs_enabled();
> > > +		raw_spin_lock_irq_rcu_node(rnp);
> > > +	}
> > > +	rcu_poll_gp_seq_start(snap);
> > > +	if (rcu_init_invoked())
> > > +		raw_spin_unlock_irq_rcu_node(rnp);
> > > +}
> > > +
> > > +// Make the polled API aware of the end of a grace period, but where
> > > +// caller does not hold the root rcu_node structure's lock.
> > > +static void rcu_poll_gp_seq_end_unlocked(unsigned long *snap)
> > > +{
> > > +	struct rcu_node *rnp = rcu_get_root();
> > > +
> > > +	if (rcu_init_invoked()) {
> > > +		lockdep_assert_irqs_enabled();
> > > +		raw_spin_lock_irq_rcu_node(rnp);
> > > +	}
> > > +	rcu_poll_gp_seq_end(snap);
> > > +	if (rcu_init_invoked())
> > > +		raw_spin_unlock_irq_rcu_node(rnp);
> > > +}
> > > +
> > >  /*
> > >   * Initialize a new grace period.  Return false if no grace period required.
> > >   */
> > > @@ -1810,6 +1882,7 @@ static noinline_for_stack bool rcu_gp_init(void)
> > >  	rcu_seq_start(&rcu_state.gp_seq);
> > >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > >  	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> > > +	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> > >  	raw_spin_unlock_irq_rcu_node(rnp);
> > >  
> > >  	/*
> > > @@ -2069,6 +2142,7 @@ static noinline void rcu_gp_cleanup(void)
> > >  	 * safe for us to drop the lock in order to mark the grace
> > >  	 * period as completed in all of the rcu_node structures.
> > >  	 */
> > > +	rcu_poll_gp_seq_end(&rcu_state.gp_seq_polled_snap);
> > >  	raw_spin_unlock_irq_rcu_node(rnp);
> > >  
> > >  	/*
> > > @@ -3837,8 +3911,18 @@ void synchronize_rcu(void)
> > >  			 lock_is_held(&rcu_lock_map) ||
> > >  			 lock_is_held(&rcu_sched_lock_map),
> > >  			 "Illegal synchronize_rcu() in RCU read-side critical section");
> > > -	if (rcu_blocking_is_gp())
> > > +	if (rcu_blocking_is_gp()) {
> > > +		// Note well that this code runs with !PREEMPT && !SMP.
> > > +		// In addition, all code that advances grace periods runs
> > > +		// at process level.  Therefore, this GP overlaps with other
> > > +		// GPs only by being fully nested within them, which allows
> > > +		// reuse of ->gp_seq_polled_snap.
> > > +		rcu_poll_gp_seq_start_unlocked(&rcu_state.gp_seq_polled_snap);
> > > +		rcu_poll_gp_seq_end_unlocked(&rcu_state.gp_seq_polled_snap);
> > > +		if (rcu_init_invoked())
> > > +			cond_resched_tasks_rcu_qs();
> > >  		return;  // Context allows vacuous grace periods.
> > > +	}
> > >  	if (rcu_gp_is_expedited())
> > >  		synchronize_rcu_expedited();
> > >  	else
> > > @@ -3860,7 +3944,7 @@ unsigned long get_state_synchronize_rcu(void)
> > >  	 * before the load from ->gp_seq.
> > >  	 */
> > >  	smp_mb();  /* ^^^ */
> > > -	return rcu_seq_snap(&rcu_state.gp_seq);
> > > +	return rcu_seq_snap(&rcu_state.gp_seq_polled);
> > 
> > I happened to run into this. There is one usage of
> > get_state_synchronize_rcu() in start_poll_synchronize_rcu(), in which
> > the return value of get_state_synchronize_rcu() ("gp_seq") will be used
> > for rcu_start_this_gp(). I don't think this is quite right, because
> > after this change, rcu_state.gp_seq and rcu_state.gp_seq_polled are
> > different values; in fact, ->gp_seq_polled is ahead of ->gp_seq
> > by the number of synchronize_rcu() calls made in early boot.
> > 
> > Am I missing something here?
> 
> It does not appear that you are missing anything, sad to say!
> 
> Does the following make it work better?

Well, rcutorture doesn't like this change much.  ;-)

No surprise, given that it is only the value feeding into
rcu_start_this_gp() that needs to change, not the value returned from
start_poll_synchronize_rcu().

Take 2, still untested.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2122359f0c862..061c1f6737ddc 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3581,7 +3581,7 @@ unsigned long start_poll_synchronize_rcu(void)
 	rdp = this_cpu_ptr(&rcu_data);
 	rnp = rdp->mynode;
 	raw_spin_lock_rcu_node(rnp); // irqs already disabled.
-	needwake = rcu_start_this_gp(rnp, rdp, gp_seq);
+	needwake = rcu_start_this_gp(rnp, rdp, rcu_seq_snap(&rcu_state.gp_seq));
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	if (needwake)
 		rcu_gp_kthread_wake();
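
To illustrate why the value fed to rcu_start_this_gp() must be snapshotted
from ->gp_seq while the cookie returned to the caller stays based on
->gp_seq_polled, here is a rough userspace model of the two counters.
It is only a sketch under stated assumptions: the rcu_seq_*() helpers are
simplified stand-ins for the kernel ones (no memory barriers, no
READ_ONCE()/WRITE_ONCE()), both counters start at zero rather than at the
kernel's initial value, and struct model_state and the model_*() functions
are hypothetical names invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Low two bits hold the grace-period state, upper bits count GPs. */
#define RCU_SEQ_CTR_SHIFT	2
#define RCU_SEQ_STATE_MASK	((1UL << RCU_SEQ_CTR_SHIFT) - 1)

static inline unsigned long rcu_seq_state(unsigned long s)
{
	return s & RCU_SEQ_STATE_MASK;
}

/* Mark the start of a grace period (counter must have been idle). */
static inline void rcu_seq_start(unsigned long *sp)
{
	*sp += 1;
	assert(rcu_seq_state(*sp) == 1);
}

/* Mark the end of a grace period: clear state, bump the count. */
static inline void rcu_seq_end(unsigned long *sp)
{
	assert(rcu_seq_state(*sp));
	*sp = (*sp | RCU_SEQ_STATE_MASK) + 1;
}

/* Counter value at which a full GP starting from now will have ended. */
static inline unsigned long rcu_seq_snap(const unsigned long *sp)
{
	return (*sp + 2 * RCU_SEQ_STATE_MASK + 1) & ~RCU_SEQ_STATE_MASK;
}

/* Has the counter reached snapshot s?  Wrap-safe comparison. */
static inline bool rcu_seq_done(const unsigned long *sp, unsigned long s)
{
	return ULONG_MAX / 2 >= *sp - s;
}

/* Hypothetical container for the two counters under discussion. */
struct model_state {
	unsigned long gp_seq;
	unsigned long gp_seq_polled;
};

/* Model of the !SMP !PREEMPT fastpath: only ->gp_seq_polled advances. */
static inline void model_early_boot_synchronize_rcu(struct model_state *st)
{
	rcu_seq_start(&st->gp_seq_polled);
	rcu_seq_end(&st->gp_seq_polled);
}

/* Model of a normal grace period with the polled counter idle at start:
 * both counters advance by one grace period. */
static inline void model_normal_gp(struct model_state *st)
{
	rcu_seq_start(&st->gp_seq);
	rcu_seq_start(&st->gp_seq_polled);
	rcu_seq_end(&st->gp_seq_polled);
	rcu_seq_end(&st->gp_seq);
}
```

In this model, three early-boot fastpath calls leave ->gp_seq at 0 and
->gp_seq_polled at 12, so a snapshot of ->gp_seq_polled is 16 while a
snapshot of ->gp_seq is 4.  One normal grace period then satisfies the
cookie when checked against ->gp_seq_polled and satisfies the request
when checked against ->gp_seq, but the cookie checked against ->gp_seq
would still be unsatisfied, which is the mismatch Boqun pointed out.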