Date:   Wed, 21 Dec 2016 08:48:45 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     Colin Ian King <colin.king@...onical.com>,
        Mark Rutland <mark.rutland@....com>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()

On Wed, Dec 21, 2016 at 12:18:08PM +0800, Boqun Feng wrote:
> On Tue, Dec 20, 2016 at 07:40:24PM -0800, Paul E. McKenney wrote:
> [...]
> > > 
> > > Agreed, my intent is to keep this overly careful check for a couple of
> > > releases, and if no one shoots themselves in the foot, we can remove it.
> > > If someone does, it definitely means this part is subtle, and we need to
> > > pay more attention to it, maybe write some regression tests for this
> > > particular problem to help developers avoid it.
> > > 
> > > This check is meant to be removed eventually, so I'm not attached to
> > > keeping it.
> > 
> > I suggest keeping it through validation.  If it triggers during that time,
> > consider keeping it longer.  If it does not trigger, remove it before
> > it goes upstream.
> 
> Good point ;-)
> 
> [...]
> > > > > 
> > > > > But this brings up a side question: is the callsite of
> > > > > rcu_cpu_starting() correct, given that rcu_cpu_starting() ignores the
> > > > > @cpu parameter and only sets _this_ CPU's bit in a leaf node?
> > > > 
> > > > The calls from notify_cpu_starting() happen in the various
> > > > start_kernel_secondary(), secondary_start_kernel(), and similarly
> > > > named functions.  These are called on the incoming CPU early in that
> > > > CPU's execution.  The call from rcu_init() is correct until such time
> > > > as more than one CPU can be running at rcu_init() time.  And that
> > > > day might be coming, so please see the untested patch below.
> > > 
> > > Looks better than mine ;-)
> > > 
> > > But do we need to worry that we start RCU on each CPU twice, which might
> > > slow down the boot?
> > 
> > We only start a given CPU once.  The boot CPU at rcu_init() time, and
> > the rest at CPU-hotplug time.  Unless of course a CPU is later taken
> 
> Confused... we call rcu_cpu_starting() in a for_each_online_cpu() loop
> in rcu_init(), so after applying your patch we basically start all
> online CPUs there.  And the rest of the CPUs will start themselves
> again at CPU-hotplug time, right?

At rcu_init() time, there is only one online CPU, namely the boot CPU.

Or perhaps your point is that if CPUs come online before rcu_init(), they
might do so via the normal online mechanism.  I don't believe that this
is likely, because the normal online mechanism requires that the scheduler
be running.  But either way, my hope would be that whoever fires up CPUs
before rcu_init() asks a few questions when they run into bugs.  ;-)
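
To make this concrete, here is a minimal illustrative sketch (not the
actual rcu_init() code; the helper name and printout are invented) of why
a for_each_online_cpu() loop run that early visits only the boot CPU:

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/printk.h>

/*
 * Illustrative only: this early in boot, cpu_online_mask contains just
 * the boot CPU, so the loop below executes exactly once.  Secondary
 * CPUs are added to the mask later, from the CPU-hotplug bring-up path,
 * which is where rcu_cpu_starting() normally runs for them.
 */
static void __init sketch_early_online_walk(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		pr_info("early boot: CPU %d is online\n", cpu); /* prints once */
}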

> Besides, without your patch, we started the boot CPU many times in the
> for_each_online_cpu() loop.

That is true.  It is harmless because it just does a group of assignments
repeatedly, and because there is only one CPU and interrupts are
disabled, the repetition cannot have any effect.  And my fix inadvertently
fixed this issue, didn't it?

So I do need to update the commit log accordingly.  Done!
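
For reference, a minimal sketch (the per-CPU variable and helper here are
invented, not from tree.c) of the accessor distinction behind the patch
below: this_cpu_ptr() always resolves to the instance belonging to the CPU
executing the code, whereas per_cpu_ptr() takes an explicit CPU and is
correct no matter which CPU the caller happens to be running on:

#include <linux/percpu.h>

/* Illustrative per-CPU variable, not part of RCU. */
static DEFINE_PER_CPU(int, sketch_started);

static void sketch_mark_started(unsigned int cpu)
{
	/*
	 * this_cpu_ptr(&sketch_started) would only be correct if the
	 * caller were already running on @cpu.  per_cpu_ptr() works
	 * from any CPU, which is what the patch switches to.
	 */
	int *started = per_cpu_ptr(&sketch_started, cpu);

	*started = 1;
}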

> Am I missing something subtle?

Given the nature of RCU, the only possible answer I can give to that
question is "probably".  (Hey, you asked!!!)

							Thanx, Paul

> Regards,
> Boqun
> 
> > offline, in which case we start it again when it comes back online.
> > 
> > 							Thanx, Paul
> > 
> > > Regards,
> > > Boqun
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit 1e84402587173d6d4da8645689f0e24c877b3269
> > > > Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > > > Date:   Tue Dec 20 07:17:58 2016 -0800
> > > > 
> > > >     rcu: Make rcu_cpu_starting() use its "cpu" argument
> > > >     
> > > >     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
> > > >     incoming CPU's rcu_data structure.  This works for the boot CPU and for
> > > >     all CPUs onlined after rcu_init() executes (during very early boot).
> > > >     Currently, this is the full set of CPUs, so all is well.  But if
> > > >     anyone ever parallelizes boot before rcu_init() time, it will fail.
> > > >     This commit therefore substitutes per_cpu_ptr() for the
> > > >     rcu_cpu_starting() function's this_cpu_ptr(), future-proofing the
> > > >     code and (arguably) improving readability.
> > > >     
> > > >     Reported-by: Boqun Feng <boqun.feng@...il.com>
> > > >     Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index b9d3c0e30935..083cb8a6299c 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > > >  	struct rcu_state *rsp;
> > > >  
> > > >  	for_each_rcu_flavor(rsp) {
> > > > -		rdp = this_cpu_ptr(rsp->rda);
> > > > +		rdp = per_cpu_ptr(rsp->rda, cpu);
> > > >  		rnp = rdp->mynode;
> > > >  		mask = rdp->grpmask;
> > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > > 
> > 
> > 

