linux-kernel - Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 21 Dec 2016 10:34:56 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     Colin Ian King <colin.king@...onical.com>,
        Mark Rutland <mark.rutland@....com>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [RFC v2 4/5] rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()

On Tue, Dec 20, 2016 at 07:23:52AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 20, 2016 at 01:59:14PM +0800, Boqun Feng wrote:
> > On Mon, Dec 19, 2016 at 09:09:13PM -0800, Paul E. McKenney wrote:
> > > On Mon, Dec 19, 2016 at 11:15:15PM +0800, Boqun Feng wrote:
> > > > On Thu, Dec 15, 2016 at 02:51:36PM +0000, Colin Ian King wrote:
> > > > > On 15/12/16 14:42, Boqun Feng wrote:
> > > > > > On Thu, Dec 15, 2016 at 12:04:59PM +0000, Mark Rutland wrote:
> > > > > >> On Thu, Dec 15, 2016 at 10:42:03AM +0800, Boqun Feng wrote:
> > > > > >>> ->qsmask of an RCU leaf node is usually more sparse than the
> > > > > >>> corresponding cpu_possible_mask. So replace the
> > > > > >>> for_each_leaf_node_possible_cpu() in force_qs_rnp() with
> > > > > >>> for_each_leaf_node_cpu() to save several checks.
> > > > > >>>
> > > > > >>> [Note we need to use "1UL << bit" instead of "1 << bit" to generate the
> > > > > >>> corresponding mask for a bit because @mask is unsigned long, this was
> > > > > >>> spotted by Colin Ian King <colin.king@...onical.com> and CoverityScan in
> > > > > >>> a previous version of this patch.]
> > > > > >>
> > > > > >> Nit: This note can go now that we use leaf_node_cpu_bit(). ;)
> > > > > >>
> > > > > > 
> > > > > > I kinda keep this here for honoring the effort of finding out this bug
> > > > > > from Colin, but yes, it's no longer needed here for the current code.
> > > > > 
> > > > > Yep, remove it.
> > > > > 
> > > > 
> > > > Paul, here is a modified version of this patch, what I only did is
> > > > removing this note.
> > > > 
> > > > Besides I rebased the whole series on the current rcu/dev branch of -rcu
> > > > tree, on this very commit:
> > > > 
> > > > 	8e9b2521b18a ("doc: Quick-Quiz answers are now inline")
> > > > 
> > > > And I put the latest version at
> > > > 
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux.git leaf-node
> > > > 
> > > > If you thought it's better, I could send a v3 ;-)
> > > 
> > > I would feel better about this patchset if it reduced the number of lines
> > > of code rather than increasing them.  That said, part of the increase
> > > is a commment.  Still, I am not convinced that the extra level of macro
> > > is carrying its weight.
> > > 
> > > dbf18a2422e2 ("rcu: Introduce for_each_leaf_node_cpu()")
> > > 
> > > 	The commit log needs a bit of wordsmithing.
> > > 
> > > 	The added WARN_ON_ONCE(!cpu_possible(cpu)) still seems strange.
> > > 	What is its purpose, really?  What does its triggering tell you?
> > > 	What other checks did you consider as an alternative?
> > 
> > The check is an over-case one, it's introduced because I'm worried about
> > some code outside the RCU code mis-sets the ->qsmask* or ->expmask* on
> > an "impossible" CPU. I will explanation later in more details.
> 
> Over-case check?
> 

Oops, sorry for the typo, should be "over-care check".

> > > 	And if you are going to add checks of this type, should you
> > > 	also check for this being a leaf rcu_node structure?
> > 
> > I don't think I want to check that, and I don't think check
> > cpu_possible(cpu) in the macro is similar to that.
> 
> If we are adding checks, they should be catching bugs.  This is of
> course a trade-off -- too many checks makes the code less readable
> and makes it more difficult to change the code.  Too few checks
> makes bugs harder to pin down.
> 
> At this point, I don't really see the need for either check.  ;-)
> 

Agreed, my intent is to keep this overcare check for couples of releases
and if no one shoots his/her foot, we can remove it, if not, it
definitely means this part is subtle, and we need to pay more attention
to it, maybe write some regression tests for this particular problem to
help developers avoid it.

This check is supposed to be removed, so I'm not stick to keeping it.

> > > 3f0b4ba1fe94 ("rcu: Use for_each_leaf_node_cpu() in RCU stall checking")
> > > 
> > > 	This does look a bit nicer, but why the added blank lines?
> > > 	Are they really helping?
> > > 
> > > 	The commit log seems a bit misplaced.  This code is almost never
> > > 	executed (once per 21 seconds at the most), so performance really
> > > 	isn't a consideration.	The simpler-looking code might be.
> > > 
> > > fd799f1ac7b7 ("rcu: Use for_each_leaf_node_cpu() in ->expmask iteration")
> > > 
> > > 	Ditto on blank lines.
> > > 
> > > 	Again, this code is executed per expedited grace period, so
> > > 	performance really isn't a big deal.  More of a big deal than
> > > 	the stall-warning code, but we still are way off of any fastpath.
> > > 
> > > 69a1baedbf42 ("rcu: Use for_each_leaf_node_cpu() in force_qs_rnp()")
> > > 
> > > 	Ditto again on blank lines.
> > > 
> > > 	And on the commit log.  This code is executed about once
> > > 	per several jiffies, and on larger machines, per 20 jiffies
> > > 	or so.  Performance really isn't a consideration.
> > > 
> > > 7b00e50e3efb ("rcu: Use for_each_leaf_node_cpu() in online CPU iteration")
> > > 
> > > 	And another ditto on blank lines.
> > > 
> > > 	This code executes once per CPU-hotplug operation, so again isn't
> > > 	at all performance critical.
> > > 
> > > In short, if you are trying to sell this to me as a significant performance
> > > boost, I am not buying.  The added WARN_ON_ONCE() looks quite dubious,
> > 
> > Yep, it won't help the performance a lot, but it 
> > 
> > 1)	helps the performance in theory, because it iterates less CPUs
> > 
> > 2)	makes code cleaner. By "cleaner", I mean we can a) affort more
> > 	blank lines to make loops separated from other code and b)
> > 	descrease the indent levels for those loops. But, yes I should
> > 	add those points in the commit log, because those are more
> > 	visible effects.
> 
> #2 is the more important of the two, though you still have not
> convinced me that those particular blank lines are helping.  Making

TBH, blank lines really help me ;-) So my habit is more like adding
blank lines between logical parts like if-else, for, while as more as
possible.

> these functions longer isn't necessarily a good thing.
> 

But you are right, I probably shouldn't introduce more blank lines in
this kind of patchset, after all, this patchset did want to clean the
code a little bit, but adding more blank lines seems like we are making
things more complicated so we need those blank lines to separate logical
parts. So we don't want those blank lines.

> > > though perhaps I am misunderstanding its purpose.  My assumption is
> > > that you want to detect missing UL suffixes on bitmask constants, in
> > > which case I bet there is a better way.
> > 
> > The WARN_ON_ONCE() is not for detecting missing UL suffixes on bitmask
> > constatns, and we don't need to check that, because we use
> > leaf_node_cpu_id() now. As I said, this is an over-case check, and we
> > can drop if we guarante that CPUs masked in ->qsmask* and ->expmask*
> > must be a "possible" CPU, IOW, ->qsmask* and ->expmask* are the subsets
> > (with offset fixed by ->grplo) of cpu_possible_mask.
> 
> In which case it is not needed.
> 
> > Hmm.. and I just check the code, the initial values of ->qsmask* and
> > ->expmask* are from ->qsmaskinitnext and ->expmaskinitnext, and the
> > latter two are set in rcu_cpu_starting() since commit
> > 
> > 	7ec99de36f40 ("rcu: Provide exact CPU-online tracking for RCU")
> > 
> > , and rcu_cpu_starting() only set the corresponding bit of _this_ cpu in
> > a leaf node's ->qsmaskinitnext and ->expmaskinitnext. So looks like we
> > are safe to remove the WARN_ON_ONCE() check, because a ever-running CPU
> > must be a possible CPU, IIRC.
> > 
> > But this brings a side question, is the callsite of rcu_cpu_starting()
> > is correct? Given rcu_cpu_starting() ignores the @cpu parameter and only
> > set _this_ cpu's bit in a leaf node?
> 
> The calls from notify_cpu_starting() are called from the various
> start_kernel_secondary(), secondary_start_kernel(), and similarly
> named functions.  These are called on the incoming CPU early in that
> CPU's execution.  The call from rcu_init() is correct until such time
> as more than one CPU can be running at rcu_init() time.  And that
> day might be coming, so please see the untested patch below.
> 

Looks better than mine ;-)

But do we need to worry that we start rcu on each CPU twice, which may
slow down the boot?

Regards,
Boqun

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit 1e84402587173d6d4da8645689f0e24c877b3269
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date:   Tue Dec 20 07:17:58 2016 -0800
> 
>     rcu: Make rcu_cpu_starting() use its "cpu" argument
>     
>     The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
>     incoming CPU's rcu_data structure.  This works for the boot CPU and for
>     all CPUs onlined after rcu_init() executes (during very early boot).
>     Currently, this is the full set of CPUs, so all is well.  But if
>     anyone ever parallelizes boot before rcu_init() time, it will fail.
>     This commit therefore substitutes the rcu_cpu_starting() function's
>     this_cpu_pointer() for per_cpu_ptr(), future-proofing the code and
>     (arguably) improving readability.
>     
>     Reported-by: Boqun Feng <boqun.feng@...il.com>
>     Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> 
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index b9d3c0e30935..083cb8a6299c 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -4017,7 +4017,7 @@ void rcu_cpu_starting(unsigned int cpu)
>  	struct rcu_state *rsp;
>  
>  	for_each_rcu_flavor(rsp) {
> -		rdp = this_cpu_ptr(rsp->rda);
> +		rdp = per_cpu_ptr(rsp->rda, cpu);
>  		rnp = rdp->mynode;
>  		mask = rdp->grpmask;
>  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> 

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)