[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1540307622.11839.53.camel@amazon.de>
Date:   Tue, 23 Oct 2018 15:13:43 +0000
From:   "Raslan, KarimAllah" <karahmed@...zon.de>
To:     "paulmck@...ux.ibm.com" <paulmck@...ux.ibm.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "josh@...htriplett.org" <josh@...htriplett.org>,
        "jiangshanlai@...il.com" <jiangshanlai@...il.com>
Subject: Re: [PATCH] rcu: Benefit from expedited grace period in __wait_rcu_gp
On Fri, 2018-10-19 at 13:21 -0700, Paul E. McKenney wrote:
> On Fri, Oct 19, 2018 at 07:45:51PM +0000, Raslan, KarimAllah wrote:
> > 
> > On Fri, 2018-10-19 at 05:31 -0700, Paul E. McKenney wrote:
> > > 
> > > On Fri, Oct 19, 2018 at 02:49:05AM +0200, KarimAllah Ahmed wrote:
> > > > 
> > > > 
> > > > When expedited grace-period is set, both synchronize_sched
> > > > synchronize_rcu_bh can be optimized to have a significantly lower latency.
> > > > 
> > > > Improve wait_rcu_gp handling to also account for expedited grace-period.
> > > > The downside is that wait_rcu_gp will not wait anymore for all RCU variants
> > > > concurrently when an expedited grace-period is set, however, given the
> > > > improved latency it does not really matter.
> > > > 
> > > > Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > > > Cc: Josh Triplett <josh@...htriplett.org>
> > > > Cc: Steven Rostedt <rostedt@...dmis.org>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> > > > Cc: Lai Jiangshan <jiangshanlai@...il.com>
> > > > Cc: linux-kernel@...r.kernel.org
> > > > Signed-off-by: KarimAllah Ahmed <karahmed@...zon.de>
> > > 
> > > Cute!
> > > 
> > > Unfortunately, there are a few problems with this patch:
> > > 
> > > 1.	I will be eliminating synchronize_rcu_mult() due to the fact that
> > > 	the upcoming RCU flavor consolidation eliminates its sole caller.
> > > 	See 5fc9d4e000b1 ("rcu: Eliminate synchronize_rcu_mult()")
> > > 	in my -rcu tree.  This would of course also eliminate the effects
> > > 	of this patch.
> > 
> > Your patch covers our use-case already, but I still think that the semantics 
> > for wait_rcu_gp is not clear to me.
> > 
> > The problem for us was that sched_cpu_deactivate would call
> > synchronize_rcu_mult which does not check for "expedited" at all. So even
> > though we are already using rcu_expedited sysctl variable, synchronize_rcu_mult 
> > was just ignoring it.
> > 
> > That being said, I indeed overlooked rcu_normal and that it takes precedence 
> > over expedited and I did not notice at all the deadlock you mentioned below!
> > 
> > That can however be easily fixed by also checking for !rcu_gp_is_normal.
> 
> ???
> 
> The aforementioned 5fc9d4e000b1 commit replaces the synchronize_rcu_mult()
> with synchronize_rcu(), which really would be subject to the sysfs
> variables.  Of course, this is not yet in mainline, so it perhaps cannot
> solve your immediate problem, which probably involve older kernels in
> any case.  More on this below...
> 
> > 
> > > 
> > > 2.	The real-time guys' users are not going to be at all happy
> > > 	with the IPIs resulting from the _expedited() API members.
> > > 	Yes, they can boot with rcupdate.rcu_normal=1, but they don't
> > > 	always need that big a hammer, and use of this kernel parameter
> > > 	can slow down boot, hibernation, suspend, network configuration,
> > > 	and much else besides.	We therefore don't want them to have to
> > > 	use rcupdate.rcu_normal=1 unless absolutely necessary.
> > 
> > I might be missing something here. Why would they need to "explicitly" use 
> > rcu_normal? If rcu_expedited is set, would not the expected behavior is to call 
> > into the expedited version?
> > 
> > My patch should only activate *expedited* if and only if it is set.
> 
> You are right, I was confused.  However...
> 
> > 
> > I think I might be misunderstanding the expected behavior 
> > from synchronize_rcu_mult. My understanding is that something like:
> > 
> > synchronize_rcu_mult(call_rcu_sched) and synchronize_rcu() should have an 
> > identical behavior, right?
> 
> You would clearly prefer that it did, and the commit log does seem to
> read that way, but synchronize_rcu_mult() is going away anyway, so there
> isn't a whole lot of point in arguing about what it should have done.
> And the eventual implementation (with 5fc9d4e000b1 or its successor)
> will act as you want.
> 
> > 
> > At least in this commit:
> > 
> > commit d7d34d5e46140 ("sched: Rely on synchronize_rcu_mult() de-duplication")
> > 
> > .. the change clearly gives the impression that they can be used 
> > interchangeably. The problem is that this is not true when you look at the 
> > implementation. One of them (i.e. synchronize_rcu) will respect the
> > expedite_rcu flag set by sysfs while the other (i.e. synchronize_rcu_mult) 
> > simply ignores it.
> > 
> > So my patch is about making sure that both of the variants actually respect 
> > it.
> 
> I am guessing that you need to make an older version of the kernel
> expedite the CPU-hotplug grace periods.  I am also guessing that you can
> carry patches to your kernels.  If so, I suggest the following simpler
> change to sched_cpu_deactivate() in kernel/sched/core.c:
> 
> 	if (rcu_gp_is_expedited()) {
> 		synchronize_sched_expedited();
> 		if (IS_ENABLED(CONFIG_PREEMPT))
> 			synchronize_rcu_expedited();
> 	} else {
> 		synchronize_rcu_mult(call_rcu, call_rcu_sched);
> 	}
> 
> As soon as this patch conflicts due to the synchronize_rcu_mult()
> becoming synchronize_rcu(), you can drop the patch.  And this is the only
> use of synchronize_rcu_mult(), so this approach loses no generality.
> Longer term, this patch might possibly be the backport of 5fc9d4e000b1
> back to v4.14, but at the end of the day this is up to the various
> -stable maintainers.
> 
> Hmmm...  If you are running CONFIG_PREEMPT=n, you can use an even
> simpler replacement for synchronize_rcu_mult():
> 
> 	synchronize_sched_expedited(); /* Bug if CONFIG_PREEMPT=y!!! */
> 
> Would either of those two approaches work for you, or am I still missing
> something?
Sorry, I was not clear. The original commit in your -rcu tree already works for 
us. So that is sorted out, thank you :)
In my previous reply, when I was referring to synchronize_rcu_mult I really
also wanted to also point to __wait_rcu_gp. With the current state in your -rcu 
tree, __wait_rcu_gp does not look whether "expedited" is enabled or not. This
is currently delegated to the callers (e.g. synchronize_rcu and 
synchronize_rcu_bh). So any new direct users of __wait_rcu_gp would also miss 
this check (i.e. just like what happened in synchronize_rcu_mult). If we want 
__wait_rcu_gp to always respect rcu_expedited and rcu_normal flags, we might 
want to pull these checks into __wait_rcu_gp instead and remove them from the 
callers.
> 
> 							Thanx, Paul
> 
> > 
> > > 
> > > 3.	If the real-time guys' users were to have booted with
> > > 	rcupdate.rcu_normal=1, then synchronize_sched_expedited()
> > > 	would invoke _synchronize_rcu_expedited, which would invoke
> > > 	wait_rcu_gp(), which would invoke _wait_rcu_gp() which would
> > > 	invoke __wait_rcu_gp(), which, given your patch, would in turn
> > > 	invoke synchronize_sched_expedited().  This situation could
> > > 	well prevent their systems from meeting their response-time
> > > 	requirements.
> > > 
> > > So I cannot accept this patch nor for that matter any similar patch.
> > > 
> > > But what were you really trying to get done here?  If you were thinking
> > > of adding another synchronize_rcu_mult(), the flavor consolidation will
> > > make that unnecessary in most cases.  If you are trying to speed up
> > > CPU-hotplug operations, I suggest using the rcu_expedited sysctl variable
> > > when taking a CPU offline.  If something else, please let me know what
> > > it is so that we can work out how the problem might best be solved.
> > > 
> > > 							Thanx, Paul
> > > 
> > > > 
> > > > 
> > > > ---
> > > >  kernel/rcu/update.c | 34 ++++++++++++++++++++++++++++------
> > > >  1 file changed, 28 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > index 68fa19a..44b8817 100644
> > > > --- a/kernel/rcu/update.c
> > > > +++ b/kernel/rcu/update.c
> > > > @@ -392,13 +392,27 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
> > > >  			might_sleep();
> > > >  			continue;
> > > >  		}
> > > > -		init_rcu_head_on_stack(&rs_array[i].head);
> > > > -		init_completion(&rs_array[i].completion);
> > > > +
> > > >  		for (j = 0; j < i; j++)
> > > >  			if (crcu_array[j] == crcu_array[i])
> > > >  				break;
> > > > -		if (j == i)
> > > > -			(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
> > > > +		if (j != i)
> > > > +			continue;
> > > > +
> > > > +		if ((crcu_array[i] == call_rcu_sched ||
> > > > +		     crcu_array[i] == call_rcu_bh)
> > > > +		    && rcu_gp_is_expedited()) {
> > > > +			if (crcu_array[i] == call_rcu_sched)
> > > > +				synchronize_sched_expedited();
> > > > +			else
> > > > +				synchronize_rcu_bh_expedited();
> > > > +
> > > > +			continue;
> > > > +		}
> > > > +
> > > > +		init_rcu_head_on_stack(&rs_array[i].head);
> > > > +		init_completion(&rs_array[i].completion);
> > > > +		(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
> > > >  	}
> > > > 
> > > >  	/* Wait for all callbacks to be invoked. */
> > > > @@ -407,11 +421,19 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
> > > >  		    (crcu_array[i] == call_rcu ||
> > > >  		     crcu_array[i] == call_rcu_bh))
> > > >  			continue;
> > > > +
> > > > +		if ((crcu_array[i] == call_rcu_sched ||
> > > > +		     crcu_array[i] == call_rcu_bh)
> > > > +		    && rcu_gp_is_expedited())
> > > > +			continue;
> > > > +
> > > >  		for (j = 0; j < i; j++)
> > > >  			if (crcu_array[j] == crcu_array[i])
> > > >  				break;
> > > > -		if (j == i)
> > > > -			wait_for_completion(&rs_array[i].completion);
> > > > +		if (j != i)
> > > > +			continue;
> > > > +
> > > > +		wait_for_completion(&rs_array[i].completion);
> > > >  		destroy_rcu_head_on_stack(&rs_array[i].head);
> > > >  	}
> > > >  }
> > > > -- 
> > > > 2.7.4
> > > > 
> > > 
> > Amazon Development Center Germany GmbH
> > Berlin - Dresden - Aachen
> > main office: Krausenstr. 38, 10117 Berlin
> > Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
> > Ust-ID: DE289237879
> > Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
> 
> 
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
Powered by blists - more mailing lists
 
