Date:   Wed, 24 Oct 2018 04:20:01 -0700
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     "Raslan, KarimAllah" <karahmed@...zon.de>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "mathieu.desnoyers@...icios.com" <mathieu.desnoyers@...icios.com>,
        "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "josh@...htriplett.org" <josh@...htriplett.org>,
        "jiangshanlai@...il.com" <jiangshanlai@...il.com>
Subject: Re: [PATCH] rcu: Benefit from expedited grace period in __wait_rcu_gp

On Tue, Oct 23, 2018 at 03:13:43PM +0000, Raslan, KarimAllah wrote:
> On Fri, 2018-10-19 at 13:21 -0700, Paul E. McKenney wrote:
> > On Fri, Oct 19, 2018 at 07:45:51PM +0000, Raslan, KarimAllah wrote:
> > > 
> > > On Fri, 2018-10-19 at 05:31 -0700, Paul E. McKenney wrote:
> > > > 
> > > > On Fri, Oct 19, 2018 at 02:49:05AM +0200, KarimAllah Ahmed wrote:
> > > > > 
> > > > > 
> > > > > When an expedited grace period is requested, both synchronize_sched and
> > > > > synchronize_rcu_bh can be optimized to have significantly lower latency.
> > > > > 
> > > > > Improve wait_rcu_gp handling to also account for expedited grace periods.
> > > > > The downside is that wait_rcu_gp will no longer wait for all RCU variants
> > > > > concurrently when an expedited grace period is requested; however, given
> > > > > the improved latency, this does not really matter.
> > > > > 
> > > > > Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> > > > > Cc: Josh Triplett <josh@...htriplett.org>
> > > > > Cc: Steven Rostedt <rostedt@...dmis.org>
> > > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> > > > > Cc: Lai Jiangshan <jiangshanlai@...il.com>
> > > > > Cc: linux-kernel@...r.kernel.org
> > > > > Signed-off-by: KarimAllah Ahmed <karahmed@...zon.de>
> > > > 
> > > > Cute!
> > > > 
> > > > Unfortunately, there are a few problems with this patch:
> > > > 
> > > > 1.	I will be eliminating synchronize_rcu_mult() due to the fact that
> > > > 	the upcoming RCU flavor consolidation eliminates its sole caller.
> > > > 	See 5fc9d4e000b1 ("rcu: Eliminate synchronize_rcu_mult()")
> > > > 	in my -rcu tree.  This would of course also eliminate the effects
> > > > 	of this patch.
> > > 
> > > Your patch already covers our use-case, but the semantics of wait_rcu_gp are
> > > still not clear to me.
> > > 
> > > The problem for us was that sched_cpu_deactivate would call
> > > synchronize_rcu_mult which does not check for "expedited" at all. So even
> > > though we are already using the rcu_expedited sysctl variable,
> > > synchronize_rcu_mult was just ignoring it.
> > > 
> > > That being said, I indeed overlooked rcu_normal and the fact that it takes
> > > precedence over rcu_expedited, and I did not notice the deadlock you mentioned
> > > below at all!
> > > 
> > > That can, however, easily be fixed by also checking for !rcu_gp_is_normal().
> > 
> > ???
> > 
> > The aforementioned 5fc9d4e000b1 commit replaces the synchronize_rcu_mult()
> > with synchronize_rcu(), which really would be subject to the sysfs
> > variables.  Of course, this is not yet in mainline, so it perhaps cannot
> > solve your immediate problem, which probably involves older kernels in
> > any case.  More on this below...
> > 
> > > 
> > > > 
> > > > 2.	The real-time guys' users are not going to be at all happy
> > > > 	with the IPIs resulting from the _expedited() API members.
> > > > 	Yes, they can boot with rcupdate.rcu_normal=1, but they don't
> > > > 	always need that big a hammer, and use of this kernel parameter
> > > > 	can slow down boot, hibernation, suspend, network configuration,
> > > > 	and much else besides.	We therefore don't want them to have to
> > > > 	use rcupdate.rcu_normal=1 unless absolutely necessary.
> > > 
> > > I might be missing something here. Why would they need to "explicitly" use 
> > > rcu_normal? If rcu_expedited is set, would not the expected behavior be to call
> > > into the expedited version?
> > > 
> > > My patch activates *expedited* if and only if it is set.
> > 
> > You are right, I was confused.  However...
> > 
> > > 
> > > I think I might be misunderstanding the expected behavior of
> > > synchronize_rcu_mult. My understanding is that something like:
> > > 
> > > synchronize_rcu_mult(call_rcu_sched) and synchronize_rcu() should behave
> > > identically, right?
> > 
> > You would clearly prefer that it did, and the commit log does seem to
> > read that way, but synchronize_rcu_mult() is going away anyway, so there
> > isn't a whole lot of point in arguing about what it should have done.
> > And the eventual implementation (with 5fc9d4e000b1 or its successor)
> > will act as you want.
> > 
> > > 
> > > At least in this commit:
> > > 
> > > commit d7d34d5e46140 ("sched: Rely on synchronize_rcu_mult() de-duplication")
> > > 
> > > .. the change clearly gives the impression that they can be used 
> > > interchangeably. The problem is that this is not true when you look at the 
> > > implementation. One of them (i.e. synchronize_rcu) will respect the
> > > rcu_expedited flag set by sysfs while the other (i.e. synchronize_rcu_mult)
> > > simply ignores it.
> > > 
> > > So my patch is about making sure that both of the variants actually respect 
> > > it.
> > 
> > I am guessing that you need to make an older version of the kernel
> > expedite the CPU-hotplug grace periods.  I am also guessing that you can
> > carry patches to your kernels.  If so, I suggest the following simpler
> > change to sched_cpu_deactivate() in kernel/sched/core.c:
> > 
> > 	if (rcu_gp_is_expedited()) {
> > 		synchronize_sched_expedited();
> > 		if (IS_ENABLED(CONFIG_PREEMPT))
> > 			synchronize_rcu_expedited();
> > 	} else {
> > 		synchronize_rcu_mult(call_rcu, call_rcu_sched);
> > 	}
> > 
> > As soon as this patch conflicts due to the synchronize_rcu_mult()
> > becoming synchronize_rcu(), you can drop the patch.  And this is the only
> > use of synchronize_rcu_mult(), so this approach loses no generality.
> > Longer term, this patch might possibly be the backport of 5fc9d4e000b1
> > back to v4.14, but at the end of the day this is up to the various
> > -stable maintainers.
> > 
> > Hmmm...  If you are running CONFIG_PREEMPT=n, you can use an even
> > simpler replacement for synchronize_rcu_mult():
> > 
> > 	synchronize_sched_expedited(); /* Bug if CONFIG_PREEMPT=y!!! */
> > 
> > Would either of those two approaches work for you, or am I still missing
> > something?
> 
> Sorry, I was not clear. The original commit in your -rcu tree already works for 
> us. So that is sorted out, thank you :)

Very good, thank you!

Just to confirm:

1.	By "original commit", you mean 5fc9d4e000b1 ("rcu: Eliminate
	synchronize_rcu_mult()"), right?

2.	Please note that 5fc9d4e000b1 can only be backported to the
	upcoming v4.20 release, and given that it only addresses an
	uncommon-case performance problem, I would not submit it to
	-stable at all.  So if you need this fixed in v4.19 or earlier,
	please:

	a.	Let me know.

	b.	Test this replacement for the current call to
		synchronize_rcu_mult() from sched_cpu_deactivate()
		in kernel/sched/core.c:

		if (rcu_gp_is_expedited()) {
			synchronize_sched_expedited();
			if (IS_ENABLED(CONFIG_PREEMPT))
				synchronize_rcu_expedited();
		} else {
			synchronize_rcu_mult(call_rcu, call_rcu_sched);
		}
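
	Or, if you are running CONFIG_PREEMPT=n, you could instead test
	the even simpler replacement noted earlier in this thread:

		synchronize_sched_expedited(); /* Bug if CONFIG_PREEMPT=y!!! */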

> In my previous reply, when I was referring to synchronize_rcu_mult I also meant
> to point to __wait_rcu_gp. With the current state in your -rcu tree,
> __wait_rcu_gp does not check whether "expedited" is enabled or not. This is
> currently delegated to the callers (e.g. synchronize_rcu and
> synchronize_rcu_bh). So any new direct users of __wait_rcu_gp would also miss
> this check (i.e. just like what happened in synchronize_rcu_mult). If we want
> __wait_rcu_gp to always respect the rcu_expedited and rcu_normal flags, we might
> want to pull these checks into __wait_rcu_gp and remove them from the callers.

Yes, __wait_rcu_gp() does arguably have odd semantics.  It is after all
used to implement the rcu_barrier() call in Tiny RCU and waits for a
normal (not expedited!) grace period in Tree RCU.  But there are many
other places in RCU where the semantics are quite a bit more unusual.
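
For concreteness, here is a rough and entirely untested sketch of what
pulling those checks into __wait_rcu_gp() might look like.  The helper
name is made up purely for illustration, and none of this is queued in
-rcu:

	/* Hypothetical helper: expedite a flavor when requested and safe. */
	static bool rcu_gp_maybe_expedite(call_rcu_func_t crf)
	{
		/* rcu_normal takes precedence over rcu_expedited. */
		if (rcu_gp_is_normal() || !rcu_gp_is_expedited())
			return false;
		if (crf == call_rcu_sched) {
			synchronize_sched_expedited();
			return true;
		}
		if (crf == call_rcu_bh) {
			synchronize_rcu_bh_expedited();
			return true;
		}
		return false;	/* Leave call_rcu alone, as in your patch. */
	}

Both loops in __wait_rcu_gp() would then skip any entry for which this
helper returned true, instead of registering and waiting on a callback
for it.  Whether that is worth the extra special-casing in common code
is another question, which is part of why I would rather keep these
checks in the callers.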

So welcome to my world!  ;-)

							Thanx, Paul

> > > > 3.	If the real-time guys' users were to have booted with
> > > > 	rcupdate.rcu_normal=1, then synchronize_sched_expedited()
> > > > 	would invoke _synchronize_rcu_expedited(), which would invoke
> > > > 	wait_rcu_gp(), which would invoke _wait_rcu_gp() which would
> > > > 	invoke __wait_rcu_gp(), which, given your patch, would in turn
> > > > 	invoke synchronize_sched_expedited().  This situation could
> > > > 	well prevent their systems from meeting their response-time
> > > > 	requirements.
> > > > 
> > > > So I cannot accept this patch nor for that matter any similar patch.
> > > > 
> > > > But what were you really trying to get done here?  If you were thinking
> > > > of adding another synchronize_rcu_mult(), the flavor consolidation will
> > > > make that unnecessary in most cases.  If you are trying to speed up
> > > > CPU-hotplug operations, I suggest using the rcu_expedited sysctl variable
> > > > when taking a CPU offline.  If something else, please let me know what
> > > > it is so that we can work out how the problem might best be solved.
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > > 
> > > > > 
> > > > > ---
> > > > >  kernel/rcu/update.c | 34 ++++++++++++++++++++++++++++------
> > > > >  1 file changed, 28 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > > index 68fa19a..44b8817 100644
> > > > > --- a/kernel/rcu/update.c
> > > > > +++ b/kernel/rcu/update.c
> > > > > @@ -392,13 +392,27 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
> > > > >  			might_sleep();
> > > > >  			continue;
> > > > >  		}
> > > > > -		init_rcu_head_on_stack(&rs_array[i].head);
> > > > > -		init_completion(&rs_array[i].completion);
> > > > > +
> > > > >  		for (j = 0; j < i; j++)
> > > > >  			if (crcu_array[j] == crcu_array[i])
> > > > >  				break;
> > > > > -		if (j == i)
> > > > > -			(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
> > > > > +		if (j != i)
> > > > > +			continue;
> > > > > +
> > > > > +		if ((crcu_array[i] == call_rcu_sched ||
> > > > > +		     crcu_array[i] == call_rcu_bh)
> > > > > +		    && rcu_gp_is_expedited()) {
> > > > > +			if (crcu_array[i] == call_rcu_sched)
> > > > > +				synchronize_sched_expedited();
> > > > > +			else
> > > > > +				synchronize_rcu_bh_expedited();
> > > > > +
> > > > > +			continue;
> > > > > +		}
> > > > > +
> > > > > +		init_rcu_head_on_stack(&rs_array[i].head);
> > > > > +		init_completion(&rs_array[i].completion);
> > > > > +		(crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
> > > > >  	}
> > > > > 
> > > > >  	/* Wait for all callbacks to be invoked. */
> > > > > @@ -407,11 +421,19 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
> > > > >  		    (crcu_array[i] == call_rcu ||
> > > > >  		     crcu_array[i] == call_rcu_bh))
> > > > >  			continue;
> > > > > +
> > > > > +		if ((crcu_array[i] == call_rcu_sched ||
> > > > > +		     crcu_array[i] == call_rcu_bh)
> > > > > +		    && rcu_gp_is_expedited())
> > > > > +			continue;
> > > > > +
> > > > >  		for (j = 0; j < i; j++)
> > > > >  			if (crcu_array[j] == crcu_array[i])
> > > > >  				break;
> > > > > -		if (j == i)
> > > > > -			wait_for_completion(&rs_array[i].completion);
> > > > > +		if (j != i)
> > > > > +			continue;
> > > > > +
> > > > > +		wait_for_completion(&rs_array[i].completion);
> > > > >  		destroy_rcu_head_on_stack(&rs_array[i].head);
> > > > >  	}
> > > > >  }
> > > > > -- 
> > > > > 2.7.4
> > > > > 
> > > > 
> > 
> > 
