Message-Id: <20160713230238.GU7094@linux.vnet.ibm.com>
Date: Wed, 13 Jul 2016 16:02:38 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: John Stultz <john.stultz@...aro.org>
Cc: Tejun Heo <tj@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
lkml <linux-kernel@...r.kernel.org>,
Dmitry Shmidt <dimitrysh@...gle.com>,
Rom Lemarchand <romlem@...gle.com>,
Colin Cross <ccross@...gle.com>, Todd Kjos <tkjos@...gle.com>,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: Severe performance regression w/ 4.4+ on Android due to cgroup
locking changes
On Wed, Jul 13, 2016 at 03:39:37PM -0700, John Stultz wrote:
> On Wed, Jul 13, 2016 at 3:17 PM, Paul E. McKenney
> <paulmck@...ux.vnet.ibm.com> wrote:
> > On Wed, Jul 13, 2016 at 02:46:37PM -0700, John Stultz wrote:
> >> On Wed, Jul 13, 2016 at 2:42 PM, Paul E. McKenney
> >> <paulmck@...ux.vnet.ibm.com> wrote:
> >> > On Wed, Jul 13, 2016 at 02:18:41PM -0700, Paul E. McKenney wrote:
> >> >> On Wed, Jul 13, 2016 at 05:05:26PM -0400, Tejun Heo wrote:
> >> >> > On Wed, Jul 13, 2016 at 02:03:15PM -0700, Paul E. McKenney wrote:
> >> >> > > Take the patch that I just sent out and make the choice of normal
> >> >> > > vs. expedited depend on CONFIG_PREEMPT_RT or whatever the -rt guys are
> >> >> > > calling it these days. Is there a low-latency Kconfig option other
> >> >> > > than CONFIG_NO_HZ_FULL?
> >> >> >
> >> >> > Sounds like a plan to me.
> >> >>
> >> >> I like the way we like each other's idea. Mutually assured laziness? ;-)
> >> >
> >> > But here is what mine might look like. Untested, probably does
> >> > not even build. Note that the default is -no- expediting, use the
> >> > rcusync.expedited kernel parameter to enable it.
> >>
> >> I was working on something similar, but using a config option. Would
> >> adding a config option for the default make sense here, since I'd
> >> probably prefer to have one less thing to always specify on the
> >> cmdline?
> >
> > As long as you don't mind it depending on CONFIG_RCU_EXPERT, no problem.
> >
> > Perhaps like the following, on top of the previous patch?
> >
> > Or if you are going to put it in defconfig files only, I can make it
> > so that it isn't changeable at menuconfig time.
>
> I think having it discoverable via menuconfig is useful, and I've got
> no objections to it being under RCU_EXPERT
> (assuming I don't badly muck up my RCU settings accidentally :).
But isn't mucking up your RCU settings half of the fun? ;-)
> I only had that one nit about maybe wanting to put something in dmesg
> when we're using the expedited methods.
Updated, please see below.
> But otherwise both patches look great and are working well!
>
> Do you mind marking them both for stable 4.4+?
OK, looks like it does qualify in the "fix a notable performance or
interactivity issue" category.
> Tested-by: John Stultz <john.stultz@...aro.org>
> Acked-by: John Stultz <john.stultz@...aro.org>
>
> Also, do make sure Dmitry gets the reported-by credit for the first patch.
Done! The updated first patch is below, and the second will follow.
Thanx, Paul
------------------------------------------------------------------------
commit 59435eb836ee73b30ed6ada525125b67b4029321
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date: Wed Jul 13 14:43:46 2016 -0700
rcu: Provide rcusync.expedited kernel boot parameter
Dmitry Shmidt and John Stultz noticed that __cgroup_procs_write()
sometimes incurred excessive overheads, ranging up into the tens of
milliseconds. Further testing confirmed speculation that this was due
to synchronize_sched() within rcusync being invoked by per-CPU rwsems.
This testing also showed that substituting synchronize_sched_expedited()
for synchronize_sched() greatly reduced the overheads to below 200
microseconds, with the occasional excursion into the low single digits
worth of milliseconds.
This commit therefore provides a rcusync.expedited kernel boot parameter
that causes rcusync to use expedited grace-period primitives.
Reported-by: Dmitry Shmidt <dimitrysh@...gle.com>
Reported-by: John Stultz <john.stultz@...aro.org>
Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Tested-by: John Stultz <john.stultz@...aro.org>
Acked-by: John Stultz <john.stultz@...aro.org>
Cc: <stable@...r.kernel.org> # 4.4.x-
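For reference, a sketch of how the new parameter would be used. The boot-time setting is the only documented interface; since the patch defines MODULE_PARAM_PREFIX as "rcusync." and registers the parameter with mode 0444, the value should also be readable (but not writable) under sysfs after boot. The sysfs path below is inferred from the prefix, not stated in the patch, so treat it as an assumption. This is a configuration fragment for a booted kernel, not a runnable script:

```
# Bootloader configuration: append to the kernel command line
#   rcusync.expedited=1
#
# After boot, the (read-only, mode 0444) setting can be inspected;
# path inferred from MODULE_PARAM_PREFIX "rcusync.":
cat /sys/module/rcusync/parameters/expedited
```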
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 82b42c958d1c..b8bc9854e548 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3229,6 +3229,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
energy efficiency by requiring that the kthreads
periodically wake up to do the polling.
+ rcusync.expedited [KNL]
+ Specify that the rcusync mechanism use expedited
+ grace periods. As of mid-2016, this affects
+ per-CPU rwsems.
+
rcutree.blimit= [KNL]
Set maximum number of finished RCU callbacks to
process in one batch.
diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index be922c9f3d37..0d0dc992cce7 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -22,6 +22,14 @@
#include <linux/rcu_sync.h>
#include <linux/sched.h>
+#include <linux/moduleparam.h>
+#include <linux/module.h>
+
+MODULE_ALIAS("rcusync");
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "rcusync."
#ifdef CONFIG_PROVE_RCU
#define __INIT_HELD(func) .held = func,
@@ -29,14 +37,14 @@
#define __INIT_HELD(func)
#endif
-static const struct {
+static struct {
void (*sync)(void);
void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
void (*wait)(void);
#ifdef CONFIG_PROVE_RCU
int (*held)(void);
#endif
-} gp_ops[] = {
+} gp_ops[] __read_mostly = {
[RCU_SYNC] = {
.sync = synchronize_rcu,
.call = call_rcu,
@@ -62,6 +70,21 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
#define rss_lock gp_wait.lock
+static bool expedited;
+module_param(expedited, bool, 0444);
+
+static int __init rcu_sync_early_init(void)
+{
+ if (expedited) {
+ pr_info("RCU_SYNC: Expedited operation in effect.\n");
+ gp_ops[RCU_SYNC].sync = synchronize_rcu_expedited;
+ gp_ops[RCU_SCHED_SYNC].sync = synchronize_sched_expedited;
+ gp_ops[RCU_BH_SYNC].sync = synchronize_rcu_bh_expedited;
+ }
+ return 0;
+}
+early_initcall(rcu_sync_early_init);
+
#ifdef CONFIG_PROVE_RCU
void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
{