Message-ID: <CAE4VaGDBAquxbBjuzzyaT1WPR95wiaiHsrEPs-eOP2W+r=fQFg@mail.gmail.com>
Date: Mon, 18 May 2020 16:52:52 +0200
From: Jirka Hladky <jhladky@...hat.com>
To: Hillf Danton <hdanton@...a.com>
Cc: Phil Auld <pauld@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Valentin Schneider <valentin.schneider@....com>,
LKML <linux-kernel@...r.kernel.org>,
Douglas Shakshober <dshaks@...hat.com>,
Waiman Long <longman@...hat.com>,
Joe Mario <jmario@...hat.com>, Bill Gray <bgray@...hat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load
balancer v6
Hi Hillf,
thanks a lot for your patch!
Compared to the 5.7-rc4 vanilla kernel, we observe the following:

* Single-tenant jobs show an improvement of up to 15% for SPECjbb2005 and
up to 100% for NAS in low-thread mode. In other words, it fixes all the
problems we have reported in this thread.
* Multi-tenant jobs show a performance degradation of up to 30% for
SPECjbb2005.

So while the patch fixes the problems with single-tenant jobs and with
performance at low system load, it hurts multi-tenant workloads.

We have compared it against a kernel with adjust_numa_imbalance disabled
[1]: both kernels perform at the same level for the single-tenant jobs,
but the proposed patch performs noticeably worse in multi-tenant mode.
The kernel with adjust_numa_imbalance disabled is the clear winner here.

We would be very interested in what others think about disabling the
adjust_numa_imbalance function. The patch is below. It would be great to
collect performance results for different scenarios to make sure the
conclusions are objective.
Thanks a lot!
Jirka
[1] Patch to test a kernel with adjust_numa_imbalance disabled:
===============================================
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 02f323b..8c43d29 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8907,14 +8907,6 @@ static inline long adjust_numa_imbalance(int imbalance, int src_nr_running)
 {
 	unsigned int imbalance_min;
 
-	/*
-	 * Allow a small imbalance based on a simple pair of communicating
-	 * tasks that remain local when the source domain is almost idle.
-	 */
-	imbalance_min = 2;
-	if (src_nr_running <= imbalance_min)
-		return 0;
-
 	return imbalance;
 }
 
===============================================
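
For clarity, this is roughly what adjust_numa_imbalance() reduces to once
the patch above is applied (a sketch reconstructed from the hunk context,
not copied from a tree; the now-unused imbalance_min declaration that the
minimal test patch leaves behind is dropped here):

/* Sketch of adjust_numa_imbalance() with the test patch applied. */
static inline long adjust_numa_imbalance(int imbalance, int src_nr_running)
{
	/*
	 * The special case that tolerated a small imbalance (a pair of
	 * communicating tasks staying on a nearly idle source node) is
	 * gone: the raw imbalance is always reported to the caller, and
	 * src_nr_running is simply ignored.
	 */
	return imbalance;
}

With this in place the NUMA balancing path no longer tolerates the
two-task imbalance, which is the behavior the numbers above are compared
against.
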
On Fri, May 8, 2020 at 5:47 AM Hillf Danton <hdanton@...a.com> wrote:
>
>
> On Thu, 7 May 2020 13:49:34 Phil Auld wrote:
> >
> > On Thu, May 07, 2020 at 06:29:44PM +0200 Jirka Hladky wrote:
> > > Hi Mel,
> > >
> > > we are not targeting just OMP applications. We see the performance
> > > degradation also for other workloads, like SPECjbb2005 and
> > > SPECjvm2008. Even worse, it also affects a higher number of threads.
> > > For example, comparing 5.7.0-0.rc2 against the 5.6 kernel on a 4-node
> > > NUMA server with 2x AMD 7351 CPUs, we see a 22% performance degradation
> > > for 32 threads (the system has 64 CPUs in total). We observe this degradation
> > > only when we run a single SPECjbb binary. When running 4 SPECjbb
> > > binaries in parallel, there is no change in performance between 5.6
> > > and 5.7.
> > >
> > > That's why we are asking for the kernel tunable, which we would add to
> > > the tuned profile. We don't expect users to change this frequently but
> > > rather to set the performance profile once based on the purpose of the
> > > server.
> > >
> > > If you could prepare a patch for us, we would be more than happy to
> > > test it extensively. Based on the results, we can then evaluate if
> > > it's the way to go. Thoughts?
> > >
> >
> > I'm happy to spin up a patch once I'm sure what exactly the tuning would
> > effect. At an initial glance I'm thinking it would be the imbalance_min
> > which is currently hardcoded to 2. But there may be something else...
>
> hrm... try to restore the old behavior by skipping task migration in favor
> of the local node if there is no imbalance.
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1928,18 +1928,16 @@ static void task_numa_find_cpu(struct ta
>  		src_running = env->src_stats.nr_running - 1;
>  		dst_running = env->dst_stats.nr_running + 1;
>  		imbalance = max(0, dst_running - src_running);
> -		imbalance = adjust_numa_imbalance(imbalance, src_running);
> +		imbalance = adjust_numa_imbalance(imbalance, src_running + 1);
>  
> -		/* Use idle CPU if there is no imbalance */
> +		/* No task migrate without imbalance */
>  		if (!imbalance) {
> -			maymove = true;
> -			if (env->dst_stats.idle_cpu >= 0) {
> -				env->dst_cpu = env->dst_stats.idle_cpu;
> -				task_numa_assign(env, NULL, 0);
> -				return;
> -			}
> +			env->best_cpu = -1;
> +			return;
>  		}
> -	} else {
> +	}
> +
> +	do {
>  		long src_load, dst_load, load;
>  		/*
>  		 * If the improvement from just moving env->p direction is better
> @@ -1949,7 +1947,7 @@ static void task_numa_find_cpu(struct ta
>  		dst_load = env->dst_stats.load + load;
>  		src_load = env->src_stats.load - load;
>  		maymove = !load_too_imbalanced(src_load, dst_load, env);
> -	}
> +	} while (0);
>  
>  	for_each_cpu(cpu, cpumask_of_node(env->dst_nid)) {
>  		/* Skip this CPU if the source task cannot migrate */
>
>
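
As an aside, here is a rough sketch of the kind of knob Phil mentions
above, exposing the hardcoded imbalance_min as a sysctl. All names below
(sysctl_numa_imbalance_min, the kernel.numa_balancing_imbalance_min entry)
are illustrative assumptions only, not an existing interface or anyone's
actual patch:

/* kernel/sched/fair.c: hypothetical tunable replacing the hardcoded 2 */
unsigned int sysctl_numa_imbalance_min __read_mostly = 2;

static inline long adjust_numa_imbalance(int imbalance, int src_nr_running)
{
	/*
	 * Allow a small imbalance when the source domain is almost idle.
	 * The threshold now comes from the sysctl, so writing 0 disables
	 * the special case entirely (matching the test patch above) while
	 * the default of 2 keeps the current behavior.
	 */
	if (sysctl_numa_imbalance_min &&
	    src_nr_running <= sysctl_numa_imbalance_min)
		return 0;

	return imbalance;
}

/* kernel/sysctl.c: entry added to the existing kern_table[] array */
	{
		.procname	= "numa_balancing_imbalance_min",
		.data		= &sysctl_numa_imbalance_min,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_douintvec,
	},

A tuned profile could then set this per the purpose of the server, which
is the kind of switch being asked for earlier in the thread.
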
--
-Jirka