Message-ID: <CAE4VaGAxqK_gr7gstk1S8z3vx+9c6rG-Xo_kUiAzuOWpqOR4cQ@mail.gmail.com>
Date: Wed, 20 May 2020 15:58:01 +0200
From: Jirka Hladky <jhladky@...hat.com>
To: Hillf Danton <hdanton@...a.com>
Cc: Phil Auld <pauld@...hat.com>,
Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Valentin Schneider <valentin.schneider@....com>,
LKML <linux-kernel@...r.kernel.org>,
Douglas Shakshober <dshaks@...hat.com>,
Waiman Long <longman@...hat.com>,
Joe Mario <jmario@...hat.com>, Bill Gray <bgray@...hat.com>,
"aokuliar@...hat.com" <aokuliar@...hat.com>,
"kkolakow@...hat.com" <kkolakow@...hat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load
balancer v6

Hi Hillf, Mel and all,

thanks for the patch! It has produced really GOOD results.

1) It has fixed the performance problems with the 5.7 vanilla kernel for
single-tenant workloads and low-system-load scenarios, without causing
performance degradation for the multi-tenant tasks. It produces the
same results as the previous proof-of-concept patch in which the
adjust_numa_imbalance function was modified to be a no-op (returning
the same value of imbalance as it gets on the input).
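
For reference, that proof-of-concept change amounted to something like the
sketch below (the signature matches how adjust_numa_imbalance is called in
the hunks further down; the exact v5.7 body differs in that it allows a
small hard-coded imbalance):

static inline long adjust_numa_imbalance(int imbalance, int src_nr_running)
{
        /*
         * Proof of concept: never allow a special small NUMA imbalance;
         * hand back whatever imbalance the caller computed.
         */
        return imbalance;
}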

2) We have also added Mel's netperf-cstate-small-cross-socket test to
our test battery:
https://github.com/gormanm/mmtests/blob/master/configs/config-network-netperf-cstate-small-cross-socket

Mel told me that he had seen significant performance improvements with
5.7 over 5.6 for the netperf-cstate-small-cross-socket scenario.
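
For anyone who wants to reproduce it, running that config from an mmtests
checkout should look roughly like this (the run name is just a placeholder):

  ./run-mmtests.sh --config configs/config-network-netperf-cstate-small-cross-socket baseline-5.7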

Out of the 6 different patches we have tested, your patch has performed
the best for this scenario. Compared to vanilla, we see only a minimal
performance degradation of 2.5% for the UDP stream throughput and 0.6%
for the TCP throughput. The testing was done on a dual-socket system
with Gold 6132 CPUs.

@Mel - could you please test Hillf's patch with your full testing
suite? So far, it looks very promising, but I would like to check the
patch thoroughly to make sure it does not hurt performance in other
areas.

Thanks a lot!
Jirka

On Tue, May 19, 2020 at 6:32 AM Hillf Danton <hdanton@...a.com> wrote:
>
>
> Hi Jirka
>
> On Mon, 18 May 2020 16:52:52 +0200 Jirka Hladky wrote:
> >
> > We have compared it against kernel with adjust_numa_imbalance disabled
> > [1], and both kernels perform at the same level for the single-tenant
> > jobs, but the proposed patch is bad for the multitenancy mode. The
> > kernel with adjust_numa_imbalance disabled is a clear winner here.
>
> Double thanks to you for the tests!
>
> > We would be very interested in what others think about disabling
> > adjust_numa_imbalance function. The patch is below. It would be great
>
> A minute...
>
> > to collect performance results for different scenarios to make sure
> > the results are objective.
>
> I don't have another test case but a diff trying to confine the tool
> in question back to the hard-coded 2's field.
>
> It's used in the first hunk below to detect imbalance before migrating
> a task, and a small churn of code is added at another call site when
> balancing idle CPUs.
>
> Thanks
> Hillf
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1916,20 +1916,26 @@ static void task_numa_find_cpu(struct ta
> * imbalance that would be overruled by the load balancer.
> */
> if (env->dst_stats.node_type == node_has_spare) {
> - unsigned int imbalance;
> - int src_running, dst_running;
> + unsigned int imbalance = 2;
>
> - /*
> - * Would movement cause an imbalance? Note that if src has
> - * more running tasks that the imbalance is ignored as the
> - * move improves the imbalance from the perspective of the
> - * CPU load balancer.
> - * */
> - src_running = env->src_stats.nr_running - 1;
> - dst_running = env->dst_stats.nr_running + 1;
> - imbalance = max(0, dst_running - src_running);
> - imbalance = adjust_numa_imbalance(imbalance, src_running);
> + //No imbalance computed without spare capacity
> + if (env->dst_stats.node_type != env->src_stats.node_type)
> + goto check_imb;
> +
> + imbalance = adjust_numa_imbalance(imbalance,
> + env->src_stats.nr_running);
> +
> + //Do nothing without imbalance
> + if (!imbalance) {
> + imbalance = 2;
> + goto check_imb;
> + }
> +
> + //Migrate task if it's likely to grow balance
> + if (env->dst_stats.nr_running + 1 < env->src_stats.nr_running)
> + imbalance = 0;
>
> +check_imb:
> /* Use idle CPU if there is no imbalance */
> if (!imbalance) {
> maymove = true;
> @@ -9011,12 +9017,13 @@ static inline void calculate_imbalance(s
> env->migration_type = migrate_task;
> env->imbalance = max_t(long, 0, (local->idle_cpus -
> busiest->idle_cpus) >> 1);
> - }
>
> - /* Consider allowing a small imbalance between NUMA groups */
> - if (env->sd->flags & SD_NUMA)
> - env->imbalance = adjust_numa_imbalance(env->imbalance,
> - busiest->sum_nr_running);
> + /* Consider allowing a small imbalance between NUMA groups */
> + if (env->sd->flags & SD_NUMA &&
> + local->group_type == busiest->group_type)
> + env->imbalance = adjust_numa_imbalance(env->imbalance,
> + busiest->sum_nr_running);
> + }
>
> return;
> }
> --
>
--
-Jirka