linux-kernel - Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6


Message-ID: <CAE4VaGDQWPePtmtCZP=ROYW1KPxtPhGDrxqy2QbirHGJdwk4=w@mail.gmail.com>
Date:   Wed, 13 May 2020 16:57:15 +0200
From:   Jirka Hladky <jhladky@...hat.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Phil Auld <pauld@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Hillf Danton <hdanton@...a.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Douglas Shakshober <dshaks@...hat.com>,
        Waiman Long <longman@...hat.com>,
        Joe Mario <jmario@...hat.com>, Bill Gray <bgray@...hat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load
 balancer v6

Hi Mel,

we have tried a kernel with adjust_numa_imbalance() crippled so that it
simply returns the imbalance it is given.
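For reference, a simplified user-space sketch of the two variants we compared. The first function is our reading of the 5.7 logic (an assumption: modeled on adjust_numa_imbalance() in kernel/sched/fair.c, not copied from it), where an imbalance is ignored while the source node runs at most imbalance_min = 2 tasks. The second is the pass-through variant we tested. Both function names are hypothetical.

```c
/*
 * Sketch of the 5.7-era behaviour (assumption: simplified from
 * adjust_numa_imbalance(), not the kernel source). While the source
 * NUMA node is almost idle, report no imbalance so that a pair of
 * communicating tasks can stay on the same node.
 */
static long adjust_numa_imbalance_v57(int imbalance, int src_nr_running)
{
	int imbalance_min = 2;	/* hard-coded threshold in this sketch */

	if (src_nr_running <= imbalance_min)
		return 0;	/* hide the imbalance from the load balancer */

	return imbalance;
}

/* Hypothetical name for the "crippled" build we tested: a pure pass-through. */
static long adjust_numa_imbalance_noop(int imbalance, int src_nr_running)
{
	(void)src_nr_running;	/* unused: the load balancer sees the raw imbalance */
	return imbalance;
}
```

For example, with two tasks on the source node and an imbalance of 2, the first function returns 0 (the tasks are left where they are), while the pass-through returns 2.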

This change solved all the performance problems I have reported.
Performance is the same as with the 5.6 kernel (before the patch was
applied).

* solved the performance drop of up to 20% with a single-instance
SPECjbb2005 benchmark on 8-NUMA-node servers (particularly on AMD EPYC
Rome systems) => this performance drop INCREASED with higher
thread counts (10% for 16 threads and 20% for 32 threads)
* solved the performance drop for low load scenarios (SPECjvm2008 and NAS)

Any suggestions on how to proceed? One approach is to turn
"imbalance_min" into a kernel tunable. Any other ideas?

https://github.com/torvalds/linux/blob/4f8a3cc1183c442daee6cc65360e3385021131e4/kernel/sched/fair.c#L8914
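To make the tunable idea concrete, a minimal sketch assuming a hypothetical knob named numa_imbalance_min that replaces the hard-coded threshold. No such sysctl exists; the name, the default of 2, and the meaning of 0 are our assumptions for illustration.

```c
/*
 * Hypothetical knob, not an existing kernel tunable: the number of
 * running tasks the source NUMA node may have before an imbalance is
 * acted on. 2 mirrors the hard-coded value; 0 disables the adjustment.
 */
static int numa_imbalance_min = 2;

/* Hypothetical helper name; sketch only, not kernel code. */
static long adjust_numa_imbalance_tunable(int imbalance, int src_nr_running)
{
	/* 0 means "never hide an imbalance", i.e. the pass-through build we tested */
	if (numa_imbalance_min == 0)
		return imbalance;

	/* source node nearly idle: keep tasks local by reporting no imbalance */
	if (src_nr_running <= numa_imbalance_min)
		return 0;

	return imbalance;
}
```

With the default of 2 this behaves like the hard-coded check; setting it to 0 gives the pass-through behavior, so a tuned profile could select either once per server.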

Thanks a lot!
Jirka






On Fri, May 8, 2020 at 12:40 PM Jirka Hladky <jhladky@...hat.com> wrote:
>
> Hi Mel,
>
> thanks for the hints! We will try it.
>
> @Phil - could you please prepare a kernel build for me to test?
>
> Thank you!
> Jirka
>
> On Fri, May 8, 2020 at 11:22 AM Mel Gorman <mgorman@...hsingularity.net> wrote:
>>
>> On Thu, May 07, 2020 at 06:29:44PM +0200, Jirka Hladky wrote:
>> > Hi Mel,
>> >
>> > we are not targeting just OMP applications. We also see the performance
>> > degradation for other workloads, like SPECjbb2005 and
>> > SPECjvm2008. Even worse, it also affects higher thread counts.
>> > For example, comparing 5.7.0-0.rc2 against the 5.6 kernel on a 4-NUMA-node
>> > server with 2x AMD 7351 CPUs, we see a performance degradation of 22% for 32
>> > threads (the system has 64 CPUs in total). We observe this degradation
>> > only when we run a single SPECjbb binary. When running 4 SPECjbb
>> > binaries in parallel, there is no change in performance between 5.6
>> > and 5.7.
>> >
>>
>> At a minimum, I suggest confirming that it's really due to
>> adjust_numa_imbalance() by making the function a no-op and retesting.
>> I have found odd artifacts with it but I'm unsure how to proceed without
>> causing problems elsewhere.
>>
>> For example, netperf on localhost in some cases reported a regression
>> when the client and server were running on the same node. The problem
>> appears to be that netserver completes its work faster when running
>> locally and goes idle more regularly. The cost of going idle and waking up
>> builds up, and a lower throughput is reported, but I'm not sure that gaming
>> an artifact like that is a good idea.
>>
>> > That's why we are asking for the kernel tunable, which we would add to
>> > the tuned profile. We don't expect users to change this frequently but
>> > rather to set the performance profile once based on the purpose of the
>> > server.
>> >
>> > If you could prepare a patch for us, we would be more than happy to
>> > test it extensively. Based on the results, we can then evaluate if
>> > it's the way to go. Thoughts?
>> >
>>
>> I would suggest simply disabling that function first to ensure that is
>> really what is causing problems for you.
>>
>> --
>> Mel Gorman
>> SUSE Labs
>>
>
>
> --
> -Jirka



-- 
-Jirka