linux-kernel - Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6


Message-ID: <20200507174934.GD19331@lorien.usersys.redhat.com>
Date:   Thu, 7 May 2020 13:49:34 -0400
From:   Phil Auld <pauld@...hat.com>
To:     Jirka Hladky <jhladky@...hat.com>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Hillf Danton <hdanton@...a.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Douglas Shakshober <dshaks@...hat.com>,
        Waiman Long <longman@...hat.com>,
        Joe Mario <jmario@...hat.com>, Bill Gray <bgray@...hat.com>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load
 balancer v6

On Thu, May 07, 2020 at 06:29:44PM +0200 Jirka Hladky wrote:
> Hi Mel,
> 
> we are not targeting just OMP applications. We also see the
> performance degradation for other workloads, like SPECjbb2005 and
> SPECjvm2008. Even worse, it also affects higher thread counts.
> For example, comparing 5.7.0-0.rc2 against the 5.6 kernel on a
> 4-NUMA-node server with 2x AMD 7351 CPUs, we see a 22% performance
> degradation for 32 threads (the system has 64 CPUs in total). We
> observe this degradation only when we run a single SPECjbb binary.
> When running 4 SPECjbb binaries in parallel, there is no change in
> performance between 5.6 and 5.7.
> 
> That's why we are asking for the kernel tunable, which we would add to
> the tuned profile. We don't expect users to change this frequently but
> rather to set the performance profile once based on the purpose of the
> server.
> 
> If you could prepare a patch for us, we would be more than happy to
> test it extensively. Based on the results, we can then evaluate if
> it's the way to go. Thoughts?
>

I'm happy to spin up a patch once I'm sure what exactly the tuning would
affect. At first glance I'm thinking it would be the imbalance_min,
which is currently hardcoded to 2. But there may be something else...
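
For illustration only, here is a minimal user-space sketch of what making
imbalance_min tunable might look like. It assumes the check has roughly
this shape: when the source node runs no more than imbalance_min tasks,
the NUMA imbalance is treated as zero and the tasks are left where they
are. The names sketch_imbalance_min and sketch_adjust_numa_imbalance are
hypothetical and are not taken from any posted patch.

```c
#include <assert.h>	/* only needed for the checks in the usage note */

/*
 * Hypothetical tunable (assumption). The thread says the real value is
 * currently hardcoded to 2; a sysctl would replace that constant.
 */
static unsigned int sketch_imbalance_min = 2;

/*
 * User-space model of the threshold check, not kernel code.
 * If the source node runs at most sketch_imbalance_min tasks, report
 * no imbalance, so the load balancer has no reason to spread the
 * tasks across nodes. Otherwise pass the imbalance through unchanged.
 */
static long sketch_adjust_numa_imbalance(long imbalance,
					 unsigned int src_nr_running)
{
	if (src_nr_running <= sketch_imbalance_min)
		return 0;

	return imbalance;
}
```

With the default of 2, sketch_adjust_numa_imbalance(2, 2) reports no
imbalance, so a pair of tasks may stay on one node. Setting
sketch_imbalance_min to 0 makes the same call return 2, which models the
behaviour the bandwidth-bound workloads are asking for.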


Cheers,
Phil


> Thanks a lot!
> Jirka
> 
> On Thu, May 7, 2020 at 5:54 PM Mel Gorman <mgorman@...hsingularity.net> wrote:
> >
> > On Thu, May 07, 2020 at 05:24:17PM +0200, Jirka Hladky wrote:
> > > Hi Mel,
> > >
> > > > > Yes, it's indeed OMP.  By low thread counts, I mean up to 2x the
> > > > > number of NUMA nodes (8 threads on 4 NUMA node servers, 16 threads
> > > > > on 8 NUMA node servers).
> > > >
> > > > Ok, so we know it's within the imbalance threshold where a NUMA node can
> > > > be left idle.
> > >
> > > my colleagues and I discussed today the performance drop that some
> > > workloads see at low thread counts (roughly up to 2x the number of
> > > NUMA nodes). We are worried that it can be a severe issue for some
> > > use cases that require full memory bandwidth even when only some
> > > of the CPUs are used.
> > >
> > > We understand that the scheduler cannot distinguish this type of
> > > workload from others automatically. However, there was an idea for a
> > > *new kernel tunable to control the imbalance threshold*. Based on the
> > > purpose of the server, users could set this tunable. See the tuned
> > > project, which allows creating performance profiles [1].
> > >
> >
> > I'm not completely opposed to it, but given that the setting is global,
> > I imagine it could have other consequences if two applications run
> > at different times have different requirements. Given that it's OMP,
> > I would have imagined that an application that really cared about this
> > would specify what was needed using OMP_PLACES. Why would someone prefer
> > kernel tuning or a tuned profile over OMP_PLACES? After all, it requires
> > specific knowledge of the application even to know that a particular
> > tuned profile is needed.
> >
> > --
> > Mel Gorman
> > SUSE Labs
> >
> 
> 
> -- 
> -Jirka
> 

--