Message-ID: <CAKfTPtBfV1QGi2utnmnR21MapKw1g2mTFA_aRxOxXvpWTRX+wA@mail.gmail.com>
Date:   Mon, 17 Feb 2020 14:49:11 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Phil Auld <pauld@...hat.com>, Hillf Danton <hdanton@...a.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load
 balancer v3

On Mon, 17 Feb 2020 at 11:44, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> Changelog since V2:
> o Rebase on top of Vincent's series again
> o Fix a missed rcu_read_unlock
> o Reduce overhead of tracepoint
>
> Changelog since V1:
> o Rebase on top of Vincent's series and rework
>
> Note: The baseline for this series is tip/sched/core as of February
>         12th rebased on top of v5.6-rc1. The series includes patches from
>         Vincent as I needed to add a fix and build on top of it. Vincent's
>         series on its own introduces performance regressions for *some*
>         but not *all* machines so it's easily missed. This series overall
>         is close to performance-neutral with some gains depending on the
>         machine. However, the end result does less work on NUMA balancing
>         and the fact that both the NUMA balancer and the load balancer
>         use similar logic makes it much easier to understand.
>
> The NUMA balancer makes task placement decisions that only partially
> take the load balancer into account, and vice versa, and there are
> inconsistencies between the two. This can result in placement decisions
> that override each other, leading to unnecessary migrations -- both
> task migrations and page migrations. This series reconciles many of
> those decisions -- partially Vincent's work, with some fixes and
> optimisations on top to merge our two series.
>
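
To make the inconsistency concrete, below is a minimal standalone C
sketch -- not code from the series, and every name in it is
hypothetical -- of the failure mode: if two balancers score
"imbalance" with different thresholds, each can decide the other's
placement is wrong and migrate the task back and forth.

    #include <stdio.h>

    /* Capacity-scaled comparison, cross-multiplied to avoid division. */
    static int numa_too_imbalanced(long src_load, long dst_load,
                                   long src_cap, long dst_cap)
    {
        /* Hypothetical NUMA policy: tolerate up to 25% skew. */
        return dst_load * src_cap * 100 > src_load * dst_cap * 125;
    }

    static int lb_too_imbalanced(long src_load, long dst_load,
                                 long src_cap, long dst_cap)
    {
        /* Hypothetical load-balancer policy: tolerate only 10% skew. */
        return dst_load * src_cap * 100 > src_load * dst_cap * 110;
    }

    int main(void)
    {
        long src_load = 80, dst_load = 96, cap = 1024;

        /* The NUMA balancer accepts this placement (prints 0)... */
        printf("NUMA rejects: %d\n",
               numa_too_imbalanced(src_load, dst_load, cap, cap));
        /* ...but the load balancer rejects it (prints 1) and would
         * migrate the task straight back. */
        printf("LB rejects:   %d\n",
               lb_too_imbalanced(src_load, dst_load, cap, cap));
        return 0;
    }
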
> The first patch is unrelated. It has since been picked up by tip but
> was not present in the tree at the time of the fork. I'm including it
> here because I tested with it.
>
> The second and third patches are tracing-only and were needed to get
> sensible data out of ftrace with respect to task placement for NUMA
> balancing. The NUMA balancer is *far* easier to analyse with these
> patches, and they informed how the series should be developed.
>
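
For readers wanting to reproduce the tracing, events can be enabled
through tracefs, the filesystem interface behind ftrace. The sketch
below is a generic example rather than the exact procedure from the
series: it enables every event in the sched subsystem instead of the
specific tracepoints patches 2-3 add, and it assumes tracefs is
mounted at /sys/kernel/tracing.

    #include <stdio.h>
    #include <stdlib.h>

    static void write_file(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");

        if (!f) {
            perror(path);
            exit(EXIT_FAILURE);
        }
        fputs(val, f);
        fclose(f);
    }

    int main(void)
    {
        /* Enable all sched trace events and start tracing. */
        write_file("/sys/kernel/tracing/set_event", "sched:*");
        write_file("/sys/kernel/tracing/tracing_on", "1");
        /* Run the workload, then read .../trace or trace_pipe. */
        return 0;
    }
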
> Patches 4-5 are Vincent's and use very similar code patterns and
> logic between the NUMA balancer and the load balancer. Patch 6 is a
> fix to Vincent's work that is necessary to avoid serious imbalances
> being introduced by the NUMA

Yes, the test added in load_too_imbalanced() by patch 5 doesn't seem to
be a good choice.
I haven't removed it myself as that is done by your patch 6, but it
might be worth removing it directly if a new version is needed.
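
For context, a simplified standalone sketch of the kind of check
load_too_imbalanced() performs in kernel/sched/fair.c is below --
details vary by kernel version, and the extra test patch 5 layered on
top (which patch 6 backs out) is deliberately not reproduced. The core
idea: scale each node's load by the other node's capacity, and reject
a move that would leave the pair more imbalanced than it already is.

    #include <stdio.h>
    #include <stdlib.h>

    /* Simplified sketch; not the exact mainline function. */
    int load_too_imbalanced(long src_load, long dst_load,
                            long orig_src_load, long orig_dst_load,
                            long src_capacity, long dst_capacity)
    {
        long imb, old_imb;

        /* Capacity-corrected imbalance after the proposed move... */
        imb = labs(dst_load * src_capacity - src_load * dst_capacity);

        /* ...versus the imbalance that already exists. */
        old_imb = labs(orig_dst_load * src_capacity -
                       orig_src_load * dst_capacity);

        /* Would this change make things worse? */
        return imb > old_imb;
    }

    int main(void)
    {
        /* Moving a task of load 20 from src (100) to dst (60) leaves
         * both nodes at 80, less imbalanced than before, so the move
         * is allowed (prints 0). */
        printf("%d\n", load_too_imbalanced(80, 80, 100, 60,
                                           1024, 1024));
        return 0;
    }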

> balancer. Patches 7-8 are also Vincent's and, while I have not
> reviewed them closely myself, others have.
>
> The rest of the series is a mix of optimisations and improvements, one
> of which stops the NUMA balancer from fighting with itself.
>
> Note that this is not necessarily a universal performance win although
> performance results are generally ok (small gains/losses depending on
> the machine and workload). However, task migrations, page migrations,
> variability and overall overhead are generally reduced.
>
> The main reference workload I used was specjbb running one JVM per node
> which typically would be expected to split evenly. It's an interesting
> workload because the number of "warehouses" is not linearly related
> to the number of running tasks due to the creation of GC threads
> and other interfering activity. The mmtests configuration used is
> jvm-specjbb2005-multi with two runs -- one of them with ftrace enabling
> the relevant scheduler tracepoints.
>
> An example of the headline performance of the series is below, and the
> tested kernels are:
>
> baseline-v3r1   Patches 1-3 for the tracing
> loadavg-v3      Patches 1-5 (Add half of Vincent's work)
> lbidle-v3       Patches 1-6 Vincent's work with a fix on top
> classify-v3     Patches 1-8 Rest of Vincent's work
> stopsearch-v3   All patches
>
