Date:   Mon, 17 Feb 2020 17:15:32 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Phil Auld <pauld@...hat.com>, Hillf Danton <hdanton@...a.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load
 balancer v3

On Mon, 17 Feb 2020 at 16:14, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Mon, Feb 17, 2020 at 02:49:11PM +0100, Vincent Guittot wrote:
> > On Mon, 17 Feb 2020 at 11:44, Mel Gorman <mgorman@...hsingularity.net> wrote:
> > >
> > > Changelog since V2:
> > > o Rebase on top of Vincent's series again
> > > o Fix a missed rcu_read_unlock
> > > o Reduce overhead of tracepoint
> > >
> > > Changelog since V1:
> > > o Rebase on top of Vincent's series and rework
> > >
> > > Note: The baseline for this series is tip/sched/core as of February
> > >         12th rebased on top of v5.6-rc1. The series includes patches from
> > >         Vincent as I needed to add a fix and build on top of it. Vincent's
> > >         series on its own introduces performance regressions for *some*
> > >         but not *all* machines so it's easily missed. This series overall
> > >         is close to performance-neutral with some gains depending on the
> > >         machine. However, the end result does less work on NUMA balancing
> > >         and the fact that both the NUMA balancer and load balancer use
> > >         similar logic makes it much easier to understand.
> > >
> > > The NUMA balancer makes placement decisions on tasks that partially
> > > take the load balancer into account and vice versa but there are
> > > inconsistencies. This can result in placement decisions that override
> > > each other leading to unnecessary migrations -- both task placement
> > > and page placement. This series reconciles many of the decisions --
> > > partially Vincent's work with some fixes and optimisations on top to
> > > merge our two series.
> > >
> > > The first patch is unrelated. It's picked up by tip but was not present in
> > > the tree at the time of the fork. I'm including it here because I tested
> > > with it.
> > >
> > > The second and third patches are tracing only and were needed to get
> > > sensible data out of ftrace with respect to task placement for NUMA
> > > balancing. The NUMA balancer is *far* easier to analyse with the
> > > patches and informed how the series should be developed.
> > >
> > > Patches 4-5 are Vincent's and use very similar code patterns and logic
> > > between NUMA and load balancer. Patch 6 is a fix to Vincent's work that
> > > is necessary to avoid serious imbalances being introduced by the NUMA
> >
> > Yes the test added in load_too_imbalanced() by patch 5 doesn't seem to
> > be a good choice.
>
> But it *did* make sense intuitively!

Yes. In fact, one difference compared to your fix is that
load_too_imbalanced() is also called by task_numa_compare(), whereas
node_type is only tested in task_numa_find_cpu() in your patch.
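For readers without the patches at hand, the idea behind the check being discussed can be sketched as follows. This is a simplified, standalone illustration of the principle behind load_too_imbalanced() in kernel/sched/fair.c, not the kernel code itself; the function names and the capacity-weighting here are assumptions made for the sketch:

```c
#include <stdlib.h>

/*
 * Sketch (NOT the kernel implementation): a task move from a source
 * node to a destination node is rejected when it would leave the pair
 * of nodes *more* imbalanced than before. Loads and capacities are
 * plain longs; the real version scales by CPU capacity and handles
 * several corner cases omitted here.
 */
static long imbalance_sketch(long src_load, long dst_load,
			     long src_cap, long dst_cap)
{
	/* capacity-weighted absolute difference between the two nodes */
	return labs(dst_load * src_cap - src_load * dst_cap);
}

static int load_too_imbalanced_sketch(long orig_src, long orig_dst,
				      long new_src, long new_dst,
				      long src_cap, long dst_cap)
{
	long old_imb = imbalance_sketch(orig_src, orig_dst, src_cap, dst_cap);
	long new_imb = imbalance_sketch(new_src, new_dst, src_cap, dst_cap);

	/* reject the move only if it worsens the imbalance */
	return new_imb > old_imb;
}
```

With equal capacities, moving load from an overloaded node to an underloaded one passes the check, while moving it the other way is rejected; the thread's point is that where such a check is called from (task_numa_compare() vs. task_numa_find_cpu()) changes which decisions it gates.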

>
> > I haven't removed it as it was done by your patch 6 but it might be worth
> > removing it directly if a new version is needed
> >
>
> They could be folded together or part folded together but I did not see
> much value in that. I felt that keeping them separate both preserved the
> development history and acted as a historical reference on why using a
> spare CPU can be hazardous. I do not believe it is a bisection hazard
> as performance is roughly equivalent before and after the series (so
> far at least). LKP might trip up on it and if so, we'll simply ask for
> confirmation that patch 6 fixes it.

that's fine for me

>
> --
> Mel Gorman
> SUSE Labs
