linux-kernel - Re: [PATCH] sched, fair: Allow a small degree of load imbalance between SD

Message-ID: <20200108131858.GZ2827@hirez.programming.kicks-ass.net>
Date:   Wed, 8 Jan 2020 14:18:58 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Ingo Molnar <mingo@...nel.org>, Phil Auld <pauld@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Quentin Perret <quentin.perret@....com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <Morten.Rasmussen@....com>,
        Hillf Danton <hdanton@...a.com>,
        Parth Shah <parth@...ux.ibm.com>,
        Rik van Riel <riel@...riel.com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small degree of load imbalance
 between SD_NUMA domains v2

On Tue, Jan 07, 2020 at 08:24:06PM +0000, Mel Gorman wrote:
> Now I get you, but unfortunately it also would not work out. The number
> of groups is not related to the LLC except in some specific cases.
> It's possible to use the first CPU to find the size of an LLC but now I
> worry that it would lead to unpredictable behaviour. AMD has different
> numbers of LLCs per node depending on the CPU family and while Intel
> generally has one LLC per node, I imagine there are counterexamples.

Intel has the 'fun' case of an LLC spanning nodes :-), although Linux
pretends this isn't so and truncates the LLC topology information to be
the node again -- see arch/x86/kernel/smpboot.c:match_llc().
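To make "truncates the LLC topology information to be the node" concrete, here is a heavily simplified sketch, assuming a hypothetical per-CPU record with just a node ID and a hardware LLC ID. It is not the kernel's match_llc(), which works on struct cpuinfo_x86 and has additional checks; `struct cpu_topo` and `cpus_share_llc()` are made-up names for illustration only.

```c
#include <stdbool.h>

/* Hypothetical, simplified per-CPU topology record
 * (NOT the kernel's struct cpuinfo_x86). */
struct cpu_topo {
	int node;    /* NUMA node the CPU belongs to */
	int llc_id;  /* last-level cache ID as reported by the hardware */
};

/*
 * Illustration of the idea only: two CPUs are treated as sharing an
 * LLC when the hardware LLC IDs match AND both CPUs are on the same
 * node.  An LLC that physically spans two nodes is therefore split
 * ("truncated") into one LLC per node.
 */
bool cpus_share_llc(const struct cpu_topo *a, const struct cpu_topo *b)
{
	if (a->llc_id != b->llc_id)
		return false;           /* different caches entirely */
	return a->node == b->node;  /* same cache counts only within a node */
}
```

With this rule, two CPUs reporting the same LLC ID but sitting on nodes 0 and 1 are scheduled as if they had separate caches.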

And of course, in the Core2 era we had the Core2Quad chips, which were
dual-die parts and therefore also had multiple LLCs. I think the Xeon
variant of that would allow the multiple-LLCs-per-node situation too,
although this is of course ancient hardware nobody really cares about
anymore.

> This means that load balancing on different machines with similar core
> counts will behave differently due to the LLC size.

That sounds like perfectly fine/expected behaviour to me.

> It might be possible
> to infer it if the intermediate domain was DIE instead of MC but I doubt
> that's guaranteed and it would still be unpredictable. It may be the type
> of complexity that should only be introduced with a separate patch with
> clear rationale as to why it's necessary, and we are not at that
> threshold, so I withdraw the suggestion.

So IIRC the initial patch(es) had the idea to allow an imbalance of 1
extra task so that 1-1 pairs stay on the same node instead of being
spread across nodes. I don't immediately see that in these later patches.
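A minimal sketch of the "allow 1 extra task of imbalance" idea, under stated assumptions: the imbalance between two NUMA nodes is measured as a task count, the balancer would normally move half the difference, and `tasks_to_move()` and `ALLOWED_EXTRA_TASKS` are hypothetical names. This is an illustration of the concept, not the code from any version of the patch.

```c
/* Hypothetical tolerance: an imbalance of up to this many tasks
 * between two NUMA nodes is left alone. */
#define ALLOWED_EXTRA_TASKS 1

/*
 * Return how many tasks the (hypothetical) balancer would move from
 * the busiest node to the local node.  Moving half the difference
 * evens the nodes out; an imbalance of at most ALLOWED_EXTRA_TASKS is
 * tolerated, so a 1-1 pair running on an otherwise idle node
 * (busiest_nr = 2, local_nr = 0) is not split across nodes.
 */
long tasks_to_move(int busiest_nr, int local_nr)
{
	long imbalance = (busiest_nr - local_nr) / 2;

	if (imbalance <= ALLOWED_EXTRA_TASKS)
		return 0;
	return imbalance;
}
```

For example, 2 tasks on one node and 0 on the other yields 0 moves (the pair stays together), while 4 versus 0 still moves 2 tasks.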

Would that be something to go back to? Would that not side-step many of
the issues under discussion here?