Message-ID: <20200108131858.GZ2827@hirez.programming.kicks-ass.net>
Date: Wed, 8 Jan 2020 14:18:58 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Ingo Molnar <mingo@...nel.org>, Phil Auld <pauld@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Quentin Perret <quentin.perret@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <Morten.Rasmussen@....com>,
Hillf Danton <hdanton@...a.com>,
Parth Shah <parth@...ux.ibm.com>,
Rik van Riel <riel@...riel.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched, fair: Allow a small degree of load imbalance
between SD_NUMA domains v2
On Tue, Jan 07, 2020 at 08:24:06PM +0000, Mel Gorman wrote:
> Now I get you, but unfortunately it also would not work out. The number
> of groups is not related to the LLC except in some specific cases.
> It's possible to use the first CPU to find the size of an LLC but now I
> worry that it would lead to unpredictable behaviour. AMD has different
> numbers of LLCs per node depending on the CPU family and while Intel
> generally has one LLC per node, I imagine there are counter examples.
Intel has the 'fun' case of an LLC spanning nodes :-), although Linux
pretends this isn't so and truncates the LLC topology information to be
the node again -- see arch/x86/kernel/smpboot.c:match_llc().
And of course, in the Core2 era we had the Core2Quad chips, which were a
dual-die solution and therefore also had multiple LLCs, and I think the
Xeon variant of that would allow the multiple LLC per node situation
too, although this is of course ancient hardware nobody really cares
about anymore.
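As a rough illustration of the node truncation mentioned above: this is
a sketch of the effect only, not the actual match_llc() code. The struct
and function names are hypothetical and exist only for this example.

```c
#include <stdbool.h>

/* Hypothetical per-CPU topology record, for illustration only. */
struct cpu_topo {
	int llc_id;	/* cache ID as reported by the hardware */
	int node;	/* NUMA node the CPU belongs to */
};

/*
 * Treat two CPUs as LLC siblings only if they report the same LLC
 * *and* sit on the same node. An LLC that physically spans nodes is
 * thereby cut back to the node boundary.
 */
static bool llc_siblings_truncated(const struct cpu_topo *a,
				   const struct cpu_topo *b)
{
	if (a->llc_id != b->llc_id)
		return false;

	return a->node == b->node;
}
```

With this rule, two CPUs that share a cache but live on different nodes
are not grouped together, so the LLC never looks larger than the node.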
> This means that load balancing on different machines with similar core
> counts will behave differently due to the LLC size.
That sounds like perfectly fine/expected behaviour to me.
> It might be possible
> to infer it if the intermediate domain was DIE instead of MC but I doubt
> that's guaranteed and it would still be unpredictable. It may be the type
> of complexity that should only be introduced with a separate patch with
> clear rationale as to why it's necessary and we are not at that threshold
> so I withdraw the suggestion.
So IIRC the initial patch(es) had the idea to allow for 1 extra task
imbalance to get 1-1 pairs on the same node, instead of across nodes. I
don't immediately see that in these later patches.
Would that be something to go back to? Would that not side-step many of
the issues under discussion here?
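For concreteness, one possible reading of the "1 extra task" idea is
sketched below. This is an assumption about what such a check could look
like, not code from any posted patch. The function name, its parameters
and the threshold of 2 are all hypothetical.

```c
/*
 * Hypothetical helper: decide how much imbalance to act on between
 * two NUMA nodes. If the busier node runs no more than a single pair
 * of tasks, report no imbalance so that the pair can stay together on
 * one node instead of being split across nodes.
 */
static long numa_imbalance_with_allowance(long imbalance,
					  unsigned int src_nr_running)
{
	const unsigned int pair = 2;	/* assumed: one 1-1 task pair */

	if (src_nr_running <= pair)
		return 0;

	return imbalance;
}
```

Under these assumptions, a node running two communicating tasks next to
an idle node is left alone, while anything busier is balanced as before.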