Message-ID: <jhjimerlf2a.mognet@arm.com>
Date: Mon, 13 Jul 2020 14:28:29 +0100
From: Valentin Schneider <valentin.schneider@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, mingo@...nel.org,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
morten.rasmussen@....com
Subject: Re: [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask
On 13/07/20 13:55, Peter Zijlstra wrote:
> On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote:
>> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
>> sd_degenerate_parent().
>>
>> Note that this changes sd_degenerate() somewhat: I'm using the negation of
>> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
>> is equivalent to:
>>
>> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA
>>
>> whereas the current mask for that is simply
>>
>> SD_WAKE_AFFINE
>>
>> I played with a few toy NUMA topologies on QEMU and couldn't cause a
>> different degeneration than what mainline does currently. If that is deemed
>> too risky, we can go back to using SD_WAKE_AFFINE explicitly.
>
> Arguably SD_SERIALIZE needs groups, note how we're only having that
> effective for machines with at least 2 nodes. It's a bit shit how we end
> up there, but IIRC that's what it ends up as.
>

Right, AFAICT we get SD_SERIALIZE wherever we have SD_NUMA, which is any
level above NODE.

> SD_NUMA is descriptive, and not marking a group as degenerates because
> it has SD_NUMA seems a bit silly.

It does, although we can still degenerate it, see below.

> But then, it would be the top domain
> and would survive anyway?

So from what I've tested we still get rid of those via
sd_parent_degenerate(): child and parent have the same flags and same span,
so parent goes out.

That happens in the middle of the NUMA topology levels on that borked
topology with weird distances, aka

  node distances:
  node   0   1   2   3
    0:  10  12  20  22
    1:  12  10  22  24
    2:  20  22  10  12
    3:  22  24  12  10

which ought to look something like (+local distance to end result)

      2      10       2
  1 <---> 0 <---> 2 <---> 3

We end up with the following NUMA levels (i.e. deduplicated distances)

  NUMA (<= 12)
  NUMA (<= 20)
  NUMA (<= 22)
  NUMA (<= 24)

For e.g. any CPU of node1, NUMA(<=20) is gonna have the same span as
NUMA(<=12), so we'll degenerate it.
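
To make the span argument concrete, here is a rough userspace sketch (not
kernel code) that deduplicates the distances from the table above into
levels and counts, for a given node, how many levels cover exactly the same
nodes as the level below them. The names numa_levels(), numa_span() and
nr_redundant_levels() are made up for illustration, and the sketch only
models the "same span" half of the check; it assumes the flags already
match between child and parent.

```c
#include <assert.h>

#define NR_NODES	4
#define LOCAL_DISTANCE	10

/* Distance table from the example topology above. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 22 },
	{ 12, 10, 22, 24 },
	{ 20, 22, 10, 12 },
	{ 22, 24, 12, 10 },
};

/*
 * Hypothetical helper: collect the unique non-local distances in ascending
 * order. Returns the number of levels written to @levels.
 */
static int numa_levels(int levels[NR_NODES * NR_NODES])
{
	int nr = 0;

	for (int i = 0; i < NR_NODES; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			int d = node_distance[i][j];
			int k = 0;

			if (d == LOCAL_DISTANCE)
				continue;

			/* Find the sorted position, skip duplicates. */
			while (k < nr && levels[k] < d)
				k++;
			if (k < nr && levels[k] == d)
				continue;

			for (int m = nr; m > k; m--)
				levels[m] = levels[m - 1];
			levels[k] = d;
			nr++;
		}
	}
	return nr;
}

/* Hypothetical helper: bitmask of nodes within @max_distance of @node. */
static unsigned int numa_span(int node, int max_distance)
{
	unsigned int mask = 0;

	assert(node >= 0 && node < NR_NODES);

	for (int j = 0; j < NR_NODES; j++) {
		if (node_distance[node][j] <= max_distance)
			mask |= 1u << j;
	}
	return mask;
}

/*
 * Count the levels whose span, seen from @node, equals the span of the
 * level right below. Assumption: equal flags, so equal span is enough to
 * make the upper level redundant.
 */
static int nr_redundant_levels(int node)
{
	int levels[NR_NODES * NR_NODES];
	int nr = numa_levels(levels);
	unsigned int prev = numa_span(node, LOCAL_DISTANCE);
	int redundant = 0;

	for (int k = 0; k < nr; k++) {
		unsigned int span = numa_span(node, levels[k]);

		if (span == prev)
			redundant++;
		prev = span;
	}
	return redundant;
}
```

With that table, numa_span(1, 12) and numa_span(1, 20) both cover nodes
{0, 1}, which is the NUMA(<=20) vs NUMA(<=12) case above.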