Message-ID: <xhsmhv7xstqn0.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Wed, 16 Oct 2024 10:10:11 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Steve Wahl <steve.wahl@....com>, Ingo
Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri
Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org
Cc: Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH] sched/topology: improve topology_span_sane speed
On 15/10/24 16:37, Valentin Schneider wrote:
> On 10/10/24 10:51, Steve Wahl wrote:
>> Use a different approach to topology_span_sane(), that checks for the
>> same constraint of no partial overlaps for any two CPU sets for
>> non-NUMA topology levels, but does so in a way that is O(N) rather
>> than O(N^2).
>>
>> Instead of comparing with all other masks to detect collisions, keep
>> one mask that includes all CPUs seen so far and detect collisions with
>> a single cpumask_intersects test.
>>
>> If the current mask has no collisions with previously seen masks, it
>> should be a new mask, which can be uniquely identified by the lowest
>> bit set in this mask. Keep a pointer to this mask for future
>> reference (in an array indexed by the lowest bit set), and add the
>> CPUs in this mask to the list of those seen.
>>
>> If the current mask does collide with previously seen masks, it should
>> be exactly equal to a mask seen before, looked up in the same array
>> indexed by the lowest bit set in the mask, a single comparison.
>>
>> Move the topology_span_sane() check out of the existing topology level
>> loop and give it its own loop, so that the array allocation can be done
>> only once and shared across levels.
>>
>> On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
>> the average time to take one processor offline is reduced from 2.18
>> seconds to 1.01 seconds. (Off-lining 959 of 1920 processors took
>> 34m49.765s without this change, 16m10.038s with this change in place.)
>>
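The O(N) approach described in the patch can be illustrated with a minimal
standalone sketch. This is not the kernel patch itself: it uses plain 64-bit
masks in place of the cpumask API, and the names (span_sane, first_seen,
seen) are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the O(N) sanity check: for each CPU's topology-level mask,
 * either it is disjoint from everything seen so far (a new group, recorded
 * under its lowest set bit), or it must exactly equal the mask previously
 * recorded under that same lowest bit. Any partial overlap is insane.
 */
static int span_sane(const uint64_t *cpu_masks, int ncpus)
{
	const uint64_t *first_seen[64] = { 0 }; /* indexed by lowest set bit */
	uint64_t seen = 0;                      /* union of all masks so far */

	for (int cpu = 0; cpu < ncpus; cpu++) {
		uint64_t mask = cpu_masks[cpu];
		int low;

		if (!mask)
			continue;
		low = __builtin_ctzll(mask); /* lowest CPU in this mask */

		if (!(mask & seen)) {
			/* No collision: record the new mask and its members. */
			first_seen[low] = &cpu_masks[cpu];
			seen |= mask;
		} else if (!first_seen[low] || *first_seen[low] != mask) {
			/* Collides but is not an exact repeat: overlap. */
			return 0;
		}
	}
	return 1;
}
```

A sane layout (two 2-thread cores, masks 0x3 and 0xC repeated per thread)
passes; a mask such as 0x6 that partially overlaps 0x3 fails, all with a
single intersection test per CPU instead of comparing every pair of masks.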
>
> This isn't the first complaint about topology_span_sane() vs big
> systems. It might be worth disabling the check after it has scanned all
> CPUs once - not necessarily at init, since some folks boot their systems
> with only a subset of the available CPUs and online the rest later on.
>
> I'd have to think more about how this behaves vs the dynamic NUMA topology
> code we got as of
>
> 0fb3978b0aac ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")
>
> (i.e. is scanning all possible CPUs enough to guarantee no overlaps when
> having only a subset of online CPUs? I think so...)
>
> but maybe something like so?
I'd also be tempted to shove this under SCHED_DEBUG + sched_verbose, like
the sched_domain debug fluff. Most distros ship with SCHED_DEBUG anyway, so
if there is suspicion of a topology mask failure, they can slap the extra
cmdline argument on and have it checked.