Message-ID: <Zw_k_WFeYFli87ck@swahl-home.5wahls.com>
Date: Wed, 16 Oct 2024 11:08:29 -0500
From: Steve Wahl <steve.wahl@....com>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Steve Wahl <steve.wahl@....com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH] sched/topology: improve topology_span_sane speed

On Wed, Oct 16, 2024 at 10:10:11AM +0200, Valentin Schneider wrote:
> On 15/10/24 16:37, Valentin Schneider wrote:
> > On 10/10/24 10:51, Steve Wahl wrote:
> >> Use a different approach in topology_span_sane(): one that checks
> >> the same constraint (no partial overlaps between any two CPU sets
> >> at non-NUMA topology levels) but does so in a way that is O(N)
> >> rather than O(N^2).
> >>
> >> Instead of comparing with all other masks to detect collisions, keep
> >> one mask that includes all CPUs seen so far and detect collisions with
> >> a single cpumask_intersects test.
> >>
> >> If the current mask has no collisions with previously seen masks, it
> >> should be a new mask, which can be uniquely identified by the lowest
> >> bit set in this mask. Keep a pointer to this mask for future
> >> reference (in an array indexed by the lowest bit set), and add the
> >> CPUs in this mask to the list of those seen.
> >>
> >> If the current mask does collide with previously seen masks, it
> >> must be exactly equal to a mask seen before; it can be looked up in
> >> the same array, indexed by the lowest bit set in the mask, with a
> >> single comparison.
> >>
> >> Move the topology_span_sane() check out of the existing topology
> >> level loop and give it its own loop, so that the array allocation
> >> can be done only once and shared across levels.
> >>
> >> On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
> >> the average time to take one processor offline is reduced from 2.18
> >> seconds to 1.01 seconds. (Off-lining 959 of 1920 processors took
> >> 34m49.765s without this change, 16m10.038s with this change in place.)
> >>
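To make the quoted approach concrete, the core of the new check would
look roughly like this (a simplified sketch of the idea described
above, not the literal patch; span_sane() and the tl_mask callback are
illustrative names, and returning true on allocation failure is an
assumption):

#include <linux/cpumask.h>
#include <linux/slab.h>

static bool span_sane(const struct cpumask *(*tl_mask)(int cpu),
                      const struct cpumask *cpu_map)
{
        /* All CPUs included in any mask seen so far. */
        cpumask_var_t covered;
        /* masks[b] remembers the mask whose lowest set bit is b. */
        const struct cpumask **masks;
        bool ret = true;
        int cpu;

        if (!zalloc_cpumask_var(&covered, GFP_KERNEL))
                return true;    /* assumption: skip the check on ENOMEM */

        masks = kcalloc(nr_cpu_ids, sizeof(*masks), GFP_KERNEL);
        if (!masks) {
                free_cpumask_var(covered);
                return true;
        }

        for_each_cpu(cpu, cpu_map) {
                const struct cpumask *mask = tl_mask(cpu);
                unsigned int id = cpumask_first(mask);

                if (!cpumask_intersects(covered, mask)) {
                        /* A new, disjoint mask: record it by its lowest bit. */
                        masks[id] = mask;
                        cpumask_or(covered, covered, mask);
                } else if (!masks[id] || !cpumask_equal(masks[id], mask)) {
                        /* Overlaps a previous mask without matching it exactly. */
                        ret = false;
                        break;
                }
        }

        kfree(masks);
        free_cpumask_var(covered);
        return ret;
}

Each CPU's mask costs one cpumask_intersects() plus at most one
cpumask_equal(), which is where the O(N) behavior comes from.
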
> >
> > This isn't the first complaint about topology_span_sane() vs big
> > systems. It might be worth disabling the check once it has scanned all
> > CPUs once - not necessarily at init, since some folks have their systems
> > boot with only a subset of the available CPUs and online them later on.
> >
> > I'd have to think more about how this behaves vs the dynamic NUMA topology
> > code we got as of
> >
> > 0fb3978b0aac ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")
> >
> > (i.e. is scanning all possible CPUs enough to guarantee no overlaps when
> > having only a subset of online CPUs? I think so...)
> >
> > but maybe something like so?
>
>
> I'd also be tempted to shove this under SCHED_DEBUG + sched_verbose, like
> the sched_domain debug fluff. Most distros ship with SCHED_DEBUG anyway, so
> if there is suspicion of topology mask fail, they can slap the extra
> cmdline argument and have it checked.
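
Presumably something along these lines (purely illustrative, not the
actual diff that was trimmed above; sched_debug() stands for the
existing sched_verbose-controlled predicate in kernel/sched/topology.c):

        /* Only pay for the sanity check when the admin asked for it. */
        if (sched_debug() &&
            WARN_ON(!topology_span_sane(tl, cpu_map, i)))
                goto error;
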
I understand the idea, but as I read closer, the things under
SCHED_DEBUG and sched_verbose currently don't change any actions; they
just add additional informational output, plus some checks that may or
may not print things but do not otherwise change what gets executed.
However, topology_span_sane() failing currently causes
build_sched_domains() to abort and return an error. I don't think I
should change that behavior / skip the test when SCHED_DEBUG isn't on,
because I believe the debug version should always take the same paths
as the non-debug one.
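
(For reference, the current failure path has roughly this shape; this
is quoted from memory of build_sched_domains() in
kernel/sched/topology.c, so treat it as a sketch rather than the exact
source:)

        for_each_cpu(i, cpu_map) {
                struct sched_domain_topology_level *tl;

                for_each_sd_topology(tl) {
                        /* A failed sanity check aborts domain construction. */
                        if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
                                goto error;
                        /* build_sched_domain() and friends follow here */
                }
        }
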
So there's not much that could be removed when SCHED_DEBUG isn't on or
sched_verbose isn't set.
--> Steve
--
Steve Wahl, Hewlett Packard Enterprise