Message-ID: <Zxuz2I1Bn0yG4MYj@swahl-home.5wahls.com>
Date: Fri, 25 Oct 2024 10:06:00 -0500
From: Steve Wahl <steve.wahl@....com>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Steve Wahl <steve.wahl@....com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH] sched/topology: improve topology_span_sane speed
On Tue, Oct 15, 2024 at 04:37:35PM +0200, Valentin Schneider wrote:
> On 10/10/24 10:51, Steve Wahl wrote:
> > Use a different approach in topology_span_sane() that checks the
> > same constraint (no partial overlaps between any two CPU sets at
> > non-NUMA topology levels), but does so in a way that is O(N) rather
> > than O(N^2).
> >
> > Instead of comparing with all other masks to detect collisions, keep
> > one mask that includes all CPUs seen so far and detect collisions with
> > a single cpumask_intersects test.
> >
> > If the current mask has no collisions with previously seen masks, it
> > should be a new mask, which can be uniquely identified by the lowest
> > bit set in this mask. Keep a pointer to this mask for future
> > reference (in an array indexed by the lowest bit set), and add the
> > CPUs in this mask to the list of those seen.
> >
> > If the current mask does collide with previously seen masks, it
> > should be exactly equal to a mask seen before; it can be looked up
> > in the same array, indexed by its lowest set bit, with a single
> > comparison.
> >
> > Move the topology_span_sane() check out of the existing topology
> > level loop and give it its own loop, so that the array allocation
> > can be done only once and shared across levels.
> >
> > On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
> > the average time to take one processor offline is reduced from 2.18
> > seconds to 1.01 seconds. (Off-lining 959 of 1920 processors took
> > 34m49.765s without this change, 16m10.038s with this change in place.)
> >
>
> This isn't the first complaint about topology_span_sane() vs big
> systems. It might be worth disabling the check once it has scanned all
> CPUs once - not necessarily at init, since some folks boot their
> systems with only a subset of the available CPUs and online the rest
> later on.
>
> I'd have to think more about how this behaves vs the dynamic NUMA topology
> code we got as of
>
> 0fb3978b0aac ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")
>
> (i.e. is scanning all possible CPUs enough to guarantee no overlaps when
> having only a subset of online CPUs? I think so...)
>
> but maybe something like so?
> ---
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 9748a4c8d6685..bf95c3d4f6072 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2361,12 +2361,25 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
> static bool topology_span_sane(struct sched_domain_topology_level *tl,
> const struct cpumask *cpu_map, int cpu)
> {
> + static bool validated;
> int i = cpu + 1;
>
> + if (validated)
> + return true;
> +
> /* NUMA levels are allowed to overlap */
> if (tl->flags & SDTL_OVERLAP)
> return true;
>
> + /*
> + * We're visiting all CPUs available in the system, no need to re-check
> + * things after that. Even if we end up finding overlaps here, we'll
> + * have issued a warning and can skip the per-CPU scan in later
> + * calls to this function.
> + */
> + if (cpumask_equal(cpu_map, cpu_possible_mask))
> + validated = true;
> +
> /*
> * Non-NUMA levels cannot partially overlap - they must be either
> * completely equal or completely disjoint. Otherwise we can end up
I tried adding this; surprisingly, it had no effect on the time taken
(perhaps even caused a small slowdown) when combined with my patch.
So at this point I don't intend to include it in v2 of the patch.
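
In case it's useful for review, here is a toy user-space sketch of the
single-pass check the patch description above outlines. This is not
the kernel code: the fixed-width 64-bit masks and the spans_sane() /
TOY_NR_CPUS names are illustration only.

#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 64		/* one 64-bit word per mask */

/* masks[cpu] holds that CPU's span at one topology level; each span
 * is assumed non-empty and to contain its own CPU, as the kernel's
 * tl->mask(cpu) does. */
static bool spans_sane(const uint64_t *masks, int nr_cpus)
{
	const uint64_t *id_seen[TOY_NR_CPUS] = { 0 };
	uint64_t covered = 0;	/* union of all CPUs seen so far */

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		uint64_t mask = masks[cpu];
		/* identify the span by its lowest set bit */
		int id = __builtin_ctzll(mask);

		if (!(covered & mask)) {
			/* No intersection with anything seen so far:
			 * a new span.  Record it under its id. */
			id_seen[id] = &masks[cpu];
			covered |= mask;
		} else if (!id_seen[id] || *id_seen[id] != mask) {
			/* Intersects an earlier span but is not an
			 * exact repeat of the span recorded under
			 * this id: a partial overlap. */
			return false;
		}
	}
	return true;
}

Each mask is visited once and compared against at most one earlier
mask, which is where the O(N) behavior comes from.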
--> Steve
--
Steve Wahl, Hewlett Packard Enterprise