linux-kernel - Re: [PATCH] sched/topology: improve topology_span

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <xhsmh34kxv3dc.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Tue, 15 Oct 2024 16:37:35 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Steve Wahl <steve.wahl@....com>, Steve Wahl <steve.wahl@....com>, Ingo
 Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri
 Lelli <juri.lelli@...hat.com>, Vincent Guittot
 <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org
Cc: Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH] sched/topology: improve topology_span_sane speed

On 10/10/24 10:51, Steve Wahl wrote:
> Use a different approach to topology_span_sane(), that checks for the
> same constraint of no partial overlaps for any two CPU sets for
> non-NUMA topology levels, but does so in a way that is O(N) rather
> than O(N^2).
>
> Instead of comparing with all other masks to detect collisions, keep
> one mask that includes all CPUs seen so far and detect collisions with
> a single cpumask_intersects test.
>
> If the current mask has no collisions with previously seen masks, it
> should be a new mask, which can be uniquely identified by the lowest
> bit set in this mask.  Keep a pointer to this mask for future
> reference (in an array indexed by the lowest bit set), and add the
> CPUs in this mask to the list of those seen.
>
> If the current mask does collide with previously seen masks, it should
> be exactly equal to a mask seen before, looked up in the same array
> indexed by the lowest bit set in the mask, a single comparison.
>
> Move the topology_span_sane() check out of the existing topology level
> loop, let it use its own loop so that the array allocation can be done
> only once, shared across levels.
>
> On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
> the average time to take one processor offline is reduced from 2.18
> seconds to 1.01 seconds.  (Off-lining 959 of 1920 processors took
> 34m49.765s without this change, 16m10.038s with this change in place.)
>

This isn't the first complaint about topology_span_sane() vs big
systems. It might be worth to disable the check once it has scanned all
CPUs once - not necessarily at init, since some folks have their systems
boot with only a subset of the available CPUs and online them later on.

I'd have to think more about how this behaves vs the dynamic NUMA topology
code we got as of

  0fb3978b0aac ("sched/numa: Fix NUMA topology for systems with CPU-less nodes")

(i.e. is scanning all possible CPUs enough to guarantee no overlaps when
having only a subset of online CPUs? I think so...)

but maybe something like so?
---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9748a4c8d6685..bf95c3d4f6072 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2361,12 +2361,25 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
 static bool topology_span_sane(struct sched_domain_topology_level *tl,
 			      const struct cpumask *cpu_map, int cpu)
 {
+	static bool validated;
 	int i = cpu + 1;
 
+	if (validated)
+		return true;
+
 	/* NUMA levels are allowed to overlap */
 	if (tl->flags & SDTL_OVERLAP)
 		return true;
 
+	/*
+	 * We're visiting all CPUs available in the system, no need to re-check
+	 * things after that. Even if we end up finding overlaps here, we'll
+	 * have issued a warning and can skip the per-CPU scan in later
+	 * calls to this function.
+	 */
+	if (cpumask_equal(cpu_map, cpu_possible_mask))
+		validated = true;
+
 	/*
 	 * Non-NUMA levels cannot partially overlap - they must be either
 	 * completely equal or completely disjoint. Otherwise we can end up