linux-kernel - Re: [PATCH v3] sched/topology: improve topology_span

Message-ID: <xhsmhseogiox0.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Fri, 14 Feb 2025 15:25:31 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Steve Wahl <steve.wahl@....com>, Steve Wahl <steve.wahl@....com>, Ingo
 Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri
 Lelli <juri.lelli@...hat.com>, Vincent Guittot
 <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
 Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org, K Prateek Nayak
 <kprateek.nayak@....com>, Vishal Chourasia <vishalc@...ux.ibm.com>, samir
 <samir@...ux.ibm.com>
Cc: Naman Jain <namjain@...ux.microsoft.com>, Saurabh Singh Sengar
 <ssengar@...ux.microsoft.com>, srivatsa@...il.mit.edu, Michael Kelley
 <mhklinux@...look.com>, Russ Anderson <rja@....com>, Dimitri Sivanich
 <sivanich@....com>
Subject: Re: [PATCH v3] sched/topology: improve topology_span_sane speed

On 10/02/25 09:42, Steve Wahl wrote:
> Use a different approach to topology_span_sane() that checks for the
> same constraint of no partial overlaps for any two CPU sets for
> non-NUMA topology levels, but does so in a way that is O(N) rather
> than O(N^2).
>
> Instead of comparing with all other masks to detect collisions, keep
> one mask that includes all CPUs seen so far and detect collisions with
> a single cpumask_intersects test.
>
> If the current mask has no collisions with previously seen masks, it
> should be a new mask, which can be uniquely identified by the lowest
> bit set in this mask.  Keep a pointer to this mask for future
> reference (in an array indexed by the lowest bit set), and add the
> CPUs in this mask to the list of those seen.
>
> If the current mask does collide with previously seen masks, it should
> be exactly equal to a mask seen before, looked up in the same array
> indexed by the lowest bit set in the mask, a single comparison.
>
> Move the topology_span_sane() check out of the existing topology level
> loop and let it use its own loop, so that the array allocation can be
> done only once and shared across levels.
>
> On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
> the average time to take one processor offline is reduced from 2.18
> seconds to 1.01 seconds.  (Off-lining 959 of 1920 processors took
> 34m49.765s without this change, 16m10.038s with this change in place.)
>
> Signed-off-by: Steve Wahl <steve.wahl@....com>
> ---
>
> Version 3: While the intent of this patch is no functional change, I
> discovered that version 2 had conditions where it would give different
> results than the original code.  Version 3 returns to the V1 approach,
> additionally correcting the handling of masks with no bits set and
> fixing the num_possible_cpus() problem Peter Zijlstra noted.  In a
> stand-alone test program that used all possible sets of four 4-bit
> masks, this algorithm matched the original code in all cases, where
> the others did not.
>

So, looking at my notes from v2, I was under the impression that the
array-less approach worked. Do you have an example topology where the
array-less approach fails? I usually poke at topology stuff via QEMU, so if
you have a virtual topology description I'd be happy to give it a spin.

> -	for_each_cpu_from(i, cpu_map) {
>               /*
> -		 * We should 'and' all those masks with 'cpu_map' to exactly
> -		 * match the topology we're about to build, but that can only
> -		 * remove CPUs, which only lessens our ability to detect
> -		 * overlaps
> +		 * Non-NUMA levels cannot partially overlap - they must be either
> +		 * completely equal or completely disjoint. Otherwise we can end up
> +		 * breaking the sched_group lists - i.e. a later get_group() pass
> +		 * breaks the linking done for an earlier span.
>                */
> -		if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
> -		    cpumask_intersects(tl->mask(cpu), tl->mask(i)))
> -			return false;
> +		for_each_cpu(cpu, cpu_map) {
> +			/* lowest bit set in this mask is used as a unique id */
> +			id = cpumask_first(tl->mask(cpu));
> +
> +			/* zeroed masks cannot possibly collide */
> +			if (id >= nr_cpu_ids)
> +				continue;
> +

Is it even legal for an online CPU's topology mask to be empty?! I would
assume it should *at least* contain itself.