linux-kernel - Re: [PATCH v3] sched/topology: improve topology_span

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z69kbnJM5n64Ac6h@swahl-home.5wahls.com>
Date: Fri, 14 Feb 2025 09:42:38 -0600
From: Steve Wahl <steve.wahl@....com>
To: Valentin Schneider <vschneid@...hat.com>
Cc: Steve Wahl <steve.wahl@....com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
        Mel Gorman <mgorman@...e.de>, linux-kernel@...r.kernel.org,
        K Prateek Nayak <kprateek.nayak@....com>,
        Vishal Chourasia <vishalc@...ux.ibm.com>, samir <samir@...ux.ibm.com>,
        Naman Jain <namjain@...ux.microsoft.com>,
        Saurabh Singh Sengar <ssengar@...ux.microsoft.com>,
        srivatsa@...il.mit.edu, Michael Kelley <mhklinux@...look.com>,
        Russ Anderson <rja@....com>, Dimitri Sivanich <sivanich@....com>
Subject: Re: [PATCH v3] sched/topology: improve topology_span_sane speed

On Fri, Feb 14, 2025 at 03:25:31PM +0100, Valentin Schneider wrote:
> On 10/02/25 09:42, Steve Wahl wrote:
> > Use a different approach to topology_span_sane(), that checks for the
> > same constraint of no partial overlaps for any two CPU sets for
> > non-NUMA topology levels, but does so in a way that is O(N) rather
> > than O(N^2).
> >
> > Instead of comparing with all other masks to detect collisions, keep
> > one mask that includes all CPUs seen so far and detect collisions with
> > a single cpumask_intersects test.
> >
> > If the current mask has no collisions with previously seen masks, it
> > should be a new mask, which can be uniquely identified by the lowest
> > bit set in this mask.  Keep a pointer to this mask for future
> > reference (in an array indexed by the lowest bit set), and add the
> > CPUs in this mask to the list of those seen.
> >
> > If the current mask does collide with previously seen masks, it should
> > be exactly equal to a mask seen before, looked up in the same array
> > indexed by the lowest bit set in the mask, a single comparison.
> >
> > Move the topology_span_sane() check out of the existing topology level
> > loop, let it use its own loop so that the array allocation can be done
> > only once, shared across levels.
> >
> > On a system with 1920 processors (16 sockets, 60 cores, 2 threads),
> > the average time to take one processor offline is reduced from 2.18
> > seconds to 1.01 seconds.  (Off-lining 959 of 1920 processors took
> > 34m49.765s without this change, 16m10.038s with this change in place.)
> >
> > Signed-off-by: Steve Wahl <steve.wahl@....com>
> > ---
> >
> > Version 3: While the intent of this patch is no functional change, I
> > discovered that version 2 had conditions where it would give different
> > results than the original code.  Version 3 returns to the V1 approach,
> > additionally correcting the handling of masks with no bits set and
> > fixing the num_possible_cpus() problem Peter Zijlstra noted.  In a
> > stand-alone test program that used all possible sets of four 4-bit
> > masks, this algorithm matched the original code in all cases, where
> > the others did not.
> >
> 
> So looking at my notes from v2 I was under the impression the array-less
> approach worked, do you have an example topology where the array-less
> approach fails? I usually poke at topology stuff via QEMU so if you have a
> virtual topology description I'd be happy to give that a span.

Valentin, thank you for your time looking at this patch.

Note that I'm trying to make this patch function exactly as the code
did before, only faster, regardless of the inputs.  No functional
change.

Your statement below about assuming a mask should at least contain the
cpu itself is intertwined with finding differences.  This code is
checking the validity of the masks.  If we can't already trust that
the masks are disjoint, why can we trust they at least have the cpu
itself set?

Where the assumption that a cpu's mask contains itself holds true, it
appears v2 and v3 agree.

An example of where the two disagree would be a set of four masks,
0010 0001 0001 0001.  These match the stated criteria being checked
(no overlap between masks that don't exactly match), yet the v2
algorithm would mark it insane, where the original code doesn't.

If it's valid to mark some additional conditions as insane, I'd rather
see that in a different patch, because I'm not comfortable enough with
the ramifications of possibly marking as insane a system that current
code reports as sane.

> > -	for_each_cpu_from(i, cpu_map) {
> >               /*
> > -		 * We should 'and' all those masks with 'cpu_map' to exactly
> > -		 * match the topology we're about to build, but that can only
> > -		 * remove CPUs, which only lessens our ability to detect
> > -		 * overlaps
> > +		 * Non-NUMA levels cannot partially overlap - they must be either
> > +		 * completely equal or completely disjoint. Otherwise we can end up
> > +		 * breaking the sched_group lists - i.e. a later get_group() pass
> > +		 * breaks the linking done for an earlier span.
> >                */
> > -		if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
> > -		    cpumask_intersects(tl->mask(cpu), tl->mask(i)))
> > -			return false;
> > +		for_each_cpu(cpu, cpu_map) {
> > +			/* lowest bit set in this mask is used as a unique id */
> > +			id = cpumask_first(tl->mask(cpu));
> > +
> > +			/* zeroed masks cannot possibly collide */
> > +			if (id >= nr_cpu_ids)
> > +				continue;
> > +
> 
> Is it even legal for an online CPU's topology mask to be empty?! I would
> assume it should *at least* contain itself.

Again, this is sanity checking untrusted inputs.  I realized that if a
mask was zero, the value of id would overindex the array.  The check
is cheap.

--> Steve

-- 
Steve Wahl, Hewlett Packard Enterprise