Message-ID: <xhsmhwmddzpc2.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Tue, 25 Feb 2025 22:28:29 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Steve Wahl <steve.wahl@....com>
Cc: Steve Wahl <steve.wahl@....com>, Ingo Molnar <mingo@...hat.com>, Peter
Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann
<dietmar.eggemann@....com>, Steven Rostedt <rostedt@...dmis.org>, Ben
Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
linux-kernel@...r.kernel.org, K Prateek Nayak <kprateek.nayak@....com>,
Vishal Chourasia <vishalc@...ux.ibm.com>, samir <samir@...ux.ibm.com>,
Naman Jain <namjain@...ux.microsoft.com>, Saurabh Singh Sengar
<ssengar@...ux.microsoft.com>, srivatsa@...il.mit.edu, Michael Kelley
<mhklinux@...look.com>, Russ Anderson <rja@....com>, Dimitri Sivanich
<sivanich@....com>
Subject: Re: [PATCH v3] sched/topology: improve topology_span_sane speed
On 20/02/25 13:59, Steve Wahl wrote:
> On Mon, Feb 17, 2025 at 11:11:36AM +0100, Valentin Schneider wrote:
>> On 14/02/25 09:42, Steve Wahl wrote:
>> >
>> > Valentin, thank you for your time looking at this patch.
>> >
>> > Note that I'm trying to make this patch function exactly as the code
>> > did before, only faster, regardless of the inputs. No functional
>> > change.
>> >
>> > Your statement below about assuming a mask should at least contain the
>> > cpu itself is intertwined with finding differences. This code is
>> > checking the validity of the masks. If we can't already trust that
>> > the masks are disjoint, why can we trust they at least have the cpu
>> > itself set?
>> >
>>
>> True... Though I think this would already be caught by the sched_domain
>> debugging infra we have, see sched_domain_debug_one().
>
> Note that a previous patch of mine was reverted because it allowed
> another problem to surface on a small number of machines (and was
> later re-instated after fixing the other problem).
>
> Reference: https://lore.kernel.org/all/20240717213121.3064030-1-steve.wahl@hpe.com
>
> So, I am quite sensitive to introducing unintended behavior changes.
>
> Anyway, sched_domain_debug_one() is only called when
> sched_debug_verbose is set. But, at least as it sits currently,
> topology_span_sane() is run at all times, and its return code is acted
> on to change system behavior.
>
> If there's a system out there where the cpu masks are buggy but
> currently accepted, I don't want this patch to cause that system to
> degrade by declaring it insane.
>
> I don't fully understand all the code that sets up masks, as there's a
> lot of it. But as an example, I think I see in
> arch/s390/kernel/topology.c, that update_cpu_masks() uses
> cpu_group_map() to update masks, and that function zeros the mask and
> then returns if the cpu is not set in cpu_setup_mask. So potentially
> there can be some zeroed masks.
>
> [Why am I looking at s390 code? Simply because a 'sort | uniq' on the
> possible tl->mask() functions took me to cpu_book_mask() first.]
>
IIUC that cpu_setup_mask is pretty much cpu_online_mask:
smp_start_secondary(void *cpuvoid)
	cpumask_set_cpu(cpu, &cpu_setup_mask);
	set_cpu_online(cpu, true);

int __cpu_disable(void)
	set_cpu_online(cpu, false);
	cpumask_clear_cpu(cpu, &cpu_setup_mask);
IOW, the topology code only builds masks for CPUs that are in
cpu_setup_mask (i.e. online CPUs), thus won't end up with an empty mask.
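
For anyone following along, here is a rough userspace sketch (plain
uint64_t bitmasks instead of the kernel's cpumask API, nothing to do
with the actual kernel implementation) of the invariant
topology_span_sane() is meant to enforce at a given topology level: the
span of any two CPUs must be either identical or completely disjoint.

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define NR_CPUS 8

	/*
	 * Simplified stand-in for tl->mask(cpu): each CPU's span at one
	 * topology level, expressed as a uint64_t bitmask.
	 */
	static bool spans_sane(const uint64_t *span, int nr_cpus)
	{
		for (int i = 0; i < nr_cpus; i++) {
			for (int j = i + 1; j < nr_cpus; j++) {
				/* Any two spans must be identical or disjoint. */
				if (span[i] != span[j] && (span[i] & span[j]))
					return false;
			}
		}
		return true;
	}

	int main(void)
	{
		/* CPUs 0-3 share one span, CPUs 4-7 another: sane. */
		uint64_t good[NR_CPUS] = {
			0x0f, 0x0f, 0x0f, 0x0f, 0xf0, 0xf0, 0xf0, 0xf0,
		};
		/* CPU 4's span overlaps the first group without matching it: insane. */
		uint64_t bad[NR_CPUS] = {
			0x0f, 0x0f, 0x0f, 0x0f, 0x1f, 0xf0, 0xf0, 0xf0,
		};

		printf("good: %s\n", spans_sane(good, NR_CPUS) ? "sane" : "insane");
		printf("bad:  %s\n", spans_sane(bad, NR_CPUS) ? "sane" : "insane");
		return 0;
	}

The naive pairwise loop above is O(n^2) in the number of CPUs, which is
essentially the cost the patch in $SUBJECT is trying to get rid of.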