netdev - Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs spread API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <xhsmh35e5d9b4.mognet@vschneid.remote.csb>
Date:   Tue, 09 Aug 2022 11:02:07 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Tariq Toukan <ttoukan.linux@...il.com>,
        Tariq Toukan <tariqt@...dia.com>,
        "David S. Miller" <davem@...emloft.net>,
        Saeed Mahameed <saeedm@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        Gal Pressman <gal@...dia.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs
 spread API

On 08/08/22 17:39, Tariq Toukan wrote:
> On 8/4/2022 8:28 PM, Valentin Schneider wrote:
>> On 28/07/22 22:12, Tariq Toukan wrote:
>>> +static bool sched_cpus_spread_by_distance(int node, u16 *cpus, int ncpus)
>>                                                         ^^^^^^^^^
>> That should be a struct *cpumask.
>
> With cpumask, we'll lose the order.
>

Right, I didn't get that from the changelog.

>>
>>> +{
>>> +	cpumask_var_t cpumask;
>>> +	int first, i;
>>> +
>>> +	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
>>> +		return false;
>>> +
>>> +	cpumask_copy(cpumask, cpu_online_mask);
>>> +
>>> +	first = cpumask_first(cpumask_of_node(node));
>>> +
>>> +	for (i = 0; i < ncpus; i++) {
>>> +		int cpu;
>>> +
>>> +		cpu = sched_numa_find_closest(cpumask, first);
>>> +		if (cpu >= nr_cpu_ids) {
>>> +			free_cpumask_var(cpumask);
>>> +			return false;
>>> +		}
>>> +		cpus[i] = cpu;
>>> +		__cpumask_clear_cpu(cpu, cpumask);
>>> +	}
>>> +
>>> +	free_cpumask_var(cpumask);
>>> +	return true;
>>> +}
>>
>> This will fail if ncpus > nr_cpu_ids, which shouldn't be a problem. It
>> would make more sense to set *up to* ncpus, the calling code can then
>> decide if getting fewer than requested is OK or not.
>>
>> I also don't get the fallback to cpumask_local_spread(), is that if the
>> NUMA topology hasn't been initialized yet? It feels like most users of this
>> would invoke it late enough (i.e. anything after early initcalls) to have
>> the backing data available.
>
> I don't expect this to fail, as we invoke it late enough. Fallback is
> there just in case, to preserve the old behavior instead of getting
> totally broken.
>

Then there shouldn't be a fallback method - the main method is expected to
work.

>>
>> Finally, I think iterating only once per NUMA level would make more sense.
>
> Agree, although it's just a setup stage.
> I'll check if it can work for me, based on the reference code below.
>
>>
>> I've scribbled something together from those thoughts, see below. This has
>> just the mlx5 bits touched to show what I mean, but that's just compile
>> tested.
>
> My function returns a *sorted* list of the N closest cpus.
> That is important. In many cases, drivers do not need all N irqs, but
> only a portion of it, so it wants to use the closest subset of cpus.
>

Are there cases where we can't figure this out in advance? From what I grok
out of the two callsites you patched, all vectors will be used unless some
error happens, so compressing the CPUs in a single cpumask seemed
sufficient.