linux-kernel - Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs spread API

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <12fd25f9-96fb-d0e0-14ec-3f08c01a5a4b@gmail.com>
Date:   Tue, 9 Aug 2022 13:18:33 +0300
From:   Tariq Toukan <ttoukan.linux@...il.com>
To:     Valentin Schneider <vschneid@...hat.com>,
        Tariq Toukan <tariqt@...dia.com>,
        "David S. Miller" <davem@...emloft.net>,
        Saeed Mahameed <saeedm@...dia.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
        Gal Pressman <gal@...dia.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs
 spread API



On 8/9/2022 1:02 PM, Valentin Schneider wrote:
> On 08/08/22 17:39, Tariq Toukan wrote:
>> On 8/4/2022 8:28 PM, Valentin Schneider wrote:
>>> On 28/07/22 22:12, Tariq Toukan wrote:
>>>> +static bool sched_cpus_spread_by_distance(int node, u16 *cpus, int ncpus)
>>>                                                          ^^^^^^^^^
>>> That should be a struct *cpumask.
>>
>> With cpumask, we'll lose the order.
>>
> 
> Right, I didn't get that from the changelog.

I'll make it clear when re-spinned.

> 
>>>
>>>> +{
>>>> +	cpumask_var_t cpumask;
>>>> +	int first, i;
>>>> +
>>>> +	if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
>>>> +		return false;
>>>> +
>>>> +	cpumask_copy(cpumask, cpu_online_mask);
>>>> +
>>>> +	first = cpumask_first(cpumask_of_node(node));
>>>> +
>>>> +	for (i = 0; i < ncpus; i++) {
>>>> +		int cpu;
>>>> +
>>>> +		cpu = sched_numa_find_closest(cpumask, first);
>>>> +		if (cpu >= nr_cpu_ids) {
>>>> +			free_cpumask_var(cpumask);
>>>> +			return false;
>>>> +		}
>>>> +		cpus[i] = cpu;
>>>> +		__cpumask_clear_cpu(cpu, cpumask);
>>>> +	}
>>>> +
>>>> +	free_cpumask_var(cpumask);
>>>> +	return true;
>>>> +}
>>>
>>> This will fail if ncpus > nr_cpu_ids, which shouldn't be a problem. It
>>> would make more sense to set *up to* ncpus, the calling code can then
>>> decide if getting fewer than requested is OK or not.
>>>
>>> I also don't get the fallback to cpumask_local_spread(), is that if the
>>> NUMA topology hasn't been initialized yet? It feels like most users of this
>>> would invoke it late enough (i.e. anything after early initcalls) to have
>>> the backing data available.
>>
>> I don't expect this to fail, as we invoke it late enough. Fallback is
>> there just in case, to preserve the old behavior instead of getting
>> totally broken.
>>
> 
> Then there shouldn't be a fallback method - the main method is expected to
> work.
> 

I'll drop the fallback then.

>>>
>>> Finally, I think iterating only once per NUMA level would make more sense.
>>
>> Agree, although it's just a setup stage.
>> I'll check if it can work for me, based on the reference code below.
>>
>>>
>>> I've scribbled something together from those thoughts, see below. This has
>>> just the mlx5 bits touched to show what I mean, but that's just compile
>>> tested.
>>
>> My function returns a *sorted* list of the N closest cpus.
>> That is important. In many cases, drivers do not need all N irqs, but
>> only a portion of it, so it wants to use the closest subset of cpus.
>>
> 
> Are there cases where we can't figure this out in advance? From what I grok
> out of the two callsites you patched, all vectors will be used unless some
> error happens, so compressing the CPUs in a single cpumask seemed
> sufficient.
> 

All vectors will be initialized to support the maximum number of traffic 
rings. However, the actual number of traffic rings can be controlled and 
set to a lower number N_actual < N. In this case, we'll be using only 
N_actual instances and we want them to be the first/closest.