[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <12fd25f9-96fb-d0e0-14ec-3f08c01a5a4b@gmail.com>
Date: Tue, 9 Aug 2022 13:18:33 +0300
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Valentin Schneider <vschneid@...hat.com>,
Tariq Toukan <tariqt@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
Saeed Mahameed <saeedm@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
Gal Pressman <gal@...dia.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs
spread API
On 8/9/2022 1:02 PM, Valentin Schneider wrote:
> On 08/08/22 17:39, Tariq Toukan wrote:
>> On 8/4/2022 8:28 PM, Valentin Schneider wrote:
>>> On 28/07/22 22:12, Tariq Toukan wrote:
>>>> +static bool sched_cpus_spread_by_distance(int node, u16 *cpus, int ncpus)
>>> ^^^^^^^^^
>>> That should be a struct *cpumask.
>>
>> With cpumask, we'll lose the order.
>>
>
> Right, I didn't get that from the changelog.
I'll make it clear when re-spinned.
>
>>>
>>>> +{
>>>> + cpumask_var_t cpumask;
>>>> + int first, i;
>>>> +
>>>> + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
>>>> + return false;
>>>> +
>>>> + cpumask_copy(cpumask, cpu_online_mask);
>>>> +
>>>> + first = cpumask_first(cpumask_of_node(node));
>>>> +
>>>> + for (i = 0; i < ncpus; i++) {
>>>> + int cpu;
>>>> +
>>>> + cpu = sched_numa_find_closest(cpumask, first);
>>>> + if (cpu >= nr_cpu_ids) {
>>>> + free_cpumask_var(cpumask);
>>>> + return false;
>>>> + }
>>>> + cpus[i] = cpu;
>>>> + __cpumask_clear_cpu(cpu, cpumask);
>>>> + }
>>>> +
>>>> + free_cpumask_var(cpumask);
>>>> + return true;
>>>> +}
>>>
>>> This will fail if ncpus > nr_cpu_ids, which shouldn't be a problem. It
>>> would make more sense to set *up to* ncpus, the calling code can then
>>> decide if getting fewer than requested is OK or not.
>>>
>>> I also don't get the fallback to cpumask_local_spread(), is that if the
>>> NUMA topology hasn't been initialized yet? It feels like most users of this
>>> would invoke it late enough (i.e. anything after early initcalls) to have
>>> the backing data available.
>>
>> I don't expect this to fail, as we invoke it late enough. Fallback is
>> there just in case, to preserve the old behavior instead of getting
>> totally broken.
>>
>
> Then there shouldn't be a fallback method - the main method is expected to
> work.
>
I'll drop the fallback then.
>>>
>>> Finally, I think iterating only once per NUMA level would make more sense.
>>
>> Agree, although it's just a setup stage.
>> I'll check if it can work for me, based on the reference code below.
>>
>>>
>>> I've scribbled something together from those thoughts, see below. This has
>>> just the mlx5 bits touched to show what I mean, but that's just compile
>>> tested.
>>
>> My function returns a *sorted* list of the N closest cpus.
>> That is important. In many cases, drivers do not need all N irqs, but
>> only a portion of it, so it wants to use the closest subset of cpus.
>>
>
> Are there cases where we can't figure this out in advance? From what I grok
> out of the two callsites you patched, all vectors will be used unless some
> error happens, so compressing the CPUs in a single cpumask seemed
> sufficient.
>
All vectors will be initialized to support the maximum number of traffic
rings. However, the actual number of traffic rings can be controlled and
set to a lower number N_actual < N. In this case, we'll be using only
N_actual instances and we want them to be the first/closest.
Powered by blists - more mailing lists