[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <xhsmh35e5d9b4.mognet@vschneid.remote.csb>
Date: Tue, 09 Aug 2022 11:02:07 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: Tariq Toukan <ttoukan.linux@...il.com>,
Tariq Toukan <tariqt@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
Saeed Mahameed <saeedm@...dia.com>,
Jakub Kicinski <kuba@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
Gal Pressman <gal@...dia.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH net-next V4 1/3] sched/topology: Add NUMA-based CPUs
spread API
On 08/08/22 17:39, Tariq Toukan wrote:
> On 8/4/2022 8:28 PM, Valentin Schneider wrote:
>> On 28/07/22 22:12, Tariq Toukan wrote:
>>> +static bool sched_cpus_spread_by_distance(int node, u16 *cpus, int ncpus)
>> ^^^^^^^^^
>> That should be a struct *cpumask.
>
> With cpumask, we'll lose the order.
>
Right, I didn't get that from the changelog.
>>
>>> +{
>>> + cpumask_var_t cpumask;
>>> + int first, i;
>>> +
>>> + if (!zalloc_cpumask_var(&cpumask, GFP_KERNEL))
>>> + return false;
>>> +
>>> + cpumask_copy(cpumask, cpu_online_mask);
>>> +
>>> + first = cpumask_first(cpumask_of_node(node));
>>> +
>>> + for (i = 0; i < ncpus; i++) {
>>> + int cpu;
>>> +
>>> + cpu = sched_numa_find_closest(cpumask, first);
>>> + if (cpu >= nr_cpu_ids) {
>>> + free_cpumask_var(cpumask);
>>> + return false;
>>> + }
>>> + cpus[i] = cpu;
>>> + __cpumask_clear_cpu(cpu, cpumask);
>>> + }
>>> +
>>> + free_cpumask_var(cpumask);
>>> + return true;
>>> +}
>>
>> This will fail if ncpus > nr_cpu_ids, which shouldn't be a problem. It
>> would make more sense to set *up to* ncpus, the calling code can then
>> decide if getting fewer than requested is OK or not.
>>
>> I also don't get the fallback to cpumask_local_spread(), is that if the
>> NUMA topology hasn't been initialized yet? It feels like most users of this
>> would invoke it late enough (i.e. anything after early initcalls) to have
>> the backing data available.
>
> I don't expect this to fail, as we invoke it late enough. Fallback is
> there just in case, to preserve the old behavior instead of getting
> totally broken.
>
Then there shouldn't be a fallback method - the main method is expected to
work.
>>
>> Finally, I think iterating only once per NUMA level would make more sense.
>
> Agree, although it's just a setup stage.
> I'll check if it can work for me, based on the reference code below.
>
>>
>> I've scribbled something together from those thoughts, see below. This has
>> just the mlx5 bits touched to show what I mean, but that's just compile
>> tested.
>
> My function returns a *sorted* list of the N closest cpus.
> That is important. In many cases, drivers do not need all N irqs, but
> only a portion of it, so it wants to use the closest subset of cpus.
>
Are there cases where we can't figure this out in advance? From what I grok
out of the two callsites you patched, all vectors will be used unless some
error happens, so compressing the CPUs in a single cpumask seemed
sufficient.
Powered by blists - more mailing lists