Message-ID: <ZrzDAlMiEK4fnLmn@yury-ThinkPad>
Date: Wed, 14 Aug 2024 07:45:36 -0700
From: Yury Norov <yury.norov@...il.com>
To: Tariq Toukan <ttoukan.linux@...il.com>
Cc: Erwan Velu <erwanaliasr1@...il.com>, Erwan Velu <e.velu@...teo.com>,
Saeed Mahameed <saeedm@...dia.com>,
Leon Romanovsky <leon@...nel.org>, Tariq Toukan <tariqt@...dia.com>,
Yury Norov <ynorov@...dia.com>, Rahul Anand <raanand@...dia.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, linux-rdma@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net/mlx5: Use cpumask_local_spread() instead of custom
code
On Wed, Aug 14, 2024 at 10:48:40AM +0300, Tariq Toukan wrote:
>
>
> On 12/08/2024 11:22, Erwan Velu wrote:
> > Commit 2acda57736de ("net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints")
> > removed the usage of cpumask_local_spread().
> >
> > The issue explained in this commit was fixed by
> > commit 406d394abfcd ("cpumask: improve on cpumask_local_spread() locality").
> >
> > Since that commit, mlx5_cpumask_default_spread() has had the same
> > behavior as cpumask_local_spread().
> >
>
> Adding Yury.
>
> One patch led to the other; eventually they were all submitted within the
> same patchset.
>
> cpumask_local_spread() has indeed improved, and AFAIU it is now
> functionally equivalent to the existing logic.
> According to [1], the current code is faster.
> However, that alone is not a strong enough argument, as we're talking
> about a slowpath here.
>
> Yury, is that accurate? Is this the only difference?
>
> If so, I am fine with this change, preferring simplicity.
>
> [1] https://elixir.bootlin.com/linux/v6.11-rc3/source/lib/cpumask.c#L122
If you end up calling mlx5_cpumask_default_spread() for each CPU, it
would be O(N^2). If you call cpumask_local_spread() for each CPU, the
complexity would be O(N*log(N)), because under the hood it uses binary
search.
The comment you've mentioned says that you can traverse your CPUs in
O(N) if you manage to put all the logic inside the
for_each_numa_hop_mask() iterator. That doesn't seem to be the case
here.
I agree with you. mlx5_cpumask_default_spread() should be switched to
using library code.
Acked-by: Yury Norov <yury.norov@...il.com>
You may be interested in siblings-aware CPU distribution I've made
for mana ethernet driver in 91bfe210e196. This is also an example
where using for_each_numa_hop_mask() over simple cpumask_local_spread()
is justified.
Thanks,
Yury