Message-ID: <20250515045119.GB5681@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date: Wed, 14 May 2025 21:51:19 -0700
From: Shradha Gupta <shradhagupta@...ux.microsoft.com>
To: Yury Norov <yury.norov@...il.com>
Cc: Michael Kelley <mhklinux@...look.com>, Dexuan Cui <decui@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"K. Y. Srinivasan" <kys@...rosoft.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Konstantin Taranov <kotaranov@...rosoft.com>,
Simon Horman <horms@...nel.org>, Leon Romanovsky <leon@...nel.org>,
Maxim Levitsky <mlevitsk@...hat.com>,
Erni Sri Satya Vennela <ernis@...ux.microsoft.com>,
Peter Zijlstra <peterz@...radead.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Nipun Gupta <nipun.gupta@....com>, Jason Gunthorpe <jgg@...pe.ca>,
Jonathan Cameron <Jonathan.Cameron@...ei.com>,
Anna-Maria Behnsen <anna-maria@...utronix.de>,
Kevin Tian <kevin.tian@...el.com>, Long Li <longli@...rosoft.com>,
Thomas Gleixner <tglx@...utronix.de>,
Bjorn Helgaas <bhelgaas@...gle.com>, Rob Herring <robh@...nel.org>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>,
Krzysztof Wilczyński <kw@...ux.com>,
Lorenzo Pieralisi <lpieralisi@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
Paul Rosswurm <paulros@...rosoft.com>,
Shradha Gupta <shradhagupta@...rosoft.com>
Subject: Re: [PATCH v3 3/4] net: mana: Allow irq_setup() to skip cpus for
affinity
On Wed, May 14, 2025 at 01:58:40PM -0400, Yury Norov wrote:
> On Wed, May 14, 2025 at 05:26:45PM +0000, Michael Kelley wrote:
> > > Hope that helps.
> >
> > Yes, that helps! So the key to understanding "weight" is that
> > NUMA locality is preferred over sibling dislocality.
> >
> > This is a great summary! All or most of it should go as a
> > comment describing the function and what it is trying to do.
>
> OK, please consider applying:
>
> From abdf5cc6dabd7f433b1d1e66db86333a33a2cd15 Mon Sep 17 00:00:00 2001
> From: Yury Norov [NVIDIA] <yury.norov@...il.com>
> Date: Wed, 14 May 2025 13:45:26 -0400
> Subject: [PATCH] net: mana: explain irq_setup() algorithm
>
> Commit 91bfe210e196 ("net: mana: add a function to spread IRQs per CPUs")
> added the irq_setup() function that distributes IRQs on CPUs according
> to a tricky heuristic. The corresponding commit message explains the
> heuristic.
>
> Duplicate it in the source code to make it available to readers
> without digging through git history. Also, add a more detailed
> explanation of how the heuristic is implemented.
>
> Signed-off-by: Yury Norov [NVIDIA] <yury.norov@...il.com>
> ---
> .../net/ethernet/microsoft/mana/gdma_main.c | 41 +++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 4ffaf7588885..f9e8d4d1ba3a 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -1288,6 +1288,47 @@ void mana_gd_free_res_map(struct gdma_resource *r)
> r->size = 0;
> }
>
> +/*
> + * Spread on CPUs with the following heuristics:
> + *
> + * 1. No more than one IRQ per CPU, if possible;
> + * 2. NUMA locality is the second priority;
> + * 3. Sibling dislocality is the last priority.
> + *
> + * Let's consider this topology:
> + *
> + * Node            0               1
> + * Core        0       1       2       3
> + * CPU       0   1   2   3   4   5   6   7
> + *
> + * The most performant IRQ distribution based on the above topology
> + * and heuristics may look like this:
> + *
> + * IRQ     Node    Core    CPUs
> + * 0       0       0       0-1
> + * 1       0       1       2-3
> + * 2       0       0       0-1
> + * 3       0       1       2-3
> + * 4       1       2       4-5
> + * 5       1       3       6-7
> + * 6       1       2       4-5
> + * 7       1       3       6-7
> + *
> + * The heuristic is implemented as follows.
> + *
> + * The outer for_each() loop resets the 'weight' to the actual number
> + * of CPUs in the hop. Then the inner for_each() loop decrements it by
> + * the number of sibling groups (cores) while assigning the first set
> + * of IRQs to each group. IRQs 0 and 1 above are distributed this way.
> + *
> + * Now, because NUMA locality is more important, we should walk the
> + * same set of siblings and assign the 2nd set of IRQs (2 and 3),
> + * which is what the middle while() loop does. We do this until the
> + * number of IRQs assigned on this hop equals the number of CPUs in
> + * the hop (weight == 0). Then we switch to the next hop and do the
> + * same thing.
> + */
> +
> static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> {
> const struct cpumask *next, *prev = cpu_none_mask;
Thank you Yury,
I will include this patch in the patchset with the next version.
> --
> 2.43.0