Message-ID:
<SN6PR02MB41577E2FAA79E2803C3384B0D491A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Wed, 14 May 2025 04:53:34 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Shradha Gupta <shradhagupta@...ux.microsoft.com>, Dexuan Cui
<decui@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Haiyang Zhang
<haiyangz@...rosoft.com>, "K. Y. Srinivasan" <kys@...rosoft.com>, Andrew Lunn
<andrew+netdev@...n.ch>, "David S. Miller" <davem@...emloft.net>, Eric
Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Konstantin Taranov <kotaranov@...rosoft.com>, Simon
Horman <horms@...nel.org>, Leon Romanovsky <leon@...nel.org>, Maxim Levitsky
<mlevitsk@...hat.com>, Erni Sri Satya Vennela <ernis@...ux.microsoft.com>,
Peter Zijlstra <peterz@...radead.org>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Nipun Gupta
<nipun.gupta@....com>, Yury Norov <yury.norov@...il.com>, Jason Gunthorpe
<jgg@...pe.ca>, Jonathan Cameron <Jonathan.Cameron@...ei.com>, Anna-Maria
Behnsen <anna-maria@...utronix.de>, Kevin Tian <kevin.tian@...el.com>, Long
Li <longli@...rosoft.com>, Thomas Gleixner <tglx@...utronix.de>, Bjorn
Helgaas <bhelgaas@...gle.com>, Rob Herring <robh@...nel.org>, Manivannan
Sadhasivam <manivannan.sadhasivam@...aro.org>,
Krzysztof Wilczyński <kw@...ux.com>, Lorenzo
Pieralisi <lpieralisi@...nel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "linux-rdma@...r.kernel.org"
<linux-rdma@...r.kernel.org>, Paul Rosswurm <paulros@...rosoft.com>, Shradha
Gupta <shradhagupta@...rosoft.com>
Subject: RE: [PATCH v3 3/4] net: mana: Allow irq_setup() to skip cpus for
affinity
From: Shradha Gupta <shradhagupta@...ux.microsoft.com> Sent: Friday, May 9, 2025 3:14 AM
>
> In order to prepare the MANA driver to allocate the MSI-X IRQs
> dynamically, we need to prepare the irq_setup() to allow skipping
s/prepare the irq_setup()/enhance irq_setup()/
> affinitizing IRQs to first CPU sibling group.
s/to first/to the first/
>
> This would be for cases when number of IRQs is less than or equal
s/when number/when the number/
> to number of online CPUs. In such cases for dynamically added IRQs
s/to number/to the number/
> the first CPU sibling group would already be affinitized with HWC IRQ
Add a period at the end of the sentence.
>
> Signed-off-by: Shradha Gupta <shradhagupta@...ux.microsoft.com>
> Reviewed-by: Haiyang Zhang <haiyangz@...rosoft.com>
> ---
> drivers/net/ethernet/microsoft/mana/gdma_main.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 4ffaf7588885..2de42ce43373 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -1288,7 +1288,8 @@ void mana_gd_free_res_map(struct gdma_resource *r)
> r->size = 0;
> }
>
> -static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> +static int irq_setup(unsigned int *irqs, unsigned int len, int node,
> + bool skip_first_cpu)
> {
> const struct cpumask *next, *prev = cpu_none_mask;
> cpumask_var_t cpus __free(free_cpumask_var);
> @@ -1303,9 +1304,20 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node)
> while (weight > 0) {
> cpumask_andnot(cpus, next, prev);
> for_each_cpu(cpu, cpus) {
> + /*
> + * if the CPU sibling set is to be skipped we
> + * just move on to the next CPUs without len--
> + */
> + if (unlikely(skip_first_cpu)) {
> + skip_first_cpu = false;
> + goto next_cpumask;
> + }
> +
> if (len-- == 0)
> goto done;
> +
> irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> +next_cpumask:
> cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> --weight;
> }
With a little bit of reordering of the code, you could avoid the need for the "next_cpumask"
label and goto statement. "continue" is usually cleaner than a "goto". Here's what I'm thinking:
	for_each_cpu(cpu, cpus) {
		cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
		--weight;

		if (unlikely(skip_first_cpu)) {
			skip_first_cpu = false;
			continue;
		}

		if (len-- == 0)
			goto done;

		irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
	}
I wish there were some comments in irq_setup() explaining the overall intention of
the algorithm. I can see how the goal is to first assign CPUs that are local to the current
NUMA node, and then expand outward to CPUs that are further away. And you want
to *not* assign both siblings in a hyper-threaded core. But I can't figure out what
"weight" is trying to accomplish. Maybe this was discussed when the code first
went in, but I can't remember now. :-(
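To make that concrete, here's the kind of commenting I have in mind, written against
the loop as it exists before this patch. Treat the comments as my best guess at the
intent -- in particular, I can't see where "weight" is initialized in this hunk, so
that part is speculation on my part:

	/*
	 * "cpus" starts as the CPUs contributed by the current hop
	 * (next minus prev), so nearer CPUs are consumed before we
	 * move outward to more distant ones.
	 */
	while (weight > 0) {
		cpumask_andnot(cpus, next, prev);
		for_each_cpu(cpu, cpus) {
			if (len-- == 0)
				goto done;
			/*
			 * Affinitize one IRQ to this core's full sibling mask,
			 * then drop all of its siblings from "cpus" so no other
			 * IRQ in this pass lands on the same physical core.
			 */
			irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
			cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
			/* Unclear to me: does weight count remaining CPUs in this hop? */
			--weight;
		}
	}

If comments along those lines (with the "weight" question actually answered) could be
added, the new skip_first_cpu logic would be much easier to review.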
Michael
> @@ -1403,7 +1415,7 @@ static int mana_gd_setup_irqs(struct pci_dev *pdev)
> }
> }
>
> - err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node);
> + err = irq_setup(irqs, (nvec - start_irq_index), gc->numa_node, false);
> if (err)
> goto free_irq;
>
> --
> 2.34.1
>