linux-kernel - RE: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <PH7PR21MB311687F4F37C55D6B332D7F0CAB1A@PH7PR21MB3116.namprd21.prod.outlook.com>
Date:   Wed, 15 Nov 2023 22:54:44 +0000
From:   Haiyang Zhang <haiyangz@...rosoft.com>
To:     Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>,
        KY Srinivasan <kys@...rosoft.com>,
        "wei.liu@...nel.org" <wei.liu@...nel.org>,
        Dexuan Cui <decui@...rosoft.com>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "kuba@...nel.org" <kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        Long Li <longli@...rosoft.com>,
        "sharmaajay@...rosoft.com" <sharmaajay@...rosoft.com>,
        "leon@...nel.org" <leon@...nel.org>,
        "cai.huoqing@...ux.dev" <cai.huoqing@...ux.dev>,
        "ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>,
        "vkuznets@...hat.com" <vkuznets@...hat.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>
CC:     Souradeep Chakrabarti <schakrabarti@...rosoft.com>,
        Paul Rosswurm <paulros@...rosoft.com>
Subject: RE: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores



> -----Original Message-----
> From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> Sent: Wednesday, November 15, 2023 8:49 AM
> To: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> <haiyangz@...rosoft.com>; wei.liu@...nel.org; Dexuan Cui
> <decui@...rosoft.com>; davem@...emloft.net; edumazet@...gle.com;
> kuba@...nel.org; pabeni@...hat.com; Long Li <longli@...rosoft.com>;
> sharmaajay@...rosoft.com; leon@...nel.org; cai.huoqing@...ux.dev;
> ssengar@...ux.microsoft.com; vkuznets@...hat.com; tglx@...utronix.de;
> linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
> linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org
> Cc: Souradeep Chakrabarti <schakrabarti@...rosoft.com>; Paul Rosswurm
> <paulros@...rosoft.com>; Souradeep Chakrabarti
> <schakrabarti@...ux.microsoft.com>
> Subject: [PATCH net-next] net: mana: Assigning IRQ affinity on HT cores
> 
> Existing MANA design assigns IRQ affinity to every sibling CPU, which causes
> IRQ coalescing and may reduce network performance with RSS.
> 
> Improve the performance by adhering to the RSS configuration, which
> prioritises IRQ affinity on HT cores.
> 
> Signed-off-by: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
> ---
>  .../net/ethernet/microsoft/mana/gdma_main.c   | 126 ++++++++++++++++--
>  1 file changed, 117 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 6367de0c2c2e..839be819d46e 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -1243,13 +1243,115 @@ void mana_gd_free_res_map(struct gdma_resource *r)
>  	r->size = 0;
>  }
> 
> +static void cpu_mask_set(cpumask_var_t *filter_mask, cpumask_var_t **filter_mask_list)
> +{
> +	unsigned int core_count = 0, cpu;
> +	cpumask_var_t *filter_mask_list_tmp;
> +
> +	BUG_ON(!filter_mask || !filter_mask_list);
> +	filter_mask_list_tmp = *filter_mask_list;
> +	cpumask_copy(*filter_mask, cpu_online_mask);
> +	/* for each core create a cpumask lookup table,
> +	 * which stores all the corresponding siblings
> +	 */
> +	for_each_cpu(cpu, *filter_mask) {
> +		BUG_ON(!alloc_cpumask_var(&filter_mask_list_tmp[core_count], GFP_KERNEL));
> +		cpumask_or(filter_mask_list_tmp[core_count], filter_mask_list_tmp[core_count],
> +			   topology_sibling_cpumask(cpu));
> +		cpumask_andnot(*filter_mask, *filter_mask, topology_sibling_cpumask(cpu));
> +		core_count++;
> +	}
> +}
> +
> +static int irq_setup(int *irqs, int nvec)
> +{
> +	cpumask_var_t filter_mask;
> +	cpumask_var_t *filter_mask_list;
> +	unsigned int cpu_first, cpu, irq_start, cores = 0;
> +	int i, core_count = 0, numa_node, cpu_count = 0, err = 0;
> +
> +	BUG_ON(!alloc_cpumask_var(&filter_mask, GFP_KERNEL));
> +	cpus_read_lock();
> +	cpumask_copy(filter_mask, cpu_online_mask);
> +	/* count the number of cores
> +	 */
> +	for_each_cpu(cpu, filter_mask) {
> +		cpumask_andnot(filter_mask, filter_mask, topology_sibling_cpumask(cpu));
> +		cores++;
> +	}
> +	filter_mask_list = kcalloc(cores, sizeof(cpumask_var_t), GFP_KERNEL);
> +	if (!filter_mask_list) {
> +		err = -ENOMEM;
> +		goto free_irq;
> +	}
> +	/* if number of cpus are equal to max_queues per port, then
> +	 * one extra interrupt for the hardware channel communication.
> +	 */
> +	if (nvec - 1 == num_online_cpus()) {
> +		irq_start = 1;
> +		cpu_first = cpumask_first(cpu_online_mask);
> +		irq_set_affinity_and_hint(irqs[0], cpumask_of(cpu_first));
> +	} else {
> +		irq_start = 0;
> +	}
> +	/* reset the core_count and num_node to 0.
> +	 */
> +	core_count = 0;
> +	numa_node = 0;

Please start with gc->numa_node here. I know it's 0 for now, but the host
will provide the real NUMA node number close to the device in the future.

Also, as we discussed, consider using the NUMA distance to select the next
NUMA node (in a separate patch).
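The NUMA-distance suggestion can be modelled outside the kernel. The sketch below is a userspace illustration, not driver code: `numa_order()` and `MAX_NODES` are hypothetical names, and the `dist` table stands in for what `node_distance()` reports in the kernel (SLIT-style values, where 10 means local). Starting from the device's home node (what `gc->numa_node` would supply), it orders the nodes nearest first, breaking ties by the lower node id. It assumes, as in typical tables, that the local distance is the smallest value in each row, so the home node always comes first.

```c
#include <assert.h>

#define MAX_NODES 8

/*
 * Hypothetical helper (not a kernel API): write into order[] the node ids
 * 0..nr_nodes-1 sorted by their distance from `home`, nearest first, with
 * ties broken by the lower node id. dist[a][b] models node_distance(a, b).
 * An O(nr_nodes^2) selection is fine for the handful of nodes on a VM.
 */
static void numa_order(int nr_nodes, const int dist[][MAX_NODES], int home,
		       int *order)
{
	int used[MAX_NODES] = { 0 };

	assert(nr_nodes > 0 && nr_nodes <= MAX_NODES);
	assert(home >= 0 && home < nr_nodes);

	for (int k = 0; k < nr_nodes; k++) {
		int best = -1;

		/* Pick the closest node that has not been placed yet. */
		for (int n = 0; n < nr_nodes; n++) {
			if (used[n])
				continue;
			if (best < 0 || dist[home][n] < dist[home][best])
				best = n;
		}
		used[best] = 1;
		order[k] = best;
	}
}
```

For example, with the distance rows {10, 20, 30, 30}, {20, 10, 30, 30}, {30, 30, 10, 20} and {30, 30, 20, 10} and a home node of 2, the visiting order is 2, 3, 0, 1, whereas simply incrementing the node id (as the patch does) would visit 2, 3, 0, 1 only by coincidence and would pick node 0 over node 3 if the home node were 1.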

> +	cpu_mask_set(&filter_mask, &filter_mask_list);
> +	/* for each interrupt find the cpu of a particular
> +	 * sibling set and if it belongs to the specific numa
> +	 * then assign irq to it and clear the cpu bit from
> +	 * the corresponding sibling list from filter_mask_list.
> +	 * Increase the cpu_count for that node.
> +	 * Once all cpus for a numa node is assigned, then
> +	 * move to different numa node and continue the same.
> +	 */
> +	for (i = irq_start; i < nvec; ) {
> +		cpu_first = cpumask_first(filter_mask_list[core_count]);
> +		if (cpu_first < nr_cpu_ids && cpu_to_node(cpu_first) == numa_node) {
> +			irq_set_affinity_and_hint(irqs[i], cpumask_of(cpu_first));
> +			cpumask_clear_cpu(cpu_first, filter_mask_list[core_count]);
> +			cpu_count = cpu_count + 1;
> +			i = i + 1;
> +			/* checking if all the cpus are used from the
> +			 * particular node.
> +			 */
> +			if (cpu_count == nr_cpus_node(numa_node)) {
> +				numa_node = numa_node + 1;
> +				if (numa_node == num_online_nodes()) {
> +					cpu_mask_set(&filter_mask, &filter_mask_list);
> +					numa_node = 0;

Ditto.
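A userspace simulation may help check the placement order the loop above aims for once the wrap-around restarts at the home node rather than node 0. This is a hedged model, not the driver: `struct topo`, `spread_irqs()` and `MAX_CPUS` are hypothetical, `core_of[]` stands in for `topology_sibling_cpumask()` and `node_of[]` for `cpu_to_node()`. It does not model the extra hardware-channel vector pinned to the first CPU, and it moves to the next node by id, as the patch does.

```c
#include <assert.h>

#define MAX_CPUS 64

/*
 * Toy topology for this simulation only: CPUs with equal core_of[] values
 * are HT siblings; node_of[] gives each CPU's NUMA node. Node ids must lie
 * in [0, nr_nodes) and core ids in [0, MAX_CPUS).
 */
struct topo {
	int nr_cpus;
	int nr_nodes;
	int core_of[MAX_CPUS];
	int node_of[MAX_CPUS];
};

/*
 * Hypothetical model of the placement loop: within a node, each pass gives
 * at most one IRQ to every physical core, so a sibling is used only after
 * every core already has one. When a node is exhausted, move to the next
 * node id; after the last node, reset and start again from `home` (the
 * gc->numa_node the review asks for) instead of node 0.
 */
static void spread_irqs(const struct topo *t, int home, int nvec, int *irq_cpu)
{
	int used[MAX_CPUS] = { 0 };
	int i = 0, step = 0;

	assert(t->nr_cpus > 0 && t->nr_cpus <= MAX_CPUS);
	assert(t->nr_nodes > 0 && home >= 0 && home < t->nr_nodes);

	while (i < nvec) {
		int node = (home + step) % t->nr_nodes;
		int progress = 1;

		/* Repeat passes over this node until it has no unused CPU left. */
		while (i < nvec && progress) {
			int core_taken[MAX_CPUS] = { 0 };

			progress = 0;
			for (int cpu = 0; cpu < t->nr_cpus && i < nvec; cpu++) {
				if (t->node_of[cpu] != node || used[cpu] ||
				    core_taken[t->core_of[cpu]])
					continue;
				core_taken[t->core_of[cpu]] = 1;
				used[cpu] = 1;
				irq_cpu[i++] = cpu;
				progress = 1;
			}
		}

		if (++step == t->nr_nodes) {
			/* Every node used: reset and wrap back to the home node. */
			step = 0;
			for (int cpu = 0; cpu < t->nr_cpus; cpu++)
				used[cpu] = 0;
		}
	}
}
```

With eight CPUs where CPUs 0/2 and 1/3 are siblings on node 0 and CPUs 4/6 and 5/7 are siblings on node 1, a home node of 1 and ten vectors, the IRQs land on CPUs 4, 5, 6, 7, 0, 1, 2, 3, 4, 5: one thread per core first, then the siblings, then back to the home node.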

Other things look good to me.

Thanks,
- Haiyang