netdev - RE: [EXTERNAL] [PATCH 3/3] net: mana: add a function to spread IRQs per CPUs

Message-ID:
 <PUZP153MB07886CE88351F6B7A2AA0096CC97A@PUZP153MB0788.APCP153.PROD.OUTLOOK.COM>
Date: Tue, 19 Dec 2023 10:18:49 +0000
From: Souradeep Chakrabarti <schakrabarti@...rosoft.com>
To: Yury Norov <yury.norov@...il.com>, Souradeep Chakrabarti
	<schakrabarti@...ux.microsoft.com>, KY Srinivasan <kys@...rosoft.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>, "wei.liu@...nel.org"
	<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
	Long Li <longli@...rosoft.com>, "leon@...nel.org" <leon@...nel.org>,
	"cai.huoqing@...ux.dev" <cai.huoqing@...ux.dev>,
	"ssengar@...ux.microsoft.com" <ssengar@...ux.microsoft.com>,
	"vkuznets@...hat.com" <vkuznets@...hat.com>, "tglx@...utronix.de"
	<tglx@...utronix.de>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>
CC: Paul Rosswurm <paulros@...rosoft.com>
Subject: RE: [EXTERNAL] [PATCH 3/3] net: mana: add a function to spread IRQs
 per CPUs



>-----Original Message-----
>From: Yury Norov <yury.norov@...il.com>
>Sent: Monday, December 18, 2023 3:02 AM
>To: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>; KY Srinivasan
><kys@...rosoft.com>; Haiyang Zhang <haiyangz@...rosoft.com>;
>wei.liu@...nel.org; Dexuan Cui <decui@...rosoft.com>; davem@...emloft.net;
>edumazet@...gle.com; kuba@...nel.org; pabeni@...hat.com; Long Li
><longli@...rosoft.com>; yury.norov@...il.com; leon@...nel.org;
>cai.huoqing@...ux.dev; ssengar@...ux.microsoft.com; vkuznets@...hat.com;
>tglx@...utronix.de; linux-hyperv@...r.kernel.org; netdev@...r.kernel.org;
>linux-kernel@...r.kernel.org; linux-rdma@...r.kernel.org
>Cc: Souradeep Chakrabarti <schakrabarti@...rosoft.com>; Paul Rosswurm
><paulros@...rosoft.com>
>Subject: [EXTERNAL] [PATCH 3/3] net: mana: add a function to spread IRQs per
>CPUs
>
>Souradeep found that the driver performs faster if IRQs are spread across CPUs
>according to the following heuristics:
>
>1. No more than one IRQ per CPU, if possible;
>2. NUMA locality is the second priority;
>3. Sibling dislocality is the last priority.
>
>Let's consider this topology:
>
>Node            0               1
>Core        0       1       2       3
>CPU       0   1   2   3   4   5   6   7
>
>The most performant IRQ distribution based on the above topology and heuristics
>may look like this:
>
>IRQ     Nodes   Cores   CPUs
>0       0       0       0-1
>1       0       1       2-3
>2       0       0       0-1
>3       0       1       2-3
>4       1       2       4-5
>5       1       3       6-7
>6       1       2       4-5
>7       1       3       6-7
>
>The irq_setup() routine introduced in this patch leverages the
>for_each_numa_hop_mask() iterator and assigns IRQs to sibling groups as
>described above.
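
A minimal userspace sketch of this distribution may help when checking the table. It is not the driver code: plain 64-bit masks stand in for struct cpumask, the hops[] array stands in for what for_each_numa_hop_mask() yields, and sibling_mask() and spread_irqs() are hypothetical names that hard-code the toy topology (CPUs 2k and 2k+1 share a core). It assumes GCC or Clang for the __builtin_* helpers, and it decrements the weight once per assigned IRQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8	/* toy topology: 2 nodes, 4 cores, 8 CPUs */

/* Hypothetical stand-in for topology_sibling_cpumask(): CPUs 2k and 2k+1 share a core. */
static uint64_t sibling_mask(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return 3ULL << (cpu & ~1);
}

/*
 * Userspace model of the spreading loop.
 * hops[] holds cumulative CPU masks, nearest node first.
 * out[i] receives the affinity mask chosen for IRQ i.
 */
static void spread_irqs(const uint64_t *hops, size_t nr_hops,
			uint64_t *out, size_t len)
{
	uint64_t prev = 0;
	size_t i = 0;

	for (size_t h = 0; h < nr_hops; h++) {
		uint64_t fresh = hops[h] & ~prev;	/* CPUs that are new at this hop */
		int weight = __builtin_popcountll(fresh);

		while (weight > 0) {
			uint64_t cpus = fresh;

			while (cpus) {
				int cpu = __builtin_ctzll(cpus);	/* lowest remaining CPU */

				if (i == len)
					return;
				out[i++] = sibling_mask(cpu);	/* one IRQ per sibling group */
				cpus &= ~sibling_mask(cpu);
				--weight;
			}
		}
		prev = hops[h];
	}
}
```

With hops = { 0x0F, 0xFF } (node 0 first, then both nodes) and eight IRQs, the masks come out as 0x03, 0x0C, 0x03, 0x0C, 0x30, 0xC0, 0x30, 0xC0, i.e. CPUs 0-1, 2-3, 0-1, 2-3, 4-5, 6-7, 4-5, 6-7.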
>
>According to [1], for NUMA-aware but sibling-ignorant IRQ distribution based on
>cpumask_local_spread() performance test results look like this:
>
>./ntttcp -r -m 16
>NTTTCP for Linux 1.4.0
>---------------------------------------------------------
>08:05:20 INFO: 17 threads created
>08:05:28 INFO: Network activity progressing...
>08:06:28 INFO: Test run completed.
>08:06:28 INFO: Test cycle finished.
>08:06:28 INFO: #####  Totals:  #####
>08:06:28 INFO: test duration    :60.00 seconds
>08:06:28 INFO: total bytes      :630292053310
>08:06:28 INFO:   throughput     :84.04Gbps
>08:06:28 INFO:   retrans segs   :4
>08:06:28 INFO: cpu cores        :192
>08:06:28 INFO:   cpu speed      :3799.725MHz
>08:06:28 INFO:   user           :0.05%
>08:06:28 INFO:   system         :1.60%
>08:06:28 INFO:   idle           :96.41%
>08:06:28 INFO:   iowait         :0.00%
>08:06:28 INFO:   softirq        :1.94%
>08:06:28 INFO:   cycles/byte    :2.50
>08:06:28 INFO: cpu busy (all)   :534.41%
>
>For NUMA- and sibling-aware IRQ distribution, the same test shows about 18%
>higher throughput (98.93 vs. 84.04 Gbps):
>
>./ntttcp -r -m 16
>NTTTCP for Linux 1.4.0
>---------------------------------------------------------
>08:08:51 INFO: 17 threads created
>08:08:56 INFO: Network activity progressing...
>08:09:56 INFO: Test run completed.
>08:09:56 INFO: Test cycle finished.
>08:09:56 INFO: #####  Totals:  #####
>08:09:56 INFO: test duration    :60.00 seconds
>08:09:56 INFO: total bytes      :741966608384
>08:09:56 INFO:   throughput     :98.93Gbps
>08:09:56 INFO:   retrans segs   :6
>08:09:56 INFO: cpu cores        :192
>08:09:56 INFO:   cpu speed      :3799.791MHz
>08:09:56 INFO:   user           :0.06%
>08:09:56 INFO:   system         :1.81%
>08:09:56 INFO:   idle           :96.18%
>08:09:56 INFO:   iowait         :0.00%
>08:09:56 INFO:   softirq        :1.95%
>08:09:56 INFO:   cycles/byte    :2.25
>08:09:56 INFO: cpu busy (all)   :569.22%
>
>[1] https://lore.kernel.org/all/20231211063726.GA4977@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
>
>Signed-off-by: Yury Norov <yury.norov@...il.com>
>Co-developed-by: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>
>---
> .../net/ethernet/microsoft/mana/gdma_main.c   | 28 +++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
>diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>index 6367de0c2c2e..11e64e42e3b2 100644
>--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
>+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>@@ -1243,6 +1243,34 @@ void mana_gd_free_res_map(struct gdma_resource *r)
>        r->size = 0;
> }
>
>+static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
>+{
>+       const struct cpumask *next, *prev = cpu_none_mask;
>+       cpumask_var_t cpus __free(free_cpumask_var);
>+       int cpu, weight;
>+
>+       if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
>+               return -ENOMEM;
>+
>+       rcu_read_lock();
>+       for_each_numa_hop_mask(next, node) {
>+               weight = cpumask_weight_andnot(next, prev);
>+               while (weight-- > 0) {
Please change this to: while (weight > 0) {
>+                       cpumask_andnot(cpus, next, prev);
>+                       for_each_cpu(cpu, cpus) {
>+                               if (len-- == 0)
>+                                       goto done;
>+                               irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
>+                               cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
Do the --weight here instead of in the while condition. Otherwise this code
will traverse the same node N^2 times, where N is the number of CPUs in the node.
>+                       }
>+               }
>+               prev = next;
>+       }
>+done:
>+       rcu_read_unlock();
>+       return 0;
>+}
>+
> static int mana_gd_setup_irqs(struct pci_dev *pdev)
> {
>        unsigned int max_queues_per_port = num_online_cpus();
>--
>2.40.1
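
To illustrate the point about where the weight is decremented, here is a small userspace sketch, not the driver code. It counts how many IRQs one hop would receive given an unlimited supply of IRQs, for both placements of the decrement. The names irqs_on_hop() and sibling_mask() are hypothetical, the sibling layout (CPUs 2k and 2k+1 share a core) is an assumption, and the __builtin_* helpers assume GCC or Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for topology_sibling_cpumask(): CPUs 2k and 2k+1 share a core. */
static uint64_t sibling_mask(int cpu)
{
	return 3ULL << (cpu & ~1);
}

/*
 * Count the IRQs assigned to one hop whose new CPUs are given by `fresh`,
 * assuming the supply of IRQs never runs out.
 *
 * per_irq_decrement == 0: weight drops once per outer pass,
 *                         equivalent to while (weight-- > 0).
 * per_irq_decrement != 0: weight drops once per assigned IRQ,
 *                         i.e. while (weight > 0) with --weight in the inner loop.
 */
static int irqs_on_hop(uint64_t fresh, int per_irq_decrement)
{
	int weight = __builtin_popcountll(fresh);
	int assigned = 0;

	while (weight > 0) {
		uint64_t cpus = fresh;

		if (!per_irq_decrement)
			--weight;
		while (cpus) {
			int cpu = __builtin_ctzll(cpus);	/* lowest remaining CPU */

			assigned++;
			cpus &= ~sibling_mask(cpu);
			if (per_irq_decrement)
				--weight;
		}
	}
	return assigned;
}
```

Under these assumptions, a hop with 4 new CPUs (mask 0x0F) receives 8 IRQs with the per-pass decrement but 4 with the per-IRQ decrement; for 8 new CPUs (mask 0xFF) the counts are 32 and 8. Only the per-IRQ decrement keeps to one IRQ per CPU before moving on to the next hop.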