netdev - RE: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per CPUs


Message-ID:
 <SN6PR02MB4157372CF70059E8E35D5545D46E2@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Sat, 13 Jan 2024 16:20:31 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com>, Haiyang Zhang
	<haiyangz@...rosoft.com>
CC: Yury Norov <yury.norov@...il.com>, KY Srinivasan <kys@...rosoft.com>,
	"wei.liu@...nel.org" <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
	"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"pabeni@...hat.com" <pabeni@...hat.com>, Long Li <longli@...rosoft.com>,
	"leon@...nel.org" <leon@...nel.org>, "cai.huoqing@...ux.dev"
	<cai.huoqing@...ux.dev>, "ssengar@...ux.microsoft.com"
	<ssengar@...ux.microsoft.com>, "vkuznets@...hat.com" <vkuznets@...hat.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>, Souradeep Chakrabarti
	<schakrabarti@...rosoft.com>, Paul Rosswurm <paulros@...rosoft.com>
Subject: RE: [PATCH 3/4 net-next] net: mana: add a function to spread IRQs per
 CPUs

From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com> Sent: Friday, January 12, 2024 10:31 PM

> On Fri, Jan 12, 2024 at 06:30:44PM +0000, Haiyang Zhang wrote:
> >
> > > -----Original Message-----
> > From: Michael Kelley <mhklinux@...look.com> Sent: Friday, January 12, 2024 11:37 AM
> > >
> > > From: Souradeep Chakrabarti <schakrabarti@...ux.microsoft.com> Sent:
> > > Wednesday, January 10, 2024 10:13 PM
> > > >
> > > > The test topology used to compare the performance of
> > > > cpu_local_spread() and the new approach is:
> > > > Case 1
> > > > IRQ     Nodes  Cores CPUs
> > > > 0       1      0     0-1
> > > > 1       1      1     2-3
> > > > 2       1      2     4-5
> > > > 3       1      3     6-7
> > > >
> > > > and with existing cpu_local_spread()
> > > > Case 2
> > > > IRQ    Nodes  Cores CPUs
> > > > 0      1      0     0
> > > > 1      1      0     1
> > > > 2      1      1     2
> > > > 3      1      1     3
> > > >
> > > > A total of 4 channels were used, set up with ethtool.
> > > > With ntttcp, case 1 gave 15 percent better performance than
> > > > case 2. irqbalance was disabled during the test.
> > > >
> > > > Also, you are right: on a 64-CPU system this approach will spread
> > > > the IRQs the same way cpu_local_spread() does, but in the future we
> > > > will offer MANA nodes with more than 64 CPUs. There, this new design
> > > > will give better performance.
> > > >
> > > > I will add these performance details to the commit message of
> > > > the next version.
> > >
> > > Here are my concerns:
> > >
> > > 1.  The most commonly used VMs these days have 64 or fewer
> > > vCPUs and won't see any performance benefit.
> > >
> > > 2.  Larger VMs probably won't see the full 15% benefit because
> > > all vCPUs in the local NUMA node will be assigned IRQs.  For
> > > example, in a VM with 96 vCPUs and 2 NUMA nodes, all 48
> > > vCPUs in NUMA node 0 will be assigned IRQs.  The remaining
> > > 16 IRQs will be spread out on the 48 CPUs in NUMA node 1
> > > in a way that avoids sharing a core.  But overall this means
> > > that 75% of the IRQs will still be sharing a core and
> > > presumably not see any perf benefit.
> > >
> > > 3.  Your experiment was on a relatively small scale:   4 IRQs
> > > spread across 2 cores vs. across 4 cores.  Have you run any
> > > experiments on VMs with 128 vCPUs (for example) where
> > > most of the IRQs are not sharing a core?  I'm wondering if
> > > the results with 4 IRQs really scale up to 64 IRQs.  A lot can
> > > be different in a VM with 64 cores and 2 NUMA nodes vs.
> > > 4 cores in a single node.
> > >
> > > 4.  The new algorithm prefers assigning to all vCPUs in
> > > each NUMA hop over assigning to separate cores.  Are there
> > > experiments showing that is the right tradeoff?  What
> > > are the results if assigning to separate cores is preferred?
> >
> > I remember in a customer case, putting the IRQs on the same
> > NUMA node has better perf. But I agree, this should be re-tested
> > on MANA nic.
>
> 1) and 2) The change will not decrease existing performance, and
> systems with a high number of CPUs will benefit from it.
>
> 3) The results showed around a 6 percent improvement.
>
> 4) The test results showed around a 10 percent difference when IRQs are
> spread across multiple NUMA nodes.

OK, this looks pretty good.  Make clear in the commit messages what
the tradeoffs are, and what the real-world benefits are expected to be.
Some future developer who wants to understand why IRQs are assigned
this way will thank you. :-)
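
For anyone reading this later, the two test topologies quoted above (Case 1
and Case 2) can be reproduced with a small user-space sketch. This is not
the patch's code: the function names are hypothetical, and it assumes 2
hardware threads per core with sibling CPUs numbered consecutively, as the
quoted tables show.

```python
def spread_per_core(num_irqs, num_cores, threads_per_core=2):
    """Case 1: one IRQ per core; affinity covers every sibling CPU of that core."""
    table = []
    for irq in range(num_irqs):
        core = irq % num_cores
        first_cpu = core * threads_per_core  # assumes siblings are consecutive
        cpus = list(range(first_cpu, first_cpu + threads_per_core))
        table.append((irq, core, cpus))
    return table


def spread_per_cpu(num_irqs, num_cores, threads_per_core=2):
    """Case 2: one IRQ per logical CPU, walking the CPUs in numeric order."""
    num_cpus = num_cores * threads_per_core
    table = []
    for irq in range(num_irqs):
        cpu = irq % num_cpus
        table.append((irq, cpu // threads_per_core, [cpu]))
    return table


if __name__ == "__main__":
    for name, fn in (("Case 1", spread_per_core), ("Case 2", spread_per_cpu)):
        print(name)
        for irq, core, cpus in fn(4, 4):
            print(f"  IRQ {irq}  core {core}  CPUs {cpus}")
```

With 4 IRQs on 4 cores, the first function touches 4 distinct cores and the
second only 2, which is the difference the ntttcp comparison measured.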

Michael
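
The 75% figure in concern 2 above can be checked with a rough model. The
sketch below is an illustration only, under stated assumptions: 64 IRQs in
total (inferred from 48 + 16), 2 hardware threads per core, and a policy
that fills every vCPU of one NUMA node before moving to the next, visiting
one sibling per core first. The function names are hypothetical and do not
come from the patch.

```python
from collections import Counter


def assign_irqs(total_irqs, nodes, cpus_per_node, threads_per_core=2):
    """Return one (node, core) placement per IRQ under the modeled policy."""
    placements = []
    cores_per_node = cpus_per_node // threads_per_core
    for node in range(nodes):
        # Within a node, use one sibling of every core before the next sibling.
        for _sibling in range(threads_per_core):
            for core in range(cores_per_node):
                if len(placements) == total_irqs:
                    return placements
                placements.append((node, core))
    return placements


def fraction_sharing_core(placements):
    """Fraction of IRQs whose core also carries at least one other IRQ."""
    per_core = Counter(placements)
    shared = sum(1 for p in placements if per_core[p] > 1)
    return shared / len(placements)


if __name__ == "__main__":
    # 96 vCPUs over 2 NUMA nodes, 64 IRQs (assumed total).
    print(fraction_sharing_core(assign_irqs(64, 2, 48)))
```

In this model node 0 absorbs 48 IRQs on 24 cores, so all of those share a
core, while the 16 IRQs on node 1 each get a core of their own.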