Date: Thu, 14 Mar 2024 18:54:31 +0000
From: Haiyang Zhang <haiyangz@...rosoft.com>
To: Jakub Kicinski <kuba@...nel.org>, Shradha Gupta
	<shradhagupta@...ux.microsoft.com>
CC: Shradha Gupta <shradhagupta@...rosoft.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "linux-rdma@...r.kernel.org"
	<linux-rdma@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, Eric Dumazet <edumazet@...gle.com>, Paolo Abeni
	<pabeni@...hat.com>, Ajay Sharma <sharmaajay@...rosoft.com>, Leon Romanovsky
	<leon@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, Sebastian Andrzej
 Siewior <bigeasy@...utronix.de>, KY Srinivasan <kys@...rosoft.com>, Wei Liu
	<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, Long Li
	<longli@...rosoft.com>, Michael Kelley <mikelley@...rosoft.com>, Alireza
 Dabagh <alid@...rosoft.com>, Paul Rosswurm <paulros@...rosoft.com>
Subject: RE: [PATCH] net :mana : Add per-cpu stats for MANA device



> -----Original Message-----
> From: Jakub Kicinski <kuba@...nel.org>
> Sent: Thursday, March 14, 2024 2:28 PM
> To: Shradha Gupta <shradhagupta@...ux.microsoft.com>
> Cc: Haiyang Zhang <haiyangz@...rosoft.com>; Shradha Gupta
> <shradhagupta@...rosoft.com>; linux-kernel@...r.kernel.org; linux-
> hyperv@...r.kernel.org; linux-rdma@...r.kernel.org;
> netdev@...r.kernel.org; Eric Dumazet <edumazet@...gle.com>; Paolo Abeni
> <pabeni@...hat.com>; Ajay Sharma <sharmaajay@...rosoft.com>; Leon
> Romanovsky <leon@...nel.org>; Thomas Gleixner <tglx@...utronix.de>;
> Sebastian Andrzej Siewior <bigeasy@...utronix.de>; KY Srinivasan
> <kys@...rosoft.com>; Wei Liu <wei.liu@...nel.org>; Dexuan Cui
> <decui@...rosoft.com>; Long Li <longli@...rosoft.com>; Michael Kelley
> <mikelley@...rosoft.com>
> Subject: Re: [PATCH] net :mana : Add per-cpu stats for MANA device
> 
> On Wed, 13 Mar 2024 19:57:20 -0700 Shradha Gupta wrote:
> > Default interrupts affinity for each queue:
> >
> >  25:          1        103          0    2989138  Hyper-V PCIe MSI 4138200989697-edge      mana_q0@pci:7870:00:00.0
> >  26:          0          1    4005360          0  Hyper-V PCIe MSI 4138200989698-edge      mana_q1@pci:7870:00:00.0
> >  27:          0          0          1    2997584  Hyper-V PCIe MSI 4138200989699-edge      mana_q2@pci:7870:00:00.0
> >  28:    3565461          0          0          1  Hyper-V PCIe MSI 4138200989700-edge      mana_q3@pci:7870:00:00.0
> >
> > As seen, the CPU-queue mapping is not 1:1: Queue 0 and Queue 2 are both
> > mapped to cpu3. From this knowledge we can figure out the total RX stats
> > processed by each CPU by adding the values of mana_q0 and mana_q2 stats
> > for cpu3. But if this data changes dynamically via irqbalance or
> > smp_affinity file edits, the above assumption fails.
> >
> > Interrupt affinity for mana_q2 changes and the affinity table looks as
> > follows:
> >  25:          1        103          0    3038084  Hyper-V PCIe MSI 4138200989697-edge      mana_q0@pci:7870:00:00.0
> >  26:          0          1    4012447          0  Hyper-V PCIe MSI 4138200989698-edge      mana_q1@pci:7870:00:00.0
> >  27:     157181         10          1    3007990  Hyper-V PCIe MSI 4138200989699-edge      mana_q2@pci:7870:00:00.0
> >  28:    3593858          0          0          1  Hyper-V PCIe MSI 4138200989700-edge      mana_q3@pci:7870:00:00.0
> >
> > And during this time we might end up calculating the per-CPU stats
> > incorrectly, messing up the understanding of CPU usage by the MANA
> > driver that is consumed by monitoring services.
> 
> Like Stephen said, forget about irqbalance for networking.
> 
> Assume that the IRQs are affinitized and XPS set, correctly.
> 
> Now, presumably you can use your pcpu stats to "trade queues",
> e.g. 4 CPUs / 4 queues, if CPU 0 insists on using queue 1
> instead of queue 0, you can swap the 0 <> 1 assignment.
> That's just an example of an "algorithm", maybe you have other
> use cases. But if the problem is "user runs broken irqbalance"
> the solution is not in the kernel...
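
The per-CPU aggregation described in the quoted text can be sketched as
follows. The queue-to-CPU map mirrors the first /proc/interrupts snippet
above (the CPU where each queue's interrupts land); the per-queue
rx_packets counters are hypothetical illustration values, not real driver
output:

```python
# Sketch: derive per-CPU RX totals by summing per-queue counters
# according to a queue -> CPU map (as would be read from
# /proc/irq/<n>/smp_affinity_list). Counter values are hypothetical.
from collections import defaultdict

# queue -> CPU affinity, mirroring the /proc/interrupts snippet above
queue_cpu = {"mana_q0": 3, "mana_q1": 2, "mana_q2": 3, "mana_q3": 0}

# hypothetical per-queue rx_packets counters (e.g. from ethtool -S)
queue_rx = {"mana_q0": 2989138, "mana_q1": 4005360,
            "mana_q2": 2997584, "mana_q3": 3565461}

per_cpu_rx = defaultdict(int)
for q, cpu in queue_cpu.items():
    per_cpu_rx[cpu] += queue_rx[q]

# cpu3 aggregates mana_q0 + mana_q2
print(per_cpu_rx[3])  # 5986722
```

This is exactly the assumption that breaks once irqbalance moves an IRQ:
the static queue_cpu map goes stale and the sums land on the wrong CPUs.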

We understand irqbalance may be a "bad idea", and have recommended that
some customers disable it when they hit problems with it... But it's
still enabled by default, and we cannot get every distro vendor and
custom image maker to disable irqbalance. So, our host-networking team
is eager to have per-CPU stats for analyzing CPU usage related to
irqbalance or other issues.
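
For illustration, the affinity drift discussed above could be detected by
comparing two snapshots of /proc/irq/<n>/smp_affinity_list. The IRQ
numbers and CPU lists below mirror the quoted /proc/interrupts output;
the helper function is made up for this sketch and is not part of the
driver or any existing tool:

```python
# Sketch: flag IRQs whose affinity moved between two snapshots of
# /proc/irq/<n>/smp_affinity_list. Data mirrors the quoted snippets.
def affinity_changes(baseline, current):
    """Return {irq: (old, new)} for IRQs whose affinity moved."""
    return {irq: (old, current[irq])
            for irq, old in baseline.items()
            if current[irq] != old}

# First snapshot: mana_q0..q3 on IRQs 25-28
before = {25: "3", 26: "2", 27: "3", 28: "0"}
# Second snapshot: irqbalance has moved IRQ 27 (mana_q2) to cpu0
after = {25: "3", 26: "2", 27: "0", 28: "0"}

print(affinity_changes(before, after))  # {27: ('3', '0')}
```

A monitoring script would refresh the queue-to-CPU map whenever this
reports a change, instead of trusting a one-time snapshot.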

Thanks,
- Haiyang

