Message-ID: <20240314025720.GA13853@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date: Wed, 13 Mar 2024 19:57:20 -0700
From: Shradha Gupta <shradhagupta@...ux.microsoft.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Haiyang Zhang <haiyangz@...rosoft.com>,
	Shradha Gupta <shradhagupta@...rosoft.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	Ajay Sharma <sharmaajay@...rosoft.com>,
	Leon Romanovsky <leon@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	KY Srinivasan <kys@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
	Dexuan Cui <decui@...rosoft.com>, Long Li <longli@...rosoft.com>,
	Michael Kelley <mikelley@...rosoft.com>
Subject: Re: [PATCH] net: mana: Add per-cpu stats for MANA device

On Sun, Mar 10, 2024 at 09:19:50PM -0700, Shradha Gupta wrote:
> On Fri, Mar 08, 2024 at 11:22:44AM -0800, Jakub Kicinski wrote:
> > On Fri, 8 Mar 2024 18:51:58 +0000 Haiyang Zhang wrote:
> > > > Dynamic is a bit of an exaggeration, right? On a well-configured system
> > > > each CPU should use a single queue assigned thru XPS. And for manual
> > > > debug bpftrace should serve the purpose quite well.  
> > > 
> > > Some programs, like irqbalancer can dynamically change the CPU affinity, 
> > > so we want to add the per-CPU counters for better understanding of the CPU 
> > > usage.
> > 
> > Do you have experimental data showing this making a difference
> > in production?
> Sure, will try to get that data for this discussion
> > 
> > Seems unlikely, but if it does work we should enable it for all
> > devices, not driver by driver.
> You mean, if the usecase seems valid we should try to extend the framework
> mentioned by Rahul (https://lore.kernel.org/lkml/20240307072923.6cc8a2ba@kernel.org/)
> to include these stats as well?
> Will explore this a bit more and update. Thanks.

Following is the data we can share:

Default interrupt affinity for each queue:

 25:          1        103          0    2989138  Hyper-V PCIe MSI 4138200989697-edge      mana_q0@pci:7870:00:00.0
 26:          0          1    4005360          0  Hyper-V PCIe MSI 4138200989698-edge      mana_q1@pci:7870:00:00.0
 27:          0          0          1    2997584  Hyper-V PCIe MSI 4138200989699-edge      mana_q2@pci:7870:00:00.0
 28:    3565461          0          0          1  Hyper-V PCIe MSI 4138200989700-edge      mana_q3@pci:7870:00:00.0

As seen above, the queue-to-CPU mapping is not 1:1: queue 0 and queue 2 are both
mapped to cpu3. With this knowledge we can derive the total RX stats processed by
each CPU, e.g. for cpu3 by adding the per-queue stats of mana_q0 and mana_q2. But
if the affinity changes dynamically, through irqbalance or edits to the
smp_affinity files, that assumption no longer holds.
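
To make the derivation concrete, here is a minimal sketch (not part of the driver
or this patch) of how a monitoring agent might combine /proc/interrupts with
ethtool -S to estimate per-CPU RX packets under the static-affinity assumption.
The interface name "eth0", the parsing, and the "CPU with the most interrupts is
the pinned CPU" heuristic are all illustrative assumptions:

#!/usr/bin/env python3
# Sketch only: estimate per-CPU RX packets by combining /proc/interrupts
# with 'ethtool -S', assuming each mana_q<N> IRQ stays pinned to the CPU
# that has serviced most of its interrupts.
import re
import subprocess
from collections import defaultdict

IFACE = "eth0"  # assumed interface name

def queue_to_cpu():
    """Map queue index -> CPU with the highest interrupt count for mana_q<N>."""
    mapping = {}
    with open("/proc/interrupts") as f:
        ncpus = len(f.readline().split())        # header line: CPU0 CPU1 ...
        for line in f:
            m = re.search(r"mana_q(\d+)", line)
            if not m:
                continue
            counts = [int(c) for c in line.split()[1:1 + ncpus]]
            mapping[int(m.group(1))] = counts.index(max(counts))
    return mapping

def rx_packets_per_queue():
    """Parse 'ethtool -S' output for the rx_<N>_packets counters."""
    out = subprocess.run(["ethtool", "-S", IFACE],
                         capture_output=True, text=True, check=True).stdout
    return {int(q): int(pkts)
            for q, pkts in re.findall(r"rx_(\d+)_packets:\s+(\d+)", out)}

if __name__ == "__main__":
    per_cpu = defaultdict(int)
    cpu_of = queue_to_cpu()
    for q, pkts in rx_packets_per_queue().items():
        per_cpu[cpu_of.get(q, -1)] += pkts
    for cpu in sorted(per_cpu):
        print(f"cpu{cpu}: {per_cpu[cpu]} rx packets (estimated)")

The estimate is only as good as the queue-to-CPU snapshot it is based on, which
is exactly what breaks when the affinity changes underneath it.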

After the interrupt affinity for mana_q2 is changed, the affinity table looks as follows:
 25:          1        103          0    3038084  Hyper-V PCIe MSI 4138200989697-edge      mana_q0@pci:7870:00:00.0
 26:          0          1    4012447          0  Hyper-V PCIe MSI 4138200989698-edge      mana_q1@pci:7870:00:00.0
 27:     157181         10          1    3007990  Hyper-V PCIe MSI 4138200989699-edge      mana_q2@pci:7870:00:00.0
 28:    3593858          0          0          1  Hyper-V PCIe MSI 4138200989700-edge      mana_q3@pci:7870:00:00.0 

During that window we can end up calculating the per-CPU stats incorrectly, giving
the monitoring services that consume them a wrong picture of the CPU usage of the
MANA driver.
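
For reference, an affinity change like the one shown in the second table can be
reproduced without irqbalance by writing the target CPU to the IRQ's
smp_affinity_list file. A minimal sketch, where IRQ 27 (mana_q2) and CPU 0 are
taken from the tables above and root privileges are assumed:

#!/usr/bin/env python3
# Sketch only: re-pin the mana_q2 IRQ to CPU 0, the kind of change that
# irqbalance can make at any time and that breaks the static
# queue-to-CPU assumption used above.
IRQ = 27         # mana_q2, per the /proc/interrupts tables above
TARGET_CPU = 0   # consistent with the second table above

with open(f"/proc/irq/{IRQ}/smp_affinity_list", "w") as f:
    f.write(str(TARGET_CPU))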
 

Also sharing the existing per-queue stats captured during this experiment, in case
they are needed.

Per-queue stats before changing CPU affinities:
     tx_cq_err: 0
     tx_cqe_unknown_type: 0
     rx_coalesced_err: 0
     rx_cqe_unknown_type: 0
     rx_0_packets: 4230152
     rx_0_bytes: 289545167
     rx_0_xdp_drop: 0
     rx_0_xdp_tx: 0
     rx_0_xdp_redirect: 0
     rx_1_packets: 4113017
     rx_1_bytes: 314552601
     rx_1_xdp_drop: 0
     rx_1_xdp_tx: 0
     rx_1_xdp_redirect: 0
     rx_2_packets: 4458906
     rx_2_bytes: 305117506
     rx_2_xdp_drop: 0
     rx_2_xdp_tx: 0
     rx_2_xdp_redirect: 0
     rx_3_packets: 4619589
     rx_3_bytes: 315445084
     rx_3_xdp_drop: 0
     rx_3_xdp_tx: 0
     rx_3_xdp_redirect: 0
     hc_tx_err_vport_disabled: 0
     hc_tx_err_inval_vportoffset_pkt: 0
     hc_tx_err_vlan_enforcement: 0
     hc_tx_err_eth_type_enforcement: 0
     hc_tx_err_sa_enforcement: 0
     hc_tx_err_sqpdid_enforcement: 0
     hc_tx_err_cqpdid_enforcement: 0
     hc_tx_err_mtu_violation: 0
     hc_tx_err_inval_oob: 0
     hc_tx_err_gdma: 0
     hc_tx_bytes: 126336708121
     hc_tx_ucast_pkts: 86748013
     hc_tx_ucast_bytes: 126336703775
     hc_tx_bcast_pkts: 37
     hc_tx_bcast_bytes: 2842
     hc_tx_mcast_pkts: 7
     hc_tx_mcast_bytes: 1504
     tx_0_packets: 5995507
     tx_0_bytes: 28749696408
     tx_0_xdp_xmit: 0
     tx_0_tso_packets: 4719840
     tx_0_tso_bytes: 26873844525
     tx_0_tso_inner_packets: 0
     tx_0_tso_inner_bytes: 0
     tx_0_long_pkt_fmt: 0
     tx_0_short_pkt_fmt: 5995507
     tx_0_csum_partial: 1275621
     tx_0_mana_map_err: 0
     tx_1_packets: 6653598
     tx_1_bytes: 38318341475
     tx_1_xdp_xmit: 0
     tx_1_tso_packets: 5330921
     tx_1_tso_bytes: 36210150488
     tx_1_tso_inner_packets: 0
     tx_1_tso_inner_bytes: 0
     tx_1_long_pkt_fmt: 0
     tx_1_short_pkt_fmt: 6653598
     tx_1_csum_partial: 1322643
     tx_1_mana_map_err: 0
     tx_2_packets: 5715246
     tx_2_bytes: 25662283686
     tx_2_xdp_xmit: 0
     tx_2_tso_packets: 4619118
     tx_2_tso_bytes: 23829680267
     tx_2_tso_inner_packets: 0
     tx_2_tso_inner_bytes: 0
     tx_2_long_pkt_fmt: 0
     tx_2_short_pkt_fmt: 5715246
     tx_2_csum_partial: 1096092
     tx_2_mana_map_err: 0
     tx_3_packets: 6175860
     tx_3_bytes: 29500667904
     tx_3_xdp_xmit: 0
     tx_3_tso_packets: 4951591
     tx_3_tso_bytes: 27446937448
     tx_3_tso_inner_packets: 0
     tx_3_tso_inner_bytes: 0
     tx_3_long_pkt_fmt: 0
     tx_3_short_pkt_fmt: 6175860
     tx_3_csum_partial: 1224213
     tx_3_mana_map_err: 0

Per-queue stats after changing CPU affinities:
     rx_0_packets: 4781895
     rx_0_bytes: 326478061
     rx_0_xdp_drop: 0
     rx_0_xdp_tx: 0
     rx_0_xdp_redirect: 0
     rx_1_packets: 4116990
     rx_1_bytes: 315439234
     rx_1_xdp_drop: 0
     rx_1_xdp_tx: 0
     rx_1_xdp_redirect: 0
     rx_2_packets: 4528800
     rx_2_bytes: 310312337
     rx_2_xdp_drop: 0
     rx_2_xdp_tx: 0
     rx_2_xdp_redirect: 0
     rx_3_packets: 4622622
     rx_3_bytes: 316282431
     rx_3_xdp_drop: 0
     rx_3_xdp_tx: 0
     rx_3_xdp_redirect: 0
     tx_0_packets: 5999379
     tx_0_bytes: 28750864476
     tx_0_xdp_xmit: 0
     tx_0_tso_packets: 4720027
     tx_0_tso_bytes: 26874344494
     tx_0_tso_inner_packets: 0
     tx_0_tso_inner_bytes: 0
     tx_0_long_pkt_fmt: 0
     tx_0_short_pkt_fmt: 5999379
     tx_0_csum_partial: 1279296
     tx_0_mana_map_err: 0
     tx_1_packets: 6656913
     tx_1_bytes: 38319355168
     tx_1_xdp_xmit: 0
     tx_1_tso_packets: 5331086
     tx_1_tso_bytes: 36210592040
     tx_1_tso_inner_packets: 0
     tx_1_tso_inner_bytes: 0
     tx_1_long_pkt_fmt: 0
     tx_1_short_pkt_fmt: 6656913
     tx_1_csum_partial: 1325785
     tx_1_mana_map_err: 0
     tx_2_packets: 5906172
     tx_2_bytes: 36758032245
     tx_2_xdp_xmit: 0
     tx_2_tso_packets: 4806348
     tx_2_tso_bytes: 34912213258
     tx_2_tso_inner_packets: 0
     tx_2_tso_inner_bytes: 0
     tx_2_long_pkt_fmt: 0
     tx_2_short_pkt_fmt: 5906172
     tx_2_csum_partial: 1099782
     tx_2_mana_map_err: 0
     tx_3_packets: 6202399
     tx_3_bytes: 30840325531
     tx_3_xdp_xmit: 0
     tx_3_tso_packets: 4973730
     tx_3_tso_bytes: 28784371532
     tx_3_tso_inner_packets: 0
     tx_3_tso_inner_bytes: 0
     tx_3_long_pkt_fmt: 0
     tx_3_short_pkt_fmt: 6202399
     tx_3_csum_partial: 1228603
     tx_3_mana_map_err: 0

