Message-ID: <20241107164213.GA189042@unreal>
Date: Thu, 7 Nov 2024 18:42:13 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
	Andrew Lunn <andrew@...n.ch>, Bjorn Helgaas <helgaas@...nel.org>,
	Sanman Pradhan <sanman.p211993@...il.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, netdev@...r.kernel.org,
	alexanderduyck@...com, kernel-team@...a.com, davem@...emloft.net,
	edumazet@...gle.com, pabeni@...hat.com, horms@...nel.org,
	corbet@....net, mohsin.bashr@...il.com, sanmanpradhan@...a.com,
	andrew+netdev@...n.ch, jdamato@...tly.com, sdf@...ichev.me,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-pci@...r.kernel.org
Subject: Re: [PATCH net-next] eth: fbnic: Add PCIe hardware statistics

On Thu, Nov 07, 2024 at 07:40:09AM -0800, Jakub Kicinski wrote:
> On Thu, 7 Nov 2024 14:03:57 +0200 Leon Romanovsky wrote:
> > > [root@...t ~]# ethtool -i eth0 | grep driver
> > > driver: mlx5_core
> > > [root@...t ~]# ethtool -S eth0 | grep pci
> > >      rx_pci_signal_integrity: 1
> > >      tx_pci_signal_integrity: 1471
> > >      outbound_pci_stalled_rd: 0
> > >      outbound_pci_stalled_wr: 0
> > >      outbound_pci_stalled_rd_events: 0
> > >      outbound_pci_stalled_wr_events: 0
> > > 
> > > Isn't it a PCIe statistics?  
> > 
> > I didn't do full archaeological research and stopped at 2017, when these
> > counters were updated to use the new API, but it looks like they have been
> > there since the stone age.
> > 
> > It was a mistake to put them there, and they should be moved to the PCI core
> > together with the hundreds of other debug counters which ConnectX devices
> > have but don't expose yet.
> 
> Whatever hand-waving you do now, it's impossible to take you seriously
> where the device driver of which you are a maintainer does the same
> thing. 

I said that it was a mistake, and I can add that we can move these counters
to the new infrastructure.

> And your direction going forward for PCIe debug, AFAIU, is the
> proprietary fwctl stuff. Please stop.

Nice, so we are back to the discussion of evil vendors vs. good people
who work at cloud companies which produce hardware for themselves but
don't call themselves vendors.

The latter can do whatever they want, but vendors are doing only crap.

The patch author added these debug counters, and magically it is fine for you:
+   These counters indicate PCIe resource exhaustion events:
+        - pcie_ob_rd_no_tag: Read requests dropped due to tag unavailability
+        - pcie_ob_rd_no_cpl_cred: Read requests dropped due to completion credit exhaustion
+        - pcie_ob_rd_no_np_cred: Read requests dropped due to non-posted credit exhaustion

For example, mlx5 devices and Broadcom have two simple PCIe counters: rx_errors and tx_errors,
which have nothing to do with fwctl.
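
To make the comparison concrete, here is a minimal Python sketch that pulls the
PCI-related counters out of "ethtool -S" style output. It only parses text that
is already captured; the sample is the mlx5 output quoted earlier in this
thread. The helper name parse_counters and the substring filter "pci" are
assumptions made for illustration, not part of ethtool or of any driver.

```python
# Sample text copied from the "ethtool -S eth0 | grep pci" output quoted above.
SAMPLE = """\
     rx_pci_signal_integrity: 1
     tx_pci_signal_integrity: 1471
     outbound_pci_stalled_rd: 0
     outbound_pci_stalled_wr: 0
     outbound_pci_stalled_rd_events: 0
     outbound_pci_stalled_wr_events: 0
"""


def parse_counters(text, needle="pci"):
    """Return {name: value} for "name: value" lines whose name contains needle.

    Hypothetical helper: lines without a numeric value (for example a
    "NIC statistics:" header) are skipped.
    """
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not value.isdigit():
            continue
        if needle in name.lower():
            counters[name] = int(value)
    return counters


pci = parse_counters(SAMPLE)
# Keep only counters that actually fired.
nonzero = {name: value for name, value in pci.items() if value}
```

With the sample above, nonzero contains only the two signal integrity
counters, since every stall counter reads zero.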

And the idea that you can take mistakes from the past, ignore the
feedback, and repeat those mistakes fills me with amazement.

So why don't you allow module parameters? Many drivers have them, but
new ones are not allowed. If I claim that "vendor XXX has it, can I add
it too?", we all know the answer.

Thanks
