Message-ID: <20241107164213.GA189042@unreal>
Date: Thu, 7 Nov 2024 18:42:13 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
Andrew Lunn <andrew@...n.ch>, Bjorn Helgaas <helgaas@...nel.org>,
Sanman Pradhan <sanman.p211993@...il.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, netdev@...r.kernel.org,
alexanderduyck@...com, kernel-team@...a.com, davem@...emloft.net,
edumazet@...gle.com, pabeni@...hat.com, horms@...nel.org,
corbet@....net, mohsin.bashr@...il.com, sanmanpradhan@...a.com,
andrew+netdev@...n.ch, jdamato@...tly.com, sdf@...ichev.me,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-pci@...r.kernel.org
Subject: Re: [PATCH net-next] eth: fbnic: Add PCIe hardware statistics

On Thu, Nov 07, 2024 at 07:40:09AM -0800, Jakub Kicinski wrote:
> On Thu, 7 Nov 2024 14:03:57 +0200 Leon Romanovsky wrote:
> > > [root@...t ~]# ethtool -i eth0 | grep driver
> > > driver: mlx5_core
> > > [root@...t ~]# ethtool -S eth0 | grep pci
> > > rx_pci_signal_integrity: 1
> > > tx_pci_signal_integrity: 1471
> > > outbound_pci_stalled_rd: 0
> > > outbound_pci_stalled_wr: 0
> > > outbound_pci_stalled_rd_events: 0
> > > outbound_pci_stalled_wr_events: 0
> > >
> > > Isn't it a PCIe statistics?
> >
> > I didn't do full archaeological research and stopped at 2017, when these
> > counters were updated to use the new API, but it looks like they have
> > been there since the stone age.
> >
> > It was a mistake to put them there, and they should be moved to the PCI
> > core together with the hundreds of other debug counters which ConnectX
> > devices have but don't expose yet.
>
> Whatever hand-waving you do now, it's impossible to take you seriously
> where the device driver of which you are a maintainer does the same
> thing.

I said that it was a mistake, and I can add that we can move these counters
to the new infrastructure.

> And your direction going forward for PCIe debug, AFAIU, is the
> proprietary fwctl stuff. Please stop.

Nice, and we are back to the discussion of evil vendors vs. good people
who work at cloud companies that produce hardware for themselves but
don't call themselves vendors. The latter can do whatever they want,
but vendors produce only crap.

The patch author added these debug counters, and magically it is fine for you:
+ These counters indicate PCIe resource exhaustion events:
+ - pcie_ob_rd_no_tag: Read requests dropped due to tag unavailability
+ - pcie_ob_rd_no_cpl_cred: Read requests dropped due to completion credit exhaustion
+ - pcie_ob_rd_no_np_cred: Read requests dropped due to non-posted credit exhaustion
For example, mlx5 devices and Broadcom have two simple PCIe counters: rx_errors and tx_errors,
which have nothing to do with fwctl.

And the idea that you can take mistakes from the past, ignore the
feedback, and repeat those mistakes fills me with amazement.

So why don't you allow module parameters? Many drivers have them, but
new ones are not allowed. If I ask "vendor XXX has it, can I add it
too?", we all know the answer.

Thanks