Message-ID: <CALOAHbBJ2xWKZ5frzR5wKq1D7-mzS62QkWpxB5Q-A7dR-Djhnw@mail.gmail.com>
Date: Fri, 15 Nov 2024 13:50:45 +0800
From: Yafang Shao <laoar.shao@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: ttoukan.linux@...il.com, gal@...dia.com, saeedm@...dia.com, 
	tariqt@...dia.com, leon@...nel.org, netdev@...r.kernel.org, 
	linux-rdma@...r.kernel.org
Subject: Re: [PATCH v2 net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors

On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote:
> > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
> > > > - *   Not recommended for use in drivers for high speed interfaces.
> > >
> > > I thought I suggested we provide clear guidance on this counter being
> > > related to processing pipeline being too slow, vs host backpressure.
> > > Just deleting the line that says "don't use" is not going to cut it :|
> >
> > Hello Jakub,
> >
> > After investigating other network drivers, I found that they all
> > report this metric to rx_missed_errors:
> >
> > - i40e
> >   The corresponding ethtool metric is port.rx_discards, which was
> > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
> > rx_missed_errors for buffer exhaustion").
> >
> > - broadcom
> >   The equivalent metric is rx_total_discard_pkts, reported as
> > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
> > ethernet driver")
> >
> > Given this, it seems we should align with the standard practice and
> > report this metric to rx_missed_errors.
> >
> > Tariq, what are your thoughts?
>
> mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very
> different kind of drops than the drops reported as 'missed'.
> The distinction is useful in production in my experience working with
> mlx5 devices.

From the manual [0]:

The number of received packets dropped due to lack of buffers on a
physical port. If this counter is increasing, it implies that the
adapter is congested and cannot absorb the traffic coming from the
network.

Would it be possible to add this description to if_link.h?

Frankly, it doesn’t make much difference to end users like me whether
this is reported to rx_missed_errors or rx_fifo_errors; the main goal
is simply to monitor this metric to flag any issues...
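
A minimal sketch of that kind of monitoring, assuming the counter ends up in one of the standard per-interface statistics files under /sys/class/net/<iface>/statistics/. The function names and the two-sample comparison are illustrative only and are not part of the patch:

```python
import os

# Standard link statistics exposed as one decimal value per file under
# /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_missed_errors", "rx_fifo_errors")


def read_counters(stats_dir, names=COUNTERS):
    """Return {name: value} for each counter file found in stats_dir."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(stats_dir, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Missing or unreadable counter: skip it rather than guess.
            continue
    return values


def growing(previous, current):
    """Return {name: delta} for counters that increased between two samples."""
    return {
        name: current[name] - previous[name]
        for name in current
        if name in previous and current[name] > previous[name]
    }
```

Taking two samples of, say, /sys/class/net/eth0/statistics (eth0 is a placeholder interface name) a few seconds apart and passing them to growing() flags whichever of the two counters is increasing, regardless of which one the driver maps rx_discards_phy to.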

[0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters


--
Regards
Yafang
