Message-ID: <9b3af2dd-8b56-4817-b223-c6a85ba80562@nvidia.com>
Date: Wed, 6 Nov 2024 21:23:47 +0200
From: Gal Pressman <gal@...dia.com>
To: Yafang Shao <laoar.shao@...il.com>, Tariq Toukan <ttoukan.linux@...il.com>
Cc: saeedm@...dia.com, tariqt@...dia.com, leon@...nel.org,
netdev@...r.kernel.org, linux-rdma@...r.kernel.org
Subject: Re: [PATCH] net/mlx5e: Report rx_discards_phy via rx_missed_errors

On 06/11/2024 13:49, Yafang Shao wrote:
> On Wed, Nov 6, 2024 at 5:56 PM Tariq Toukan <ttoukan.linux@...il.com> wrote:
>>
>>
>>
>> On 06/11/2024 8:40, Yafang Shao wrote:
>>> We observed a high number of rx_discards_phy events on some servers when
>>> running `ethtool -S`. However, this important counter is not currently
>>> reflected in the /proc/net/dev statistics file, making it challenging to
>>> monitor effectively.
>>>
>>> Since rx_missed_errors represents packets dropped due to buffer exhaustion,
>>> it makes sense to include rx_discards_phy in this counter to enhance
>>> monitoring visibility. This change will help administrators track these
>>> events more effectively through standard interfaces.
>>>
>>
>> Hi,
>>
>> Thanks for your patch.
>>
>> It's a matter of interpretation...
>> The documentation in
>> Documentation/ABI/testing/sysfs-class-net-statistics refers to the
>> driver for the exact meaning.
I think this documentation is outdated; a more recent description is in if_link.h:
* @rx_missed_errors: Count of packets missed by the host.
* Folded into the "drop" counter in `/proc/net/dev`.
*
* Counts number of packets dropped by the device due to lack
* of buffer space. This usually indicates that the host interface
* is slower than the network interface, or host is not keeping up
* with the receive packet rate.
*
* This statistic corresponds to hardware events and is not used
* on software devices.
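
As an illustration only, here is a minimal Python sketch of how a monitoring script might read the receive "drop" column of /proc/net/dev, which per the comment above is where rx_missed_errors gets folded in. The names parse_rx_stats and RX_FIELDS are hypothetical, and the column order is assumed to match the header the kernel prints in that file.

```python
# Receive-side columns of /proc/net/dev, in the order they appear
# in the file's header line (assumed layout).
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")


def parse_rx_stats(text):
    """Return {interface: {rx_field: value}} from /proc/net/dev content."""
    stats = {}
    for line in text.splitlines():
        # The two header lines contain no ':'; interface lines do.
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < len(RX_FIELDS):
            continue
        rx_values = (int(v) for v in values[:len(RX_FIELDS)])
        stats[name.strip()] = dict(zip(RX_FIELDS, rx_values))
    return stats


if __name__ == "__main__":
    with open("/proc/net/dev") as f:
        for iface, rx in parse_rx_stats(f.read()).items():
            # "drop" is an aggregate that includes rx_missed_errors,
            # so it cannot separate host drops from other drop causes.
            print(iface, "rx drop:", rx["drop"], "rx fifo:", rx["fifo"])
```

Because "drop" is an aggregate, a script like this cannot tell which underlying counter increased; per-counter values still have to come from `ethtool -S` or the netlink statistics.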
>>
>> rx_discards_phy counts packet drops due to exhaustion of the physical
>> port memory (not in the host); this happens well before the packet is
>> steered to any receive queue.
>> Today, rx_missed_errors counts SW/host memory buffer exhaustion of the
>> receive queues.
>> I don't think that rx_missed_errors should mix both.
>
> Thanks for your detailed explanation.
>
>>
>> Maybe some other counter can be used for rx_discards_phy, like
>> rx_fifo_errors?
>
> It appears that rx_fifo_errors is a more appropriate counter for this purpose.
> I will submit a v2. Thanks for your suggestion.
Probably not a good idea:
* This statistics was used interchangeably with @rx_over_errors.
* Not recommended for use in drivers for high speed interfaces.