Message-ID: <aXPntx9wiUqbKGRN@pengutronix.de>
Date: Fri, 23 Jan 2026 22:27:19 +0100
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Mohsin Bashir <mohsin.bashr@...il.com>
Cc: netdev@...r.kernel.org, alexanderduyck@...com, alok.a.tiwari@...cle.com,
andrew+netdev@...n.ch, andrew@...n.ch, chuck.lever@...cle.com,
davem@...emloft.net, donald.hunter@...il.com, edumazet@...gle.com,
gal@...dia.com, horms@...nel.org, idosch@...dia.com,
jacob.e.keller@...el.com, kernel-team@...a.com,
kory.maincent@...tlin.com, kuba@...nel.org, lee@...ger.us,
pabeni@...hat.com, vadim.fedorenko@...ux.dev
Subject: Re: [PATCH net-next 1/3] net: ethtool: Track pause storm events
On Thu, Jan 22, 2026 at 11:21:56AM -0800, Mohsin Bashir wrote:
> With TX pause enabled, if a device is unable to pass packets up to the
> stack (e.g., CPU is hanged), the device can cause pause storm. Given
> that devices can have native support to protect the neighbor from such
> flooding, such events need some tracking. This support is to track TX
> pause storm events for better observability.
>
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> Signed-off-by: Mohsin Bashir <mohsin.bashr@...il.com>
> ---
> Documentation/netlink/specs/ethtool.yaml | 13 +++++++++++++
> include/linux/ethtool.h | 2 ++
> include/uapi/linux/ethtool_netlink_generated.h | 1 +
> net/ethtool/pause.c | 4 +++-
> 4 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/netlink/specs/ethtool.yaml b/Documentation/netlink/specs/ethtool.yaml
> index 0a2d2343f79a..4707063af3b4 100644
> --- a/Documentation/netlink/specs/ethtool.yaml
> +++ b/Documentation/netlink/specs/ethtool.yaml
> @@ -879,6 +879,19 @@ attribute-sets:
>        -
>          name: rx-frames
>          type: u64
> +      -
> +        name: tx-pause-storm-events
> +        type: u64
> +        doc: >-
> +          TX pause storm event count. Increments each time device
> +          detects that its pause assertion condition has been true
> +          for too long for normal operation. As a result, the device
> +          has temporarily disabled its own Pause TX function to
> +          protect the network from itself.
> +          This counter should never increment under normal overload
> +          conditions; it indicates catastrophic failure like an OS
> +          crash. The rate of incrementing is implementation specific.

Hm, we already have the TX pause frame counters, so the anomaly is
visible to the user anyway (even if it isn't explicitly labeled as an
anomaly).

What is not visible to the user is when HW or SW disables flow control.
Maybe that is what the counter should represent, and what its name
should say? Would tx-pause-auto-disabled-events make sense?

The reason I do not like tx-pause-storm-events is that its meaning is
device specific; the user has to read the device manual to know what it
actually means.

tx-pause-auto-disabled-events can be reused in more cases: every time
flow control is automatically disabled for some reason.
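
To make the naming suggestion concrete, the spec entry could look roughly
like the fragment below. This is an untested sketch, not part of the
posted patch: only the attribute name comes from the suggestion above, and
the doc wording is an assumption about how the more generic semantics
might be described.

```yaml
# Hypothetical sketch only: the attribute name follows the suggestion
# above; the doc text is assumed wording, not taken from the patch.
-
  name: tx-pause-auto-disabled-events
  type: u64
  doc: >-
    Number of times TX pause frame generation was automatically
    disabled by hardware or software, for example because the pause
    assertion condition stayed true for too long for normal
    operation.
```

With wording along these lines, the counter describes an action taken on
flow control rather than a device-specific notion of a "storm".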
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |