Message-ID: <aXNbTVF5KNz1yV-1@pengutronix.de>
Date: Fri, 23 Jan 2026 12:28:13 +0100
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Mohsin Bashir <mohsin.bashr@...il.com>
Cc: netdev@...r.kernel.org, alexanderduyck@...com, alok.a.tiwari@...cle.com,
	andrew+netdev@...n.ch, andrew@...n.ch, chuck.lever@...cle.com,
	davem@...emloft.net, donald.hunter@...il.com, edumazet@...gle.com,
	gal@...dia.com, horms@...nel.org, idosch@...dia.com,
	jacob.e.keller@...el.com, kernel-team@...a.com,
	kory.maincent@...tlin.com, kuba@...nel.org, lee@...ger.us,
	pabeni@...hat.com, vadim.fedorenko@...ux.dev, kernel@...gutronix.de
Subject: Re: [PATCH net-next 0/3] net: ethtool: Track TX pause storm

Here is a TL;DR summary of my questions regarding the pause storm logic :)

- Does the 500ms hardware timer reset on "flapping" pause signals? If so,
  a stuttering storm might still take down the link partner via its TX
  watchdog timeout.

- The auto-recovery (service task) enforces a fixed policy. Can we make
  this configurable? I used devlink health (.recover) to let userspace
  decide between auto-reset and manual intervention.

- Should we standardize an "RX Watchdog" mechanism in the core, instead of
  or in addition to driver-specific stats?

- If the main case where we run into a TX pause storm is an OS crash, what
  instance will be able to read these stats? Are they preserved across
  reboot or kexec?

On Fri, Jan 23, 2026 at 10:34:04AM +0100, Oleksij Rempel wrote:
> Hi Mohsin, hi Jakub!
> 
> On Thu, Jan 22, 2026 at 11:21:55AM -0800, Mohsin Bashir wrote:
> > With TX pause enabled, if a device cannot deliver received frames to
> > the stack (e.g., during a system hang), it may generate excessive pause
> > frames causing a pause storm. This series updates the uAPI to track TX
> > pause storm events as part of the pause stats (p1), adds pause storm
> > protection support for fbnic (p2), and leverages p1 to provide
> > observability into these events for fbnic (p3).
> > 
> > Mohsin Bashir (3):
> >   net: ethtool: Track pause storm events
> >   eth: fbnic: Add protection against pause storm
> >   eth: fbnic: Fetch TX pause storm stats
> > 
>  
> Very interesting series! Before the Christmas break, I started looking into a
> similar issue on different hardware (LAN78xx USB Ethernet), which has counters
> but no automatic pause storm mitigation in silicon. For reference, here is the
> RFC patch set I'm working on:
> https://lore.kernel.org/all/20260123090741.1566469-1-o.rempel@pengutronix.de/
> 
> In my implementation, I added the detection logic to the driver but exposed it
> via the devlink health interface.
> 
> I have a few questions regarding the definitions and the architectural approach
> here.
> 
> - Definition of "Pause Storm"
> 
> It seems fbnic defines a storm as a continuous assertion of the pause signal
> (duration-based). The code defines FBNIC_RXB_PAUSE_STORM_UNIT_WR with a 10us
> granularity and a 500ms threshold.
> 
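> As a sanity check on my reading of those constants (this snippet is mine,
> not code from the series):
> 
> 	/* Duration-based detector: 10us units, 500ms trip point, so the
> 	 * programmed threshold is 500000us / 10us = 50000 units.
> 	 */
> 	#define PAUSE_STORM_UNIT_US	10
> 	#define PAUSE_STORM_THRESH_MS	500
> 
> 	static inline unsigned int pause_storm_thresh_units(void)
> 	{
> 		return PAUSE_STORM_THRESH_MS * 1000 / PAUSE_STORM_UNIT_US;
> 	}
> 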
> Does this hardware detector handle "flapping" (a high rate of distinct pause
> frames), or is it strictly for a stuck-high condition (deadlock/hang)? If the
> RX queue drains slightly and fills up again every 100ms, does the internal
> timer reset? If so, a "stuttering" storm might bypass this protection, unless
> the primary goal of this feature is only to prevent the link partner's TX
> watchdog from firing (saving the neighbor from a reset).
> 
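> To make "flapping" concrete: the alternative detector I have in mind is
> rate-based rather than duration-based. A minimal sketch, with invented names
> (div_u64() is from linux/math64.h):
> 
> 	/* Trip on pause-frame *rate* instead of continuous assertion time. */
> 	static bool pause_storm_by_rate(u64 frames_now, u64 frames_prev,
> 					unsigned int interval_ms,
> 					unsigned int thresh_per_sec)
> 	{
> 		u64 rate = div_u64((frames_now - frames_prev) * 1000,
> 				   interval_ms);
> 
> 		return rate > thresh_per_sec;
> 	}
> 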
> - Policy and Tunability
> 
> The current implementation enforces an auto-recovery policy that is invisible
> to userspace. By resetting the storm protection in fbnic_service_task (which
> runs every 1s), you effectively enforce a ~50% duty cycle on a hung system:
> the breaker trips after 500ms of continuous pause, and the service task
> re-arms it at most 1s later, so pause is asserted for roughly 500ms out of
> every second.
> 
> Since this logic decides when to stop flow control and start dropping packets,
> shouldn't these thresholds be configurable?
> 
> I understand that in a complete OS crash, software cannot intervene, and we
> rely entirely on the hardware's 500ms safety breaker. However, for partial
> stalls (livelocks, soft lockups) where the service task might still run, the
> hardcoded "force normal" policy might not be ideal.
> 
> Furthermore, having the ability to disable auto-recovery is critical for
> testing. If we want to develop/validate a driver for the link partner that
> detects incoming pause storms, we need a way to generate a sustained storm from
> this device without the safety breaker constantly resetting it. A "controllable
> reference" is very valuable here.
> 
> In my LAN78xx work, I used the devlink reporter's .recover callback. This lets
> userspace control the policy (devlink health set ... auto_recover
> true/false), determining whether the driver should auto-reset the MAC/PHY or
> wait for manual intervention. Baking the "force normal" logic into the
> service task removes this control.
> 
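> For concreteness, the shape of that reporter (hand-simplified here, not
> copied verbatim from the RFC):
> 
> 	static int lan78xx_rx_stall_recover(struct devlink_health_reporter *reporter,
> 					    void *priv_ctx,
> 					    struct netlink_ext_ack *extack)
> 	{
> 		/* Runs only when auto_recover is enabled, or when userspace
> 		 * explicitly asks via "devlink health recover".
> 		 */
> 		return 0; /* reset MAC/PHY here */
> 	}
> 
> 	static const struct devlink_health_reporter_ops lan78xx_rx_stall_ops = {
> 		.name	 = "rx_stall",
> 		.recover = lan78xx_rx_stall_recover,
> 	};
> 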
> - Architecture
> 
> This feels very similar to the existing ndo_tx_timeout watchdog, but for the RX
> side. We are essentially trying to handle an "RX Timeout" or "RX Stall" where
> the device cannot push packets to the host.
> 
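> Purely to illustrate the analogy (nothing like this exists in the core today;
> the names below are invented):
> 
> 	struct net_device;
> 
> 	/* Hypothetical RX-side mirror of ndo_tx_timeout: called by the core
> 	 * when the device has been unable to deliver received frames to the
> 	 * stack for longer than stall_thresh_ms.
> 	 */
> 	struct rx_watchdog_ops {
> 		unsigned int stall_thresh_ms;
> 		void (*rx_stall)(struct net_device *dev);
> 	};
> 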
> I assume the 500ms hardware threshold is specifically chosen to be well below
> the standard 5-second ndo_tx_timeout of link partners, preventing a hung node
> from causing watchdog resets on its neighbors. I've observed cases where
> triggering a full ndo_tx_timeout reset on a stalled device leaves the hardware
> in a partially reconfigured "zombie" state, unable to receive anything
> thereafter. 
> 
> Did you consider whether we should approach this as a central networking
> mechanism (an "RX Watchdog")? I am debating this for my own driver:
> 
> - Option A: Export the needed metrics/events to userspace (devlink/ethtool)
>   and let a userspace agent decide when to reset (see the ethtool example
>   after this list).
> 
> - Option B: Handle it entirely in the kernel (like this patch), but potentially
>   standardize the "RX Stall" behavior so we don't end up with different
>   heuristics in every driver.
> 
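> For Option A, the observability half is nearly free once p1 is in: userspace
> can poll the pause stats (which would then include storm events) with plain
> ethtool, e.g.:
> 
> 	# ethtool --include-statistics -a eth0
> 
> and an agent can layer whatever reset policy it wants on top of that.
> 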
> Best Regards,
> Oleksij

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
