Message-ID: <20250818190316.0bfdc719@kernel.org>
Date: Mon, 18 Aug 2025 19:03:16 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Rob Herring
<robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>, Florian
Fainelli <f.fainelli@...il.com>, Maxime Chevallier
<maxime.chevallier@...tlin.com>, Kory Maincent <kory.maincent@...tlin.com>,
Lukasz Majewski <lukma@...x.de>, Jonathan Corbet <corbet@....net>,
kernel@...gutronix.de, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, Russell King <linux@...linux.org.uk>,
Divya.Koppera@...rochip.com, linux-doc@...r.kernel.org
Subject: Re: [PATCH net-next v2 1/1] Documentation: networking: add detailed
guide on Ethernet flow control configuration
On Thu, 14 Aug 2025 09:53:42 +0200 Oleksij Rempel wrote:
> Introduce a new document, flow_control.rst, providing a comprehensive
> overview of Ethernet Flow Control in Linux. It explains how flow control
> works in full- and half-duplex modes, how autonegotiation resolves pause
> capabilities, and how users can inspect and configure flow control using
> ethtool and Netlink interfaces.
>
> The document also covers typical MAC implementations, PHY behavior,
> ethtool driver operations, and provides a test plan for verifying driver
> behavior across various scenarios.
>
> The legacy flow control section in phy.rst is replaced with a reference
> to this new document.
>
> Signed-off-by: Oleksij Rempel <o.rempel@...gutronix.de>
This conflicts again, FWIW, another rebase will be needed.
> diff --git a/Documentation/networking/flow_control.rst b/Documentation/networking/flow_control.rst
> new file mode 100644
> index 000000000000..5585434178e7
> --- /dev/null
> +++ b/Documentation/networking/flow_control.rst
> @@ -0,0 +1,383 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================
> +Ethernet Flow Control
> +=====================
> +
> +This document is a practical guide to Ethernet Flow Control in Linux, covering
> +what it is, how it works, and how to configure it.
> +
> +What is Flow Control?
> +=====================
> +
> +Flow control is a mechanism to prevent a fast sender from overwhelming a
> +slow receiver with data, which would cause buffer overruns and dropped packets.
> +The receiver can signal the sender to temporarily stop transmitting, giving it
> +time to process its backlog.
You haven't covered PFC. Is PFC not used in TSN?
> +How It Works: The Two Mechanisms
> +================================
> +
> +The method used for flow control depends on the link's duplex mode.
> +
> +1. Full-Duplex: PAUSE Frames (IEEE 802.3 Annex 31B)
> +---------------------------------------------------
> +On full-duplex links, devices can send and receive at the same time. Flow
> +control is achieved by sending a special **PAUSE frame**.
> +
> +* **What it is**: A standard Ethernet frame with a globally reserved
> + destination MAC address (``01-80-C2-00-00-01``). This address is in a range
> + that standard IEEE 802.1D-compliant bridges do not forward. However, some
> + unmanaged or misconfigured bridges have been reported to forward these
> + frames, which can disrupt flow control across a network.
> +
> +* **How it works**: The frame contains a `pause_time` value, telling the
What's the logic behind using single backticks?
I'm a bit unclear on the expectations; AFAIU single
backticks are supposed to be mostly for references?
Unless you mean that it's safer to use double backticks everywhere
(CCing linux-doc to keep me honest).
> +Many MACs also implement automatic PAUSE frame transmission based on the fill
> +level of their internal RX FIFO. This is typically configured with two
> +thresholds:
> +
> +* **FLOW_ON (High Water Mark)**: When the RX FIFO usage reaches this
> + threshold, the MAC automatically transmits a PAUSE frame to stop the sender.
> +
> +* **FLOW_OFF (Low Water Mark)**: When the RX FIFO usage drops below this
> + threshold, the MAC transmits a PAUSE frame with a quanta of zero to tell
> + the sender it can resume transmission.
> +
> +The optimal values for these thresholds often depend on the bandwidth of the
> +bus between the MAC and the system's CPU or RAM. Like the pause quanta, there
> +is currently no generic kernel interface for tuning these thresholds.
I'm not sure if this is true. In the "fast devices" I'm familiar
with, at least, the pause threshold only covers the latency of
stopping the internal device pipeline, and the *wire side*.
Basically you need to be able to cover RTT/2 * link speed
with internal MAC IP buffering. I thought there were even
some formulas in the spec on how much latency the far end
is allowed before it processes the ctrl frame.
Long story short, the thresholds generally have little to do with
"CPU or RAM" and much more with cable length. It would be worth
calling out that the driver is responsible for configuring sensible
defaults per the IEEE spec. The only reason a user should have to tweak
these thresholds, really, is on long fiber connections. Or I guess
if the user knows that the peer is buggy.
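To put rough numbers on the cable length point, here is a back-of-the-envelope
sketch. It only computes the wire-side term mentioned above (RTT/2 * link
speed); the propagation speed of about 2e8 m/s and the function name are my
assumptions for illustration, and it ignores the peer's response time and
any frame already being transmitted.

```python
# Assumed signal propagation speed in copper/fiber, roughly 2/3 of c.
# This is an illustrative constant, not a value taken from the spec.
PROPAGATION_M_PER_S = 2.0e8


def wire_bytes_in_flight(link_speed_bps: float, cable_length_m: float) -> float:
    """Bytes on the wire in one direction: (RTT / 2) * link speed."""
    one_way_delay_s = cable_length_m / PROPAGATION_M_PER_S
    return link_speed_bps * one_way_delay_s / 8  # bits -> bytes


if __name__ == "__main__":
    # Compare a short copper run with a long fiber run at 10 Gbit/s.
    for length_m in (100, 10_000):
        in_flight = wire_bytes_in_flight(10e9, length_m)
        print(f"{length_m} m at 10 Gbit/s: about {in_flight:.0f} bytes in flight")
```

The required headroom grows linearly with cable length, which is why defaults
sized for short links can stop being sufficient on long fiber runs.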
> +User Space Interface
> +--------------------
> +The primary user space tool for flow control configuration is `ethtool`. It
> +communicates with the kernel via netlink messages, specifically
> +`ETHTOOL_MSG_PAUSE_GET` and `ETHTOOL_MSG_PAUSE_SET`.
Linking to the ethtool_netlink section would be great, instead of
repeating it here.
> +These messages use a simple set of attributes that map to the members of the
> +`struct ethtool_pauseparam`:
> +
> +* `ETHTOOL_A_PAUSE_AUTONEG` -> `autoneg`
> +* `ETHTOOL_A_PAUSE_RX` -> `rx_pause`
> +* `ETHTOOL_A_PAUSE_TX` -> `tx_pause`
> +
> +The driver's implementation of the `.get_pauseparam` and `.set_pauseparam`
> +ethtool operations must correctly interpret these fields.
> +
> +* **On `get_pauseparam`**, the driver must report the user's configured flow
> + control policy.
> +
> + * The `autoneg` flag indicates the driver's behavior: if `on`, the driver
> + will respect the negotiated outcome; if `off`, the driver will use a
> + forced configuration.
> +
> + * The `rx_pause` and `tx_pause` flags reflect the currently preferred
> + configuration state, which depends on multiple factors.
> +
> +* **On `set_pauseparam`**, the driver must interpret the user's request:
> +
> + * The `autoneg` flag acts as a mode selector. If `on`, the driver
> + configures the PHY's advertisement based on `rx_pause` and `tx_pause`.
> +
> + * If `off`, the driver forces the MAC into the state defined by
> + `rx_pause` and `tx_pause`.
This belongs in the code. Please render this into the kdoc of the correct
structs and use the power of kdoc to refer to those structs here.
Worst case you can use a DOC: section, if kdoc is too hard, but please
try to move the description of the internal kernel APIs into code comments
which are included/referred to in the Documentation output.
And some kind of "See Documentation/networking/flow_control.rst" in
relevant places in the kernel code would be nice, too.
> +Test Plan
> +=========
Obvious question: could you make this into a Python test?
Put it under ..selftests/drivers/net/hw/; SW emulation
of the features is not required.
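As a very rough illustration of the checking side of such a test, here is
a self-contained sketch. The helper name parse_pause() and the sample text
are hypothetical; the sample mimics the usual `ethtool -a` layout. A real
selftest would more likely query the pause settings through the selftest
library's netlink helpers than parse text.

```python
def parse_pause(output: str) -> dict:
    """Parse `ethtool -a` style text into a dict of booleans keyed by field name."""
    params = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Skip the header line and anything that is not an on/off field.
        if sep and value in ("on", "off"):
            params[key.strip().lower()] = value == "on"
    return params


# Hypothetical sample output, used here instead of running ethtool.
SAMPLE = """Pause parameters for eth0:
Autonegotiate:\ton
RX:\t\toff
TX:\t\ton
"""

if __name__ == "__main__":
    print(parse_pause(SAMPLE))
```

Each case in the test plan would then boil down to setting a configuration
and comparing the parsed result against the expected state.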
> +This section outlines test cases for verifying flow control configuration. The
> +`ethtool -s` command is used to set the base link state (autoneg on/off), and
> +`ethtool -A` is used to configure the pause parameters within that state.
> +
> +Case 1: Base Link is Autonegotiating
> +------------------------------------
> +*Prerequisite*: `ethtool -s eth0 autoneg on`
--
pw-bot: cr