Message-ID: <20250822113519.y6maeu4ifoqx4mxe@skbuf>
Date: Fri, 22 Aug 2025 14:35:19 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Rob Herring <robh@...nel.org>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Florian Fainelli <f.fainelli@...il.com>,
Maxime Chevallier <maxime.chevallier@...tlin.com>,
Kory Maincent <kory.maincent@...tlin.com>,
Lukasz Majewski <lukma@...x.de>, Jonathan Corbet <corbet@....net>,
Donald Hunter <donald.hunter@...il.com>,
Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
Jiri Pirko <jiri@...nulli.us>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>, kernel@...gutronix.de,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
Russell King <linux@...linux.org.uk>, Divya.Koppera@...rochip.com,
Sabrina Dubroca <sd@...asysnail.net>,
Stanislav Fomichev <sdf@...ichev.me>
Subject: Re: [PATCH net-next v3 3/3] Documentation: net: add flow control
guide and document ethtool API
On Wed, Aug 20, 2025 at 03:10:23PM +0200, Oleksij Rempel wrote:
> name: stats-src
> + doc: |
> + Selects the source of the MAC statistics, values from
> + enum ethtool_mac_stats_src. This allows requesting statistics
> + from an aggregated MAC or a specific PHY, for example.
"This allows requesting statistics from the individual components of the
MAC Merge layer" would be better - nothing to do with PHYs.
> type: u32
> -
> name: eee
> diff --git a/Documentation/networking/flow_control.rst b/Documentation/networking/flow_control.rst
> new file mode 100644
> index 000000000000..ba315a5bcb87
> --- /dev/null
> +++ b/Documentation/networking/flow_control.rst
> @@ -0,0 +1,379 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _ethernet-flow-control:
> +
> +=====================
> +Ethernet Flow Control
> +=====================
> +
> +This document is a practical guide to Ethernet Flow Control in Linux, covering
> +what it is, how it works, and how to configure it.
> +
> +What is Flow Control?
> +=====================
> +
> +Flow control is a mechanism to prevent a fast sender from overwhelming a
> +slow receiver with data, which would cause buffer overruns and dropped packets.
> +The receiver can signal the sender to temporarily stop transmitting, giving it
> +time to process its backlog.
> +
> +Standards references
> +====================
> +
> +Ethernet flow control mechanisms are specified across consolidated IEEE base
> +standards; some originated as amendments:
> +
> +- Collision-based flow control is part of CSMA/CD in **IEEE 802.3**
> + (half-duplex).
> +- Link‑wide PAUSE is defined in **IEEE 802.3 Annex 31B**
There are some odd characters here: "Link‑wide" appears to use a
non-breaking hyphen (U+2011) rather than a plain ASCII "-".
> + (originally **802.3x**).
> +- Priority-based Flow Control (PFC) is defined in **IEEE 802.1Q Clause 36**
> + (originally **802.1Qbb**).
> +
> +In the remainder of this document, the consolidated clause numbers are used.
> +
> +How It Works: The Mechanisms
> +============================
> +
> +The method used for flow control depends on the link's duplex mode.
> +
> +.. note::
> + The user-visible ``ethtool`` pause API described in this document controls
> + **link-wide PAUSE** (IEEE 802.3 Annex 31B) only. It does not control the
> + collision-based behavior that exists on half-duplex links.
> +
> +2. Full-Duplex: Link-wide PAUSE (IEEE 802.3 Annex 31B)
> +------------------------------------------------------
> +On full-duplex links, devices can send and receive at the same time. Flow
> +control is achieved by sending a special **PAUSE frame**, defined by IEEE
> +802.3 Annex 31B. This mechanism pauses all traffic on the link and is therefore
> +called *link-wide PAUSE*.
> +
> +* **What it is**: A standard Ethernet frame with a globally reserved
> + destination MAC address (``01-80-C2-00-00-01``). This address is in a range
> + that standard IEEE 802.1D-compliant bridges do not forward. However, some
> + unmanaged or misconfigured bridges have been reported to forward these
> + frames, which can disrupt flow control across a network.
> +
> +* **How it works**: The frame contains a MAC Control opcode for PAUSE
> + (``0x0001``) and a ``pause_time`` value, telling the sender how long to
> + wait before sending more data frames. This time is specified in units of
> + "pause quanta," where one quantum is the time it takes to transmit 512 bits.
> + For example, one pause quantum is 51.2 microseconds on a 10 Mbit/s link,
> + and 512 nanoseconds on a 1 Gbit/s link.
I might also mention that a pause_time of 0 quanta is special: it means
that the paused transmitter can resume immediately, even if the quanta
requested by an earlier PAUSE frame have not yet elapsed.
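
To make the frame layout and the quanta arithmetic from the quoted text
concrete, here is a minimal sketch in C. It is not part of the patch; the
helper and macro names are made up for illustration. The MAC Control
EtherType 0x8808 is not mentioned in the quoted text but is what Annex 31B
frames carry.

```c
#include <stdint.h>
#include <string.h>

#define MAC_ADDR_LEN        6
#define MAC_CTRL_ETHERTYPE  0x8808  /* MAC Control EtherType */
#define MAC_CTRL_OP_PAUSE   0x0001  /* PAUSE opcode */
#define PAUSE_FRAME_LEN     60      /* minimum frame size, FCS not included */

/* Reserved destination address 01-80-C2-00-00-01 */
static const uint8_t pause_dst[MAC_ADDR_LEN] = {
	0x01, 0x80, 0xc2, 0x00, 0x00, 0x01
};

/*
 * Fill buf (at least PAUSE_FRAME_LEN bytes) with a link-wide PAUSE frame.
 * All multi-byte fields are big-endian; the rest of the frame is zero padding.
 * quanta == 0 tells the peer it may resume right away.
 */
static size_t build_pause_frame(uint8_t *buf, const uint8_t *src,
				uint16_t quanta)
{
	memset(buf, 0, PAUSE_FRAME_LEN);
	memcpy(buf, pause_dst, MAC_ADDR_LEN);
	memcpy(buf + MAC_ADDR_LEN, src, MAC_ADDR_LEN);
	buf[12] = MAC_CTRL_ETHERTYPE >> 8;
	buf[13] = MAC_CTRL_ETHERTYPE & 0xff;
	buf[14] = MAC_CTRL_OP_PAUSE >> 8;
	buf[15] = MAC_CTRL_OP_PAUSE & 0xff;
	buf[16] = quanta >> 8;
	buf[17] = quanta & 0xff;
	return PAUSE_FRAME_LEN;
}

/* Duration of `quanta` pause quanta in ns; one quantum is 512 bit times. */
static uint64_t pause_quanta_to_ns(uint16_t quanta, uint32_t speed_mbps)
{
	return (uint64_t)quanta * 512 * 1000 / speed_mbps;
}
```

With this, pause_quanta_to_ns(1, 10) gives 51200 ns and
pause_quanta_to_ns(1, 1000) gives 512 ns, matching the figures in the
quoted paragraph.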
> +
> +* **Who uses it**: Any full-duplex link, from 10 Mbit/s to multi-gigabit speeds.
> +
> +The MAC (Media Access Controller)
> +---------------------------------
> +The MAC is the hardware component that actually sends and receives PAUSE
> +frames. Its capabilities define the upper limit of what the driver can support.
> +For link-wide PAUSE, MACs can vary in their support for symmetric (both
> +directions) or asymmetric (independent TX/RX) flow control.
> +
> +For PFC, the MAC must be capable of generating and interpreting the
> +priority-based PAUSE frames and managing separate pause states for each
> +traffic class.
> +
> +Many MACs also implement automatic PAUSE frame transmission based on the fill
> +level of their internal RX FIFO. This is typically configured with two
> +thresholds:
> +
> +* **FLOW_ON (High Water Mark)**: When the RX FIFO usage reaches this
> + threshold, the MAC automatically transmits a PAUSE frame to stop the sender.
> +
> +* **FLOW_OFF (Low Water Mark)**: When the RX FIFO usage drops below this
> + threshold, the MAC transmits a PAUSE frame with a quanta of zero to tell
I think "quanta" is plural; the singular is "quantum".
> + the sender it can resume transmission.
> +
> +The optimal values for these thresholds depend on the link's round-trip-time
> +(RTT) and the peer's internal processing latency. The high water mark must be
> +set low enough so that the MAC's RX FIFO does not overflow while waiting for
> +the peer to react to the PAUSE frame. The driver is responsible for configuring
> +sensible defaults according to the IEEE specification. User tuning should only
> +be necessary in special cases, such as on links with unusually long cable
> +lengths (e.g., long-haul fiber).
How would user tuning be achieved?
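
For reference, the sizing argument in the quoted paragraph can be sketched
roughly as below. This is a back-of-the-envelope estimate under my own
assumptions, not a formula from the patch or from IEEE 802.3: the function
name, its parameters and the "two maximum-size frames" allowance are all
hypothetical. It only estimates how much RX FIFO space must remain above the
high water mark; it does not show how a user would apply such a value.

```c
#include <stdint.h>

/*
 * Hypothetical estimate of the RX FIFO headroom, in bytes, that must remain
 * free above the high water mark so that the FIFO does not overflow while
 * the PAUSE frame travels to the peer and the peer reacts to it.
 */
static uint64_t pause_headroom_bytes(uint32_t speed_mbps, uint64_t rtt_ns,
				     uint64_t peer_latency_ns,
				     uint32_t max_frame_bytes)
{
	/* Data that keeps arriving during the round trip and peer reaction:
	 * ns * Mbit/s / 1000 = bits, and / 8 = bytes.
	 */
	uint64_t in_flight = (rtt_ns + peer_latency_ns) * speed_mbps / 8000;

	/* Assumption: our PAUSE frame may wait behind one maximum-size frame
	 * already being transmitted, and the peer may finish one
	 * maximum-size frame before it stops.
	 */
	return in_flight + 2ULL * max_frame_bytes;
}
```

For example, at 10 Gbit/s with a 100 us round trip, no extra peer latency
and 1518-byte frames, this gives 125000 + 3036 = 128036 bytes.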
> diff --git a/include/uapi/linux/ethtool_netlink_generated.h b/include/uapi/linux/ethtool_netlink_generated.h
> index 46de09954042..0af7b90101c1 100644
> --- a/include/uapi/linux/ethtool_netlink_generated.h
> +++ b/include/uapi/linux/ethtool_netlink_generated.h
> @@ -394,7 +400,25 @@ enum {
> ETHTOOL_A_PAUSE_STAT_MAX = (__ETHTOOL_A_PAUSE_STAT_CNT - 1)
> };
>
> -enum {
> +/**
> + * enum ethtool_pause - Parameters for link-wide PAUSE (IEEE 802.3 Annex 31B).
> + * @ETHTOOL_A_PAUSE_AUTONEG: Acts as a mode selector for the driver. On GET:
> + * indicates the driver's behavior. If true, the driver will respect the
> + * negotiated outcome; if false, the driver will use a forced configuration.
> + * On SET: if true, the driver configures the PHY's advertisement based on
> + * the rx and tx attributes. If false, the driver forces the MAC into the
> + * state defined by the rx and tx attributes.
> + * @ETHTOOL_A_PAUSE_RX: Enable receiving PAUSE frames (pausing local TX). On
> + * GET: reflects the currently preferred configuration state.
> + * @ETHTOOL_A_PAUSE_TX: Enable transmitting PAUSE frames (pausing peer TX). On
> + * GET: reflects the currently preferred configuration state.
> + * @ETHTOOL_A_PAUSE_STATS: Contains the pause statistics counters. The source
> + * of these statistics is determined by stats-src.
> + * @ETHTOOL_A_PAUSE_STATS_SRC: Selects the source of the MAC statistics, values
> + * from enum ethtool_mac_stats_src. This allows requesting statistics from an
> + * aggregated MAC or a specific PHY, for example.
Same here: this is about the individual components of the MAC Merge
layer, not about PHYs.
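
As an aside on the autoneg semantics documented above: the way rx/tx map to
an advertisement, and how the negotiated outcome is derived from both link
partners' advertisements, can be sketched as follows. This is a standalone
illustration modelled on my understanding of what linkmode_set_pause() and
linkmode_resolve_pause() do in phylib. The struct and function names here
are hypothetical and not kernel API.

```c
#include <stdbool.h>

/* The two pause-related advertisement bits (Pause and Asym_Pause). */
struct pause_adv {
	bool pause;
	bool asym_pause;
};

/* Translate the rx/tx request into what gets advertised to the peer. */
static struct pause_adv pause_to_adv(bool rx, bool tx)
{
	struct pause_adv adv = {
		.pause = rx,
		.asym_pause = rx != tx,
	};

	return adv;
}

/* Resolve the pause modes from the local and partner advertisements. */
static void resolve_pause(struct pause_adv local, struct pause_adv partner,
			  bool *tx_pause, bool *rx_pause)
{
	if (local.pause && partner.pause) {
		/* Both sides support symmetric pause. */
		*tx_pause = true;
		*rx_pause = true;
	} else if (local.asym_pause && partner.asym_pause) {
		/* Asymmetric: the side advertising Pause is the one that
		 * honours received PAUSE frames.
		 */
		*tx_pause = partner.pause;
		*rx_pause = local.pause;
	} else {
		*tx_pause = false;
		*rx_pause = false;
	}
}
```

With autoneg off, none of this applies and the rx/tx values are forced
directly into the MAC, as the quoted kernel-doc says.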
> + */
> +enum ethtool_a_pause {
> ETHTOOL_A_PAUSE_UNSPEC,
> ETHTOOL_A_PAUSE_HEADER,
> ETHTOOL_A_PAUSE_AUTONEG,