Message-ID: <20250926171921.7106b19b@kernel.org>
Date: Fri, 26 Sep 2025 17:19:21 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, Rob Herring
 <robh@...nel.org>, Krzysztof Kozlowski <krzk+dt@...nel.org>, Florian
 Fainelli <f.fainelli@...il.com>, Maxime Chevallier
 <maxime.chevallier@...tlin.com>, Kory Maincent <kory.maincent@...tlin.com>,
 Lukasz Majewski <lukma@...x.de>, Jonathan Corbet <corbet@....net>, Donald
 Hunter <donald.hunter@...il.com>, Vadim Fedorenko
 <vadim.fedorenko@...ux.dev>, Jiri Pirko <jiri@...nulli.us>, Vladimir Oltean
 <vladimir.oltean@....com>, Alexei Starovoitov <ast@...nel.org>, Daniel
 Borkmann <daniel@...earbox.net>, Jesper Dangaard Brouer <hawk@...nel.org>,
 John Fastabend <john.fastabend@...il.com>, kernel@...gutronix.de,
 linux-kernel@...r.kernel.org, netdev@...r.kernel.org, Russell King
 <linux@...linux.org.uk>, Divya.Koppera@...rochip.com, Sabrina Dubroca
 <sd@...asysnail.net>, Stanislav Fomichev <sdf@...ichev.me>
Subject: Re: [PATCH net-next v7 1/1] Documentation: net: add flow control
 guide and document ethtool API

On Wed, 24 Sep 2025 14:02:41 +0200 Oleksij Rempel wrote:
>      name: pause-stat
> +    doc: Statistics counters for link-wide PAUSE frames (IEEE 802.3 Annex 31B).
>      attr-cnt-name: __ethtool-a-pause-stat-cnt
> +    enum-name: ethtool-a-pause-stat

Naming attribute enums is relatively rare and kinda unnecessary TBH,
because the values are almost never held as state or passed around.
99.9% of the time we use the literals.

enums for actual enum attributes (the value is the enum) - sure,
enums for attr types - 🤷️
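To make the distinction concrete, here is a small sketch with hypothetical `EXAMPLE_*` names (not the real ethtool uapi). Attribute types live in an anonymous enum and only ever show up as literals, e.g. in a parse switch. An enum whose values *are* the attribute payload, and which gets stored in state, is the case where a type name actually earns its keep.

```c
#include <assert.h>
#include <stdint.h>

/* Attribute types: anonymous enum, only ever used as literals.
 * Hypothetical names, modeled on the attr-cnt-name pattern above. */
enum {
	EXAMPLE_A_STAT_UNSPEC,
	EXAMPLE_A_STAT_TX_FRAMES,
	EXAMPLE_A_STAT_RX_FRAMES,

	__EXAMPLE_A_STAT_CNT,
	EXAMPLE_A_STAT_MAX = __EXAMPLE_A_STAT_CNT - 1
};

/* Value enum: the attribute payload is one of these values and it is
 * held as state, so a named type is useful. */
enum example_stats_src {
	EXAMPLE_STATS_SRC_AGGREGATE,
	EXAMPLE_STATS_SRC_EMAC,
	EXAMPLE_STATS_SRC_PMAC,
};

struct example_req {
	enum example_stats_src src;	/* stored and passed around */
	uint64_t tx_frames;
	uint64_t rx_frames;
};

/* Parse one (type, value) attribute; the type is matched as a literal. */
static int example_parse_stat(struct example_req *req, int type, uint64_t val)
{
	assert(req);

	switch (type) {
	case EXAMPLE_A_STAT_TX_FRAMES:
		req->tx_frames = val;
		return 0;
	case EXAMPLE_A_STAT_RX_FRAMES:
		req->rx_frames = val;
		return 0;
	default:
		return -1;	/* unknown attribute type */
	}
}
```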

>          name: stats
> +        doc: |
> +          Contains the pause statistics counters. The source of these
> +          statistics is determined by stats-src.

I'd skip mentioning the source here TBH. Otherwise we'd need to briefly
describe what MM (MAC Merge) is. I don't have recent embedded experience,
but I thought MM is relatively rare, so mentioning it for a very common
attribute could confuse.

>          type: nest
>          nested-attributes: pause-stat
>        -
>          name: stats-src
> +        doc: |
> +          Selects the source of the MAC statistics, values from
> +          enum ethtool_mac_stats_src. This allows requesting statistics
> +          from the individual components of the MAC Merge layer.
>          type: u32
>    -
>      name: eee
> diff --git a/Documentation/networking/flow_control.rst b/Documentation/networking/flow_control.rst
> new file mode 100644
> index 000000000000..48646d54513f
> --- /dev/null
> +++ b/Documentation/networking/flow_control.rst
> @@ -0,0 +1,373 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _ethernet-flow-control:
> +
> +=====================
> +Ethernet Flow Control
> +=====================
> +
> +This document is a practical guide to Ethernet Flow Control in Linux, covering
> +what it is, how it works, and how to configure it.
> +
> +What is Flow Control?
> +=====================
> +
> +Flow control is a mechanism to prevent a fast sender from overwhelming a
> +slow receiver with data, which would cause buffer overruns and dropped packets.
> +The receiver can signal the sender to temporarily stop transmitting, giving it
> +time to process its backlog.
> +
> +Standards references
> +====================
> +
> +Ethernet flow control mechanisms are specified across consolidated IEEE base

nit: "flow control" here vs "Flow Control" elsewhere? We should be consistent.

> +standards; some originated as amendments:
> +
> +- Collision-based flow control is part of CSMA/CD in **IEEE 802.3**
> +  (half-duplex).
> +- Link-wide PAUSE is defined in **IEEE 802.3 Annex 31B**
> +  (originally **802.3x**).
> +- Priority-based Flow Control (PFC) is defined in **IEEE 802.1Q Clause 36**
> +  (originally **802.1Qbb**).
> +
> +In the remainder of this document, the consolidated clause numbers are used.
> +
> +How It Works: The Mechanisms
> +============================
> +
> +The method used for flow control depends on the link's duplex mode.
> +
> +.. note::
> +   The user-visible ``ethtool`` pause API described in this document controls
> +   **link-wide PAUSE** (IEEE 802.3 Annex 31B) only. It does not control the
> +   collision-based behavior that exists on half-duplex links.

 ... or PFC?

> +1. Half-Duplex: Collision-Based Flow Control
> +--------------------------------------------
> +On half-duplex links, a device cannot send and receive simultaneously, so PAUSE
> +frames are not used. Flow control is achieved by leveraging the CSMA/CD
> +(Carrier Sense Multiple Access with Collision Detection) protocol itself.
> +
> +* **How it works**: To inhibit incoming data, a receiving device can force a
> +  collision on the line. When the sending station detects this collision, it
> +  terminates its transmission, sends a "jam" signal, and then executes the
> +  "Collision backoff and retransmission" procedure as defined in IEEE 802.3,
> +  Section 4.2.3.2.5. This algorithm makes the sender wait for a random
> +  period before attempting to retransmit. By repeatedly forcing collisions,
> +  the receiver can effectively throttle the sender's transmission rate.
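For reference, a minimal sketch of the truncated binary exponential backoff the quoted text points to (IEEE 802.3, 4.2.3.2.5): after the n-th collision the station waits r slot times, with r uniform in 0 <= r < 2^min(n, 10), and abandons the frame after 16 attempts. The 512-bit-time slot assumes 10/100 Mbit/s, `rand()` is only a stand-in for the station's random source, and the receiver side (forcing collisions) is not modeled. Function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BACKOFF_LIMIT	10	/* exponent is capped at 10 */
#define ATTEMPT_LIMIT	16	/* frame abandoned after 16 attempts */
#define SLOT_BITS	512	/* slot time in bit times (10/100 Mbit/s) */

/*
 * Number of slot times to wait after the n-th consecutive collision
 * (n >= 1), or -1 if the frame should be abandoned.
 */
static long backoff_slots(unsigned int n)
{
	unsigned int k;

	assert(n >= 1);
	if (n >= ATTEMPT_LIMIT)
		return -1;

	k = n < BACKOFF_LIMIT ? n : BACKOFF_LIMIT;
	/* r uniform in 0 <= r < 2^k (modulo bias ignored in this sketch) */
	return rand() % (1L << k);
}

/* Convert a number of slots to nanoseconds at speed_mbps. */
static uint64_t slots_to_ns(long slots, unsigned int speed_mbps)
{
	assert(slots >= 0 && speed_mbps > 0);
	/* one bit time is 1000 / speed_mbps ns */
	return (uint64_t)slots * SLOT_BITS * 1000 / speed_mbps;
}
```

The growing window is what lets repeated forced collisions throttle the sender: each extra collision roughly doubles the expected wait until the exponent cap is reached.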
> +
> +.. note::
> +    While this mechanism is part of the IEEE standard, there is currently no
> +    generic kernel API to configure or control it. Drivers should not enable
> +    this feature until a standardized interface is available.
> +
> +.. warning::
> +   On shared-medium networks (e.g. 10BASE2, or twisted-pair networks using a
> +   hub rather than a switch) forcing collisions inhibits traffic **across the
> +   entire shared segment**, not just a single point-to-point link. Enabling
> +   such behavior is generally undesirable.
> +
> +2. Full-Duplex: Link-wide PAUSE (IEEE 802.3 Annex 31B)
> +------------------------------------------------------
> +On full-duplex links, devices can send and receive at the same time. Flow
> +control is achieved by sending a special **PAUSE frame**, defined by IEEE
> +802.3 Annex 31B. This mechanism pauses all traffic on the link and is therefore
> +called *link-wide PAUSE*.
> +
> +* **What it is**: A standard Ethernet frame with a globally reserved
> +  destination MAC address (``01-80-C2-00-00-01``). This address is in a range
> +  that standard IEEE 802.1D-compliant bridges do not forward. However, some
> +  unmanaged or misconfigured bridges have been reported to forward these
> +  frames, which can disrupt flow control across a network.
> +
> +* **How it works**: The frame contains a MAC Control opcode for PAUSE
> +  (``0x0001``) and a ``pause_time`` value, telling the sender how long to
> +  wait before sending more data frames. This time is specified in units of
> +  "pause quantum", where one quantum is the time it takes to transmit 512 bits.
> +  For example, one pause quantum is 51.2 microseconds on a 10 Mbit/s link,
> +  and 512 nanoseconds on a 1 Gbit/s link. A ``pause_time`` of zero indicates
> +  that the transmitter can resume transmission, even if a previous non-zero
> +  pause time has not yet elapsed.
> +
> +* **Who uses it**: Any full-duplex link, from 10 Mbit/s to multi-gigabit speeds.
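A minimal sketch of the frame described above, assuming the usual MAC Control layout (EtherType 0x8808, the 2-octet opcode, then `pause_time` in big-endian), padded to the 60-byte minimum without FCS, plus the quantum-to-time conversion. Function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAUSE_FRAME_LEN	60	/* minimum frame size without the 4-byte FCS */

static const uint8_t pause_da[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x01 };

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = v >> 8;
	p[1] = v & 0xff;
}

/*
 * Build a link-wide PAUSE frame (without FCS) into buf, which must hold
 * PAUSE_FRAME_LEN bytes. src is the sender's MAC address.
 */
static size_t build_pause_frame(uint8_t *buf, const uint8_t src[6],
				uint16_t pause_time)
{
	assert(buf && src);
	memset(buf, 0, PAUSE_FRAME_LEN);	/* reserved bytes/padding are zero */
	memcpy(buf, pause_da, 6);		/* destination MAC */
	memcpy(buf + 6, src, 6);		/* source MAC */
	put_be16(buf + 12, 0x8808);		/* EtherType: MAC Control */
	put_be16(buf + 14, 0x0001);		/* opcode: PAUSE */
	put_be16(buf + 16, pause_time);		/* in pause quanta */
	return PAUSE_FRAME_LEN;
}

/* One quantum is 512 bit times; convert pause_time to ns at speed_mbps. */
static uint64_t pause_time_to_ns(uint16_t pause_time, uint32_t speed_mbps)
{
	assert(speed_mbps > 0);
	return (uint64_t)pause_time * 512 * 1000 / speed_mbps;
}
```

With this, `pause_time_to_ns(1, 10)` gives 51200 ns and `pause_time_to_ns(1, 1000)` gives 512 ns, matching the numbers in the text; the maximum `pause_time` of 0xFFFF at 1 Gbit/s works out to about 33.6 ms.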
> +
> +3. Full-Duplex: Priority-based Flow Control (PFC) (IEEE 802.1Q Clause 36)
> +-------------------------------------------------------------------------
> +Priority-based Flow Control is an enhancement to the standard PAUSE mechanism
> +that allows flow control to be applied independently to different classes of
> +traffic, identified by their priority level.

Should we add "... specified in the 802.1Q VLAN tag"?

> +
> +* **What it is**: PFC allows a receiver to pause traffic for one or more of the
> +  8 standard priority levels without stopping traffic for other priorities.
> +  This is critical in data center environments for protocols that cannot
> +  tolerate packet loss due to congestion (e.g., Fibre Channel over Ethernet
> +  or RoCE).

nit: either

 FCoE and RoCE 
   or
 Fibre Channel .. and RDMA over Converged ..

?

> +* **How it works**: PFC uses a specific PAUSE frame format. It shares the same
> +  globally reserved destination MAC address (``01-80-C2-00-00-01``) as legacy
> +  PAUSE frames but uses a unique opcode (``0x0101``). The frame payload
> +  contains two key fields:


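The PFC frame referred to above can be sketched the same way. This assumes the IEEE 802.1Q Clause 36 layout: MAC Control EtherType 0x8808, opcode 0x0101, a 2-octet priority-enable vector whose low 8 bits select priorities, then eight 2-octet per-priority pause times in quanta. Names are hypothetical, and writing zero into the time fields of disabled priorities is a choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PFC_FRAME_LEN	60	/* padded to the minimum frame size, no FCS */
#define PFC_NUM_PRIO	8

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = v >> 8;
	p[1] = v & 0xff;
}

/*
 * Build a PFC frame (without FCS). Bit n of enable_mask selects priority n;
 * quanta[n] is the pause time for that priority. Time fields of disabled
 * priorities are written as zero.
 */
static size_t build_pfc_frame(uint8_t *buf, const uint8_t src[6],
			      uint8_t enable_mask,
			      const uint16_t quanta[PFC_NUM_PRIO])
{
	static const uint8_t da[6] = { 0x01, 0x80, 0xC2, 0x00, 0x00, 0x01 };
	int prio;

	assert(buf && src && quanta);
	memset(buf, 0, PFC_FRAME_LEN);
	memcpy(buf, da, 6);			/* same DA as link-wide PAUSE */
	memcpy(buf + 6, src, 6);		/* source MAC */
	put_be16(buf + 12, 0x8808);		/* EtherType: MAC Control */
	put_be16(buf + 14, 0x0101);		/* opcode: PFC */
	put_be16(buf + 16, enable_mask);	/* priority-enable vector */
	for (prio = 0; prio < PFC_NUM_PRIO; prio++)
		put_be16(buf + 18 + 2 * prio,
			 (enable_mask & (1u << prio)) ? quanta[prio] : 0);
	return PFC_FRAME_LEN;
}
```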
> +Kernel Policy: "Set and Trust"
> +==============================
> +
> +The ethtool pause API is defined as a **wish policy** for
> +IEEE 802.3 link-wide PAUSE only. A user request is always accepted
> +as the preferred configuration, but it may not be possible to apply
> +it in all link states.
> +
> +Key constraints:
> +
> +- Link-wide PAUSE is not valid on half-duplex links.
> +- Link-wide PAUSE cannot be used together with Priority-based Flow Control
> +  (PFC, IEEE 802.1Q Clause 36).
> +- If autonegotiation is active and the link is currently down, the future
> +  mode is not yet known.
> +
> +Because of these constraints, the kernel stores the requested setting
> +and applies it only when the link is in a compatible state.
> +
> +Implications for userspace:
> +
> +1. Set once (the "wish"): the requested Rx/Tx PAUSE policy is
> +   remembered even if it cannot be applied immediately.
> +2. Applied conditionally: when the link comes up, the kernel enables
> +   PAUSE only if the active mode allows it.
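For discussion, the store-then-apply behavior the quoted section describes might look roughly like this. All structures and names are hypothetical (not existing kernel APIs), and pause autonegotiation resolution is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port state; not a real kernel structure. */
struct pause_wish {
	bool rx_pause;		/* requested by userspace, always stored */
	bool tx_pause;
};

struct link_state {
	bool up;
	bool full_duplex;
	bool pfc_enabled;
};

struct mac_pause {
	bool rx;
	bool tx;
};

/*
 * The wish is never modified here; the MAC is only programmed from it
 * when the current link state allows link-wide PAUSE.
 */
static struct mac_pause apply_pause_wish(const struct pause_wish *wish,
					 const struct link_state *link)
{
	struct mac_pause mac = { false, false };

	assert(wish && link);
	if (!link->up || !link->full_duplex || link->pfc_enabled)
		return mac;	/* keep the wish, leave PAUSE disabled */

	mac.rx = wish->rx_pause;
	mac.tx = wish->tx_pause;
	return mac;
}
```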

IDK about this section and also ...

>  Keeping Close Tabs on the PAL
>  =============================
> diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
> index c869b7f8bce8..1f121108f236 100644
> --- a/include/linux/ethtool.h
> +++ b/include/linux/ethtool.h
> @@ -931,9 +931,48 @@ struct kernel_ethtool_ts_info {
>   * @get_pause_stats: Report pause frame statistics. Drivers must not zero
>   *	statistics which they don't report. The stats structure is initialized
>   *	to ETHTOOL_STAT_NOT_SET indicating driver does not report statistics.
> - * @get_pauseparam: Report pause parameters
> - * @set_pauseparam: Set pause parameters.  Returns a negative error code
> - *	or zero.
> + *
> + * @get_pauseparam: Report the configured policy for link-wide PAUSE
> + *      (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> + *      such that:
> + *      @autoneg:
> + *              This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> + *              and is independent of generic link autonegotiation configured
> + *              via ethtool -s.
> + *              true  -> the device follows the negotiated result of pause
> + *                       autonegotiation (Pause/Asym);
> + *              false -> the device uses a forced MAC state independent of
> + *                       negotiation.
> + *      @rx_pause/@...pause:
> + *              represent the desired policy (preferred configuration).
> + *              In autoneg mode they describe what is to be advertised;

... this. IDK what you guys do in the Linux-managed code, but the
convention for integrated devices is spelled out here:

/**
 * struct ethtool_pauseparam - Ethernet pause (flow control) parameters
 * @cmd: Command number = %ETHTOOL_GPAUSEPARAM or %ETHTOOL_SPAUSEPARAM
 * @autoneg: Flag to enable autonegotiation of pause frame use
 * @rx_pause: Flag to enable reception of pause frames
 * @tx_pause: Flag to enable transmission of pause frames
 *
 * Drivers should reject a non-zero setting of @autoneg when             <<< [1]
 * autonegotiation is disabled (or not supported) for the link.          <<<
 *
 * If the link is autonegotiated, drivers should use
 * mii_advertise_flowctrl() or similar code to set the advertised
 * pause frame capabilities based on the @rx_pause and @tx_pause flags,
 * even if @autoneg is zero.  They should also allow the advertised
 * pause frame capabilities to be controlled directly through the
 * advertising field of &struct ethtool_cmd.
 *
 * If @autoneg is non-zero, the MAC is configured to send and/or
 * receive pause frames according to the result of autonegotiation.
 * Otherwise, it is configured directly based on the @rx_pause and
 * @tx_pause flags.
 */
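Roughly what that convention boils down to, as a standalone sketch. The helpers mirror the logic of the kernel's mii_advertise_flowctrl() and mii_resolve_flowctrl_fdx() in include/linux/mii.h, but use local `EX_*`/`ex_*` names so the block builds on its own; the Pause/Asym values match bits 10 and 11 of the MII advertisement register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the MII advertisement bits (register 4, bits 10/11). */
#define EX_ADV_PAUSE	0x0400
#define EX_ADV_ASYM	0x0800

#define EX_FLOW_RX	0x1
#define EX_FLOW_TX	0x2

/* Map the requested rx/tx flags to advertised Pause/Asym bits. */
static uint16_t ex_advertise_flowctrl(bool rx, bool tx)
{
	uint16_t adv = 0;

	if (rx)
		adv = EX_ADV_PAUSE | EX_ADV_ASYM;
	if (tx)
		adv ^= EX_ADV_ASYM;
	return adv;
}

/* Resolve the full-duplex result from local and partner advertisements. */
static unsigned int ex_resolve_flowctrl(uint16_t lcl, uint16_t rmt)
{
	if (lcl & rmt & EX_ADV_PAUSE)
		return EX_FLOW_RX | EX_FLOW_TX;
	if (lcl & rmt & EX_ADV_ASYM) {
		if (lcl & EX_ADV_PAUSE)
			return EX_FLOW_RX;
		if (rmt & EX_ADV_PAUSE)
			return EX_FLOW_TX;
	}
	return 0;
}

/*
 * With @autoneg set, the MAC follows the resolved result; otherwise it is
 * configured directly from the rx/tx flags. rmt is the partner's advertisement.
 */
static unsigned int ex_mac_pause(bool autoneg, bool rx, bool tx, uint16_t rmt)
{
	if (autoneg)
		return ex_resolve_flowctrl(ex_advertise_flowctrl(rx, tx), rmt);
	return (rx ? EX_FLOW_RX : 0) | (tx ? EX_FLOW_TX : 0);
}
```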

Doesn't [1] contradict your description of the kernel "storing the config"?
Also, you're not reflecting this in the doc for the set op.
