Message-ID: <aSgX9ue6uUheX4aB@pengutronix.de>
Date: Thu, 27 Nov 2025 10:20:54 +0100
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Andrew Lunn <andrew@...n.ch>, Vladimir Oltean <vladimir.oltean@....com>,
Alexei Starovoitov <ast@...nel.org>,
Russell King <linux@...linux.org.uk>,
Eric Dumazet <edumazet@...gle.com>, Rob Herring <robh@...nel.org>,
Florian Fainelli <f.fainelli@...il.com>,
Donald Hunter <donald.hunter@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Jonathan Corbet <corbet@....net>,
John Fastabend <john.fastabend@...il.com>,
Lukasz Majewski <lukma@...x.de>,
Maxime Chevallier <maxime.chevallier@...tlin.com>,
Stanislav Fomichev <sdf@...ichev.me>,
Paolo Abeni <pabeni@...hat.com>, Jiri Pirko <jiri@...nulli.us>,
Jesper Dangaard Brouer <hawk@...nel.org>,
Divya.Koppera@...rochip.com,
Kory Maincent <kory.maincent@...tlin.com>,
Vadim Fedorenko <vadim.fedorenko@...ux.dev>, netdev@...r.kernel.org,
Sabrina Dubroca <sd@...asysnail.net>, linux-kernel@...r.kernel.org,
kernel@...gutronix.de, Krzysztof Kozlowski <krzk+dt@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [PATCH net-next v8 1/1] Documentation: net: add flow control
guide and document ethtool API
On Wed, Nov 26, 2025 at 02:42:25PM -0800, Jakub Kicinski wrote:
> On Wed, 26 Nov 2025 09:36:42 +0100 Oleksij Rempel wrote:
> > On Tue, Nov 25, 2025 at 06:19:57PM -0800, Jakub Kicinski wrote:
> > > On Wed, 19 Nov 2025 15:03:17 +0100 Oleksij Rempel wrote:
> > > > + * @get_pauseparam: Report the configured policy for link-wide PAUSE
> > > > + * (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
> > > > + * such that:
> > > > + * @autoneg:
> > > > + * This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
> > > > + * and is part of the link autonegotiation process.
> > > > + * true -> the device follows the negotiated result of pause
> > > > + * autonegotiation (Pause/Asym);
> > > > + * false -> the device uses a forced MAC state independent of
> > > > + * negotiation.
> > > > + * @rx_pause/@tx_pause:
> > > > + * represent the desired policy (preferred configuration).
> > > > + * In autoneg mode they describe what is to be advertised;
> > > > + * in forced mode they describe the MAC state to apply.
> > >
> > > How is the user supposed to know what ended up getting configured?
> >
> > My current understanding is that get_pauseparam() is mainly a
> > configuration API. It seems to be designed symmetric to
> > set_pauseparam(): it reports the requested policy (autoneg flag and
> > rx/tx pause), not the resolved MAC state.
> >
> > In autoneg mode this means the user sees what we intend to advertise
> > or force, but not necessarily what the MAC actually ended up with
> > after resolution.
> >
> > The ethtool userspace tool tries to fill this gap by showing
> > "RX negotiated" and "TX negotiated" fields, for example:
> >
> > Pause parameters for lan1:
> > Autonegotiate: on
> > RX: off
> > TX: off
> > RX negotiated: on
> > TX negotiated: on
> >
> > As far as I can see, these "negotiated" values are not read from hardware or
> > kernel. They are guessed in userspace from the local and link partner
> > advertisements, assuming that the kernel follows the same pause resolution
> > rules as ethtool does. If the kernel or hardware behaves differently, these
> > values can be wrong.
> >
> > So, with the current API, the user gets:
> > - the configured policy via get_pauseparam(), and
> > - an ethtool-side guess of the resolved state via
> > "RX negotiated"/"TX negotiated",
>
> Again, that's all well and good for autoneg, but in DC use cases with
> integrated NICs autoneg is usually off. And in that case having get
> report "desired" config of some sort makes much less sense, when we also
> recommend that drivers reject unsupported configurations.
>
> > > Why do we need to configure autoneg via this API and not link modes directly?
> >
> > I am not aware of a clear reason. This documentation aims to describe
> > the current behavior and capture the rationale of the existing API.
>
> To spell it out more forcefully I think it describes the current
> behavior for certain devices. I could be wrong but the expectations
> for when autoneg is off should be different.
>
> > Configuring it via link modes directly would likely resolve some of this
> > confusion, but for now we focus on documenting how the current API is
> > expected to behave.
>
> You say current API - is setting Pause and Asym_Pause via link modes
> today rejected? I don't see an explicit check by grepping but I haven't
> really tried..
How about the following wording:
Kernel Policy: Administrative vs. Operational State
===================================================

The ethtool pause API configures the **administrative state** of the network
device. The **operational state** (the actual pause behavior active on the
wire) depends on the active link mode and the link partner.

The semantics of the configuration depend on the ``autoneg`` parameter:

1. **Autonegotiation Mode** (``autoneg on``)

   In this mode, the ``rx`` and ``tx`` parameters specify the **advertisement**
   (the "wish").

   - The driver configures the PHY to advertise these capabilities.
   - The actual Flow Control mode is determined by the standard resolution
     truth table (see "Link-wide PAUSE Autonegotiation Details") based on the
     link partner's advertisement.
   - ``get_pauseparam`` reports the advertisement policy, not the resolved
     outcome.

2. **Forced Mode** (``autoneg off``)

   In this mode, the ``rx`` and ``tx`` parameters constitute a direct
   **command** to the interface.

   - The system bypasses advertisement and forces the MAC into the specified
     configuration.
   - Drivers should reject configurations that the hardware cannot support in
     forced mode.
   - ``get_pauseparam`` reports the forced configuration.

**Common Constraints**

Regardless of the mode, the following constraints apply:

- Link-wide PAUSE is not valid on half-duplex links.
- Link-wide PAUSE cannot be used together with Priority-based Flow Control
  (PFC).
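
To illustrate why the administrative and operational views can differ in
autoneg mode, here is a minimal userspace C sketch of the advertisement
mapping and the standard resolution. It is not kernel code: struct pause_adv,
struct pause_state, pause_to_adv() and pause_resolve() are hypothetical names
made up for this example. The logic is my understanding of what the kernel
helpers linkmode_set_pause() and linkmode_resolve_pause() do, so treat it as
an assumption to be checked against the source.

```c
#include <stdbool.h>

/* Hypothetical userspace model, not a kernel structure. */
struct pause_adv {
	bool pause;	/* Pause bit in the advertisement */
	bool asym;	/* Asym_Pause bit in the advertisement */
};

/* Hypothetical resolved (operational) state. */
struct pause_state {
	bool rx;
	bool tx;
};

/* autoneg on: translate the rx/tx "wish" into advertisement bits. */
static struct pause_adv pause_to_adv(bool rx, bool tx)
{
	struct pause_adv adv = {
		.pause = rx,
		.asym = rx != tx,
	};

	return adv;
}

/* Resolve the operational state from the local and partner advertisements. */
static struct pause_state pause_resolve(struct pause_adv local,
					struct pause_adv partner)
{
	struct pause_state st = { .rx = false, .tx = false };

	if (local.pause && partner.pause) {
		/* Both sides advertise Pause: symmetric flow control. */
		st.rx = true;
		st.tx = true;
	} else if (local.asym && partner.asym) {
		/* Both advertise Asym_Pause: direction follows the Pause bits. */
		st.tx = partner.pause;
		st.rx = local.pause;
	}

	return st;
}
```

With this model, a request of rx on / tx off is advertised as Pause plus
Asym_Pause. Against a partner that advertises plain symmetric Pause, it
resolves to rx on / tx on, which is exactly the case where the configured
policy and the resolved outcome disagree.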
/**
* ...
* @get_pauseparam: Report the configured administrative policy for link-wide
* PAUSE (IEEE 802.3 Annex 31B). Drivers must fill struct ethtool_pauseparam
* such that:
* @autoneg:
* This refers to **Pause Autoneg** (IEEE 802.3 Annex 31B) only
* and is part of the link autonegotiation process.
* true -> the device follows the negotiated result of pause
* autonegotiation (Pause/Asym);
* false -> the device uses a forced configuration independent
* of negotiation.
* @rx_pause/@tx_pause:
* represent the desired policy (administrative state).
* In autoneg mode they describe what is to be advertised;
* in forced mode they describe the MAC configuration to be forced.
*
* @set_pauseparam: Apply a policy for link-wide PAUSE (IEEE 802.3 Annex 31B).
* @rx_pause/@tx_pause:
* Desired state. If @autoneg is true, these define the
* advertisement. If @autoneg is false, these define the
* forced MAC configuration.
* @autoneg:
* Select autonegotiation or forced mode.
*
* **Constraint Checking:**
* Drivers should reject a non-zero setting of @autoneg when
* autonegotiation is disabled (or not supported) for the link.
* Drivers should reject unsupported rx/tx combinations with -EINVAL.
* ...
*/
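
The constraint checking described in the comment above could look roughly
like the following userspace C sketch. Everything here is hypothetical and
only for illustration: struct demo_port, struct demo_pauseparam, the
capability flags and both functions are invented names, not the kernel's
struct ethtool_pauseparam or any real driver. Using -EINVAL for the autoneg
rejection is also an assumption; the wording above only names -EINVAL for
unsupported rx/tx combinations.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request, modelled loosely on the rx/tx/autoneg triple. */
struct demo_pauseparam {
	bool autoneg;
	bool rx_pause;
	bool tx_pause;
};

/* Hypothetical per-port state kept by an imaginary driver. */
struct demo_port {
	bool link_autoneg;	/* autoneg enabled and supported on the link */
	bool hw_pause;		/* MAC supports link-wide PAUSE at all */
	bool hw_asym_pause;	/* MAC supports rx-only or tx-only PAUSE */
	struct demo_pauseparam admin;	/* stored administrative state */
};

/* Validate the request, then store it as the administrative state. */
static int demo_set_pauseparam(struct demo_port *port,
			       const struct demo_pauseparam *req)
{
	/* Pause autoneg makes no sense without link autonegotiation. */
	if (req->autoneg && !port->link_autoneg)
		return -EINVAL;

	/* Reject rx/tx combinations the hardware cannot do. */
	if ((req->rx_pause || req->tx_pause) && !port->hw_pause)
		return -EINVAL;
	if (req->rx_pause != req->tx_pause && !port->hw_asym_pause)
		return -EINVAL;

	port->admin = *req;
	return 0;
}

/* Report the administrative state, not the resolved outcome. */
static void demo_get_pauseparam(const struct demo_port *port,
				struct demo_pauseparam *out)
{
	*out = port->admin;
}
```

A rejected request leaves the stored state untouched, so get always reports
the last accepted policy. The half-duplex and PFC constraints are left out of
the sketch on purpose, since when to check them is part of the open questions
below.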
Open Questions:

1. Pre-link Configuration (Administrative UP, Physical DOWN)

How should drivers handle set_pauseparam when the link is physically down?

- Fully Forced: If speed/duplex are forced, we can validate the pause request
  immediately.
- Parallel Detection: If the link comes up later (e.g., as Half Duplex via
  parallel detection), a previously accepted "forced pause" configuration
  might become invalid. Should we block forced pause settings until the link
  is physically up?

2. State Persistence and Toggling

When toggling autoneg (e.g., autoneg on -> off -> on), should the kernel or
driver cache the previous advertisement? Currently, if a user switches to
forced mode and back, the previous advertisement preferences might be lost or
reset to defaults depending on the driver.

Similarly, if no administrative configuration has ever been set, what should
get_pauseparam report? Should it read the current hardware state (which might
be default) or return zero/empty?

3. Synchronization with Link Modes

Configuring pause via set_pauseparam vs. link_ksettings can lead to
desynchronization.

My testing shows that set_pauseparam often updates the driver's internal
pause state but may not trigger the link reset/re-advertisement that
link_ksettings does. As a result, the "Advertised" pause modes in the ethtool
output can be out of sync with the actual Pause API settings.

Mixing the two interfaces can also skip the link reset altogether, in which
case the new configuration is never advertised.
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |