Message-ID: <20250326153545.6f1b16ab@kmaincent-XPS-13-7390>
Date: Wed, 26 Mar 2025 15:35:45 +0100
From: Kory Maincent <kory.maincent@...tlin.com>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Kyle Swenson <kyle.swenson@....tech>, Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Jonathan Corbet <corbet@....net>, Donald Hunter
<donald.hunter@...il.com>, Rob Herring <robh@...nel.org>, Andrew Lunn
<andrew+netdev@...n.ch>, Simon Horman <horms@...nel.org>, Heiner Kallweit
<hkallweit1@...il.com>, Russell King <linux@...linux.org.uk>, Krzysztof
Kozlowski <krzk+dt@...nel.org>, Conor Dooley <conor+dt@...nel.org>, Liam
Girdwood <lgirdwood@...il.com>, Mark Brown <broonie@...nel.org>, Thomas
Petazzoni <thomas.petazzoni@...tlin.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, Dent Project
<dentproject@...uxfoundation.org>, "kernel@...gutronix.de"
<kernel@...gutronix.de>, Maxime Chevallier <maxime.chevallier@...tlin.com>,
"devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next v6 06/12] net: pse-pd: Add support for budget
evaluation strategies
On Wed, 26 Mar 2025 11:01:19 +0100
Oleksij Rempel <o.rempel@...gutronix.de> wrote:
> Hi folks,
>
> On Tue, Mar 25, 2025 at 08:40:54PM +0000, Kyle Swenson wrote:
> > Hello Kory,
> >
> > On Tue, Mar 25, 2025 at 04:25:34PM +0100, Kory Maincent wrote:
> > > On Tue, 25 Mar 2025 06:34:17 +0100
> > > Oleksij Rempel <o.rempel@...gutronix.de> wrote:
> > >
> [...]
> > >
> > > Ack I will go for it then, thank you!
> > >
> > > Other question to both of you:
> > > If we manually configure the current limit for a port, then plug in a
> > > Powered Device and detect (during classification) a smaller supported
> > > current limit, should we change the current limit to the detected one?
> > > In that case we should not let the user set a power limit greater than
> > > the one detected after the PD has been plugged in.
> >
> > I don't know that we want to prevent the user from setting a higher
> > current than a device's classification current because that would
> > prevent the PD and PSE negotiating a higher current via LLDP.
> >
> > That said, I'm struggling to think of a use-case where the user would be
> > setting a current limit before a PD is connected, so maybe we can reset
> > the current limit when the PD is classified to the classification
> > result, but also allow it to be adjusted after a PD is powered for the
> > LLDP negotiation case.
> >
> > In our implementation, we don't really let the user specify something like,
> > "Only class 3 and lower devices on this port" because we've not seen
> > customers need this. We have, however, implemented the LLDP negotiation
> > support after several requests from customers, but this only makes sense
> > when a PD is powered at its initial classification result. The PD can
> > then request more power (via LLDP) and then we adjust the current limit
> > assuming the system has budget available for the request.
> >
> > >
> > > What do you think? Could we let a user burn a PD?
> >
> > This seems like a very rare case, and if the PD is designed such that
> > it's reliant on the PSE's current limiting ability, then it seems like it's
> > just an accident waiting to happen with any PSE.
> >
> > Very rarely have we seen a device actually pull more current than its
> > classification result allows (except for LLDP negotiation). What's more
> > likely is a dual-channel 802.3bt device is incorrectly classified as a
> > single-channel 802.3at device; the device pulls more current than
> > allocated and gets shut off promptly, but no magic smoke escaped.
>
> Here’s my understanding of the use cases described so far, and a proposal for
> how we could handle them in the kernel to avoid conflicts between different
> actors.
>
> We have multiple components that may affect power delivery:
> - The kernel, which reacts to detection and classification
> - The admin, who might want to override or restrict power for policy or
> safety reasons
> - The LLDP daemon, which may request more power dynamically based on what the
> PD asks for
>
> To avoid races and make things more predictable, I think it's best if each
> actor has its own dedicated input.
>
> ## Use Cases
>
> ### Use Case 1: Classification-based power (default behavior)
> - Kernel detects PD and performs classification
> - Power is applied according to classification and hardware limits
> - No override used
>
> Steps:
> 1. Detection runs
> 2. Classification result obtained (e.g. Class 2 → 7W)
> 3. Kernel computes:
>
>     effective_limit = min(
>         classification_result,
>         controller_capability,
>         board_limit,
>         dynamic_budget
>     )
>
> 4. Power applied up to `effective_limit`
>
> ### Use Case 2: Admin-configured upper bound (non-override)
> - Admin sets a policy limit that restricts all power delivery
> - Does not override classification, only bounds it
>
> Steps:
> 1. Admin sets `ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT = 15000`
> 2. Detection + classification run normally
> 3. Kernel computes:
>
>     effective_limit = min(
>         classification_result,
>         AVAIL_PWR_LIMIT,
>         controller_capability,
>         board_limit,
>         dynamic_budget
>     )
>
> 4. Classification is respected, but never exceeds admin limit
>
> This value is always included in power computation — even if classification
> or LLDP overrides are active.
>
> ### Use Case 3: Persistent classification override (admin)
> - Admin sets a persistent limit that overrides classification
> - Power is always based on this override
>
> Steps:
> 1. Admin sets `CLASS_OVERRIDE_PERSISTENT = 25000` (mW)
> 2. Detection/classification may run, but classification result is ignored
> 3. Kernel computes:
>
>     effective_limit = min(
>         CLASS_OVERRIDE_PERSISTENT,
>         AVAIL_PWR_LIMIT,
>         controller_capability,
>         board_limit,
>         dynamic_budget
>     )
>
> 4. Power applied accordingly
> 5. Override persists until cleared
>
> ### Use Case 4: Temporary classification override (LLDP)
> - LLDP daemon overrides classification for current PD session only
> - Cleared automatically on PD disconnect
>
> Steps:
> 1. PD connects, detection + classification runs (e.g. 7W)
> 2. LLDP daemon receives PD request for 25000 mW
> 3. LLDP daemon sets `CLASS_OVERRIDE_TEMPORARY = 25000`
> 4. Kernel computes:
>
>     effective_limit = min(
>         CLASS_OVERRIDE_TEMPORARY,
>         AVAIL_PWR_LIMIT,
>         controller_capability,
>         board_limit,
>         dynamic_budget
>     )
>
> 5. Power is increased for this session
> 6. On PD disconnect, override is cleared automatically
>
> ---
>
> ### Use Case 5: Ignore detection and classification (force-on)
> - Admin forces the port on, ignoring detection
> - Useful for passive/non-802.3 devices or bring-up
>
> Steps:
> 1. Admin sets:
> - `DETECTION_IGNORE = true`
> - `CLASS_OVERRIDE_PERSISTENT = 5000`
> 2. Kernel skips detection and classification
> 3. Kernel computes:
>
>     effective_limit = min(
>         CLASS_OVERRIDE_PERSISTENT,
>         AVAIL_PWR_LIMIT,
>         controller_capability,
>         board_limit,
>         dynamic_budget
>     )
>
> 4. Power is applied immediately
>
> ## Proposed kernel UAPI
>
> ### SET attributes (configuration input)
>
> | Attribute | Type | Lifetime | Owner | Description |
> |-------------------------------------------|----------|------------------------|------------------|-------------|
> | `ETHTOOL_A_PSE_CLASS_OVERRIDE_PERSISTENT` | u32 (mW) | Until cleared | Admin | Persistent classification override |
> | `ETHTOOL_A_PSE_CLASS_OVERRIDE_TEMPORARY` | u32 (mW) | Cleared on detection failure / PD replug | LLDP daemon / test tool | Temporary override of classification |
> | `ETHTOOL_A_PSE_DETECTION_IGNORE` | bool | Until cleared | Admin | Ignore detection phase |
> | `ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT` | u32 (mW) | Until changed | Admin | Static admin-defined max power cap (non-override) |
>
> ### GET attributes (status and diagnostics)
>
> | Attribute | Type | Description |
> |--------------------------------------------|----------|-------------|
> | `ETHTOOL_A_PSE_EFFECTIVE_PWR_LIMIT` | u32 (mW) | Final power limit applied by kernel |
> | `ETHTOOL_A_PSE_CLASS_OVERRIDE_PERSISTENT` | u32 (mW) | Current persistent override (if set) |
> | `ETHTOOL_A_PSE_CLASS_OVERRIDE_TEMPORARY` | u32 (mW) | Current temporary override (if active) |
> | `ETHTOOL_A_PSE_DETECTION_IGNORE` | bool | Current detection ignore state |
>
> ### Power Limit Priority
>
> Since we now have multiple sources that can influence how much power is
> delivered to a PD, we need to define a clear and deterministic priority
> order for all these values. This avoids confusion and ensures that the kernel
> behaves consistently, even when different actors (e.g. admin, LLDP daemon,
> hardware limits) are active at the same time.
>
> Below is the proposed priority list — values higher in the list take
> precedence over those below:
>
> | Priority | Source / Field | Description |
> |----------|------------------------------------------|-------------|
> | 1 | Hardware/board-specific limit | Maximum allowed by controller or board design (e.g. via device tree or driver constraints) |
> | 2 | Dynamic power budget | Current system-level or PSE-level power availability (shared with other ports) |
> | 3 | `ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT` | Admin-configured upper bound; applies even when classification or override is used |
> | 4 | `ETHTOOL_A_PSE_CLASS_OVERRIDE_TEMPORARY` | Temporary override, e.g. set by LLDP daemon, cleared on PD disconnect or detection loss |
> | 5 | `ETHTOOL_A_PSE_CLASS_OVERRIDE_PERSISTENT` | Admin override that persists until cleared |
> | 6 | `ETHTOOL_A_PSE_CLASSIFICATION_RESULT` | Result of PD classification, used when no override is present |
>
> The effective power limit used by the kernel will always be the minimum of the
> values above.
>
> This way, even if the LLDP daemon requests more power, or classification
> result is high, power delivery will still be constrained by admin policies,
> hardware limits, and current budget.
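
For reference, the computation proposed above could be sketched in C roughly
as follows. This is only an illustration, not code from the series: the
struct, its field names, the helper names, and the convention that 0 means
"not set" are all assumptions. The selection order (temporary override, then
persistent override, then classification result) is how I read use cases 3
to 5; the proposal does not spell out what happens when both overrides are set.

```c
#include <stdint.h>

/* Hypothetical per-port inputs, all in mW. Names are illustrative only. */
struct pse_port_limits {
	uint32_t classification_result;
	uint32_t class_override_persistent;	/* assumption: 0 = not set */
	uint32_t class_override_temporary;	/* assumption: 0 = not set */
	uint32_t avail_pwr_limit;		/* assumption: 0 = not set */
	uint32_t controller_capability;
	uint32_t board_limit;
	uint32_t dynamic_budget;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Pick the requested power, then bound it by policy, hardware and budget. */
static uint32_t pse_effective_limit(const struct pse_port_limits *p)
{
	uint32_t limit;

	/* An active override replaces the classification result. */
	if (p->class_override_temporary)
		limit = p->class_override_temporary;
	else if (p->class_override_persistent)
		limit = p->class_override_persistent;
	else
		limit = p->classification_result;

	/* The admin cap applies whether or not an override is in use. */
	if (p->avail_pwr_limit)
		limit = min_u32(limit, p->avail_pwr_limit);

	limit = min_u32(limit, p->controller_capability);
	limit = min_u32(limit, p->board_limit);
	limit = min_u32(limit, p->dynamic_budget);

	return limit;
}
```

With a 7000 mW classification result and generous hardware limits, the sketch
returns 7000; setting a temporary override of 25000 raises it to 25000, and
adding an admin cap of 15000 brings it back down to 15000.
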
Kyle, thanks for your answer from the PoE user side!

Oleksij, thanks, as usual you have done some intense brainstorming! ^^
These two replies could be helpful in the future.

I asked this because I found out that the overcurrent event of the TPS23881
resets the power limit register of the port. I think I will simply fix it by
reconfiguring the power limit if this event happens.
So it won't change the current behavior, where the user's power limit setting
prevails over the power limit detected during classification.
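
To illustrate the idea, here is a small stand-alone model of that fix. It is
not the TPS23881 driver code: the struct, the function names, and the reset
value are all hypothetical, and the "register" is just a struct field. It only
shows the pattern of caching the user's setting and writing it back after the
event.

```c
#include <stdint.h>

/* Hypothetical value the limit falls back to after the event. */
#define SIM_RESET_LIMIT_MW 0u

/* Simulated port: names are illustrative, not taken from the driver. */
struct sim_port {
	uint32_t user_limit_mw;	/* last limit requested by the user (cached) */
	uint32_t hw_limit_mw;	/* stand-in for the per-port limit register */
};

/* User configures a limit: cache it and program the simulated register. */
static void sim_set_limit(struct sim_port *port, uint32_t mw)
{
	port->user_limit_mw = mw;
	port->hw_limit_mw = mw;
}

/* Simulated hardware side effect: the event resets the limit register. */
static void sim_hw_overcurrent(struct sim_port *port)
{
	port->hw_limit_mw = SIM_RESET_LIMIT_MW;
}

/* Event handler: restore the cached user limit if the register lost it. */
static void sim_handle_overcurrent(struct sim_port *port)
{
	if (port->hw_limit_mw != port->user_limit_mw)
		port->hw_limit_mw = port->user_limit_mw;
}
```

After `sim_set_limit()` with 15000, a simulated event followed by the handler
leaves `hw_limit_mw` at 15000 again, so the user's setting survives the event.
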
Regards,
--
Köry Maincent, Bootlin
Embedded Linux and kernel engineering
https://bootlin.com