Message-ID: <Z-PQbyKj1CBdqIQh@pengutronix.de>
Date: Wed, 26 Mar 2025 11:01:19 +0100
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Kyle Swenson <kyle.swenson@....tech>
Cc: Kory Maincent <kory.maincent@...tlin.com>, Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Jonathan Corbet <corbet@....net>,
Donald Hunter <donald.hunter@...il.com>,
Rob Herring <robh@...nel.org>, Andrew Lunn <andrew+netdev@...n.ch>,
Simon Horman <horms@...nel.org>,
Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
Krzysztof Kozlowski <krzk+dt@...nel.org>,
Conor Dooley <conor+dt@...nel.org>,
Liam Girdwood <lgirdwood@...il.com>,
Mark Brown <broonie@...nel.org>,
Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Dent Project <dentproject@...uxfoundation.org>,
"kernel@...gutronix.de" <kernel@...gutronix.de>,
Maxime Chevallier <maxime.chevallier@...tlin.com>,
"devicetree@...r.kernel.org" <devicetree@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next v6 06/12] net: pse-pd: Add support for budget
evaluation strategies
Hi folks,
On Tue, Mar 25, 2025 at 08:40:54PM +0000, Kyle Swenson wrote:
> Hello Kory,
>
> On Tue, Mar 25, 2025 at 04:25:34PM +0100, Kory Maincent wrote:
> > On Tue, 25 Mar 2025 06:34:17 +0100
> > Oleksij Rempel <o.rempel@...gutronix.de> wrote:
> >
> > > Hi,
> > >
> > > On Mon, Mar 24, 2025 at 05:33:18PM +0000, Kyle Swenson wrote:
> > > > Hello Kory,
> > > >
> > > > On Mon, Mar 24, 2025 at 05:39:07PM +0100, Kory Maincent wrote:
> > > > > Hello Kyle, Oleksij,
> > > > ...
> > > > >
> > > > > Small question on PSE core behavior for PoE users.
> > > > >
> > > > > > If we want to enable a port but can't because it would exceed the budget,
> > > > > > should we:
> > > > > > - Report an error (or not) and save the enable action from userspace. In
> > > > > > that case, if enough budget becomes available later due to a priority change
> > > > > > or a port being disconnected, the PSE core will automatically try to re-enable
> > > > > > the PoE port. The port will then be enabled without any action from the user.
> > > > > - Report an error but do nothing. The user will need to rerun the enable
> > > > > command later to try to enable the port again.
> > > > >
> > > > > > How is it currently managed in proprietary PoE userspace tools?
> > > >
> > > > So in our implementation, we're using the first option you've presented.
> > > > That is, we save the enable action from the user and if we can't power
> > > > the device due to insufficient budget remaining, we'll indicate that status
> > > > to the user. If enough power budget becomes available later, we'll power up
> > > > the device automatically.
> > >
> > > It seems to be similar to administrative UP state - "ip link set dev lan1 up".
> > > I'm ok with this behavior.
> >
> > Ack I will go for it then, thank you!
> >
> > Other question to both of you:
> > > If we manually configure the current limit for a port, then plug in a Powered
> > > Device and detect (during classification) a smaller supported current
> > > limit, should we change the current limit to the one detected? In that case
> > > we should not let the user set a power limit greater than the one detected after
> > > the PD has been plugged in.
>
> I don't know that we want to prevent the user from setting a higher
> current than a device's classification current because that would
> prevent the PD and PSE negotiating a higher current via LLDP.
>
> That said, I'm struggling to think of a use-case where the user would be
> setting a current limit before a PD is connected, so maybe we can reset
> the current limit when the PD is classified to the classification
> result, but also allow it to be adjusted after a PD is powered for the
> LLDP negotiation case.
>
> In our implementation, we don't really let the user specify something like,
> "Only class 3 and lower devices on this port" because we've not seen
> customers need this. We have, however, implemented the LLDP negotiation
> support after several requests from customers, but this only makes sense
> when a PD is powered at its initial classification result. The PD can
> then request more power (via LLDP) and then we adjust the current limit
> assuming the system has budget available for the request.
>
> >
> > What do you think? Could we let a user burn a PD?
>
> This seems like a very rare case, and if the PD is designed such that
> it's reliant on the PSE's current limiting ability then seems like it's
> just an accident waiting to happen with any PSE.
>
> Very rarely have we seen a device actually pull more current than its
> classification result allows (except for LLDP negotiation). What's more
> likely is a dual-channel 802.3bt device is incorrectly classified as a
> single-channel 802.3at device; the device pulls more current than
> allocated and gets shut off promptly, but no magic smoke escaped.
Here’s my understanding of the use cases described so far, and a proposal for
how we could handle them in the kernel to avoid conflicts between different
actors.
We have multiple components that may affect power delivery:
- The kernel, which reacts to detection and classification
- The admin, who might want to override or restrict power for policy or safety reasons
- The LLDP daemon, which may request more power dynamically based on what the PD asks for
To avoid races and make things more predictable, I think it's best if each
actor has its own dedicated input.
## Use Cases
### Use Case 1: Classification-based power (default behavior)
- Kernel detects PD and performs classification
- Power is applied according to classification and hardware limits
- No override used
Steps:
1. Detection runs
2. Classification result obtained (e.g. Class 2 → 7 W)
3. Kernel computes:
   effective_limit = min(
       classification_result,
       controller_capability,
       board_limit,
       dynamic_budget
   )
4. Power applied up to `effective_limit`
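
To make the steps above concrete, here is a minimal sketch in Python (not kernel code). The per-class values are the PSE-side allocations for Type 1/2 classes; the function name and the controller, board and budget numbers are made up for illustration.

```python
# PSE-side power allocation per PD class, in mW (IEEE 802.3af/at classes 0-4).
CLASS_POWER_MW = {0: 15400, 1: 4000, 2: 7000, 3: 15400, 4: 30000}


def effective_limit_mw(pd_class, controller_mw, board_mw, budget_mw):
    """Use Case 1: no overrides, classification bounded by hardware and budget."""
    classification_mw = CLASS_POWER_MW[pd_class]
    return min(classification_mw, controller_mw, board_mw, budget_mw)


# Hypothetical port: 30 W controller, 25 W board limit, 12 W left in the budget.
print(effective_limit_mw(2, 30000, 25000, 12000))  # Class 2 fits: 7000
print(effective_limit_mw(4, 30000, 25000, 12000))  # Class 4 is held back by the budget: 12000
```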
### Use Case 2: Admin-configured upper bound (non-override)
- Admin sets a policy limit that restricts all power delivery
- Does not override classification, only bounds it
Steps:
1. Admin sets `ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT = 15000` (mW)
2. Detection + classification run normally
3. Kernel computes:
   effective_limit = min(
       classification_result,
       AVAIL_PWR_LIMIT,
       controller_capability,
       board_limit,
       dynamic_budget
   )
4. Classification is respected, but never exceeds admin limit
This value is always included in power computation — even if classification
or LLDP overrides are active.
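
A small worked example of "bounds, does not override", again as a Python sketch. The helper name and the 30 W hardware/budget figure are assumptions; the 15000 mW cap is the value from step 1.

```python
def bounded_limit_mw(classification_mw, avail_pwr_limit_mw, hw_and_budget_mw):
    """Use Case 2: the admin cap is one more term in the min(), never a floor."""
    return min(classification_mw, avail_pwr_limit_mw, hw_and_budget_mw)


ADMIN_CAP_MW = 15000       # ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT from step 1
HW_AND_BUDGET_MW = 30000   # hypothetical: min of controller, board and budget

# A Class 2 PD (7 W) is below the cap, so the cap changes nothing.
print(bounded_limit_mw(7000, ADMIN_CAP_MW, HW_AND_BUDGET_MW))   # 7000
# A Class 4 PD (30 W) is clamped to the admin cap.
print(bounded_limit_mw(30000, ADMIN_CAP_MW, HW_AND_BUDGET_MW))  # 15000
```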
### Use Case 3: Persistent classification override (admin)
- Admin sets a persistent limit that overrides classification
- Power is always based on this override
Steps:
1. Admin sets `CLASS_OVERRIDE_PERSISTENT = 25000` (mW)
2. Detection/classification may run, but classification result is ignored
3. Kernel computes:
   effective_limit = min(
       CLASS_OVERRIDE_PERSISTENT,
       AVAIL_PWR_LIMIT,
       controller_capability,
       board_limit,
       dynamic_budget
   )
4. Power applied accordingly
5. Override persists until cleared
### Use Case 4: Temporary classification override (LLDP)
- LLDP daemon overrides classification for current PD session only
- Cleared automatically on PD disconnect
Steps:
1. PD connects, detection + classification run (e.g. 7 W)
2. LLDP daemon receives PD request for 25000 mW
3. LLDP daemon sets `CLASS_OVERRIDE_TEMPORARY = 25000` (mW)
4. Kernel computes:
   effective_limit = min(
       CLASS_OVERRIDE_TEMPORARY,
       AVAIL_PWR_LIMIT,
       controller_capability,
       board_limit,
       dynamic_budget
   )
5. Power is increased for this session
6. On PD disconnect, override is cleared automatically
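
The session lifetime of the temporary override could be modelled roughly like this. It is a Python sketch only: the class, its method names, and the choice to report 0 mW while no PD is connected are assumptions for illustration, not proposed kernel API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortSession:
    """Hypothetical per-port state; all values in mW."""
    avail_pwr_limit_mw: int   # admin cap, survives replug
    hw_and_budget_mw: int     # min of controller, board and budget
    classification_mw: Optional[int] = None
    override_temporary_mw: Optional[int] = None

    def pd_connected(self, classification_mw: int) -> None:
        # A new session always starts from the fresh classification result.
        self.classification_mw = classification_mw
        self.override_temporary_mw = None

    def lldp_request(self, requested_mw: int) -> None:
        self.override_temporary_mw = requested_mw

    def pd_disconnected(self) -> None:
        # The temporary override never outlives the PD session.
        self.classification_mw = None
        self.override_temporary_mw = None

    def effective_limit_mw(self) -> int:
        base = self.override_temporary_mw
        if base is None:
            base = self.classification_mw
        if base is None:
            return 0  # assumption: no PD connected means nothing to deliver
        return min(base, self.avail_pwr_limit_mw, self.hw_and_budget_mw)


port = PortSession(avail_pwr_limit_mw=30000, hw_and_budget_mw=60000)
port.pd_connected(7000)
print(port.effective_limit_mw())  # 7000
port.lldp_request(25000)
print(port.effective_limit_mw())  # 25000
port.pd_disconnected()
port.pd_connected(7000)
print(port.effective_limit_mw())  # back to 7000 after replug
```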
---
### Use Case 5: Ignore detection and classification (force-on)
- Admin forces the port on, ignoring detection
- Useful for passive/non-802.3 devices or bring-up
Steps:
1. Admin sets:
   - `DETECTION_IGNORE = true`
   - `CLASS_OVERRIDE_PERSISTENT = 5000` (mW)
2. Kernel skips detection and classification
3. Kernel computes:
   effective_limit = min(
       CLASS_OVERRIDE_PERSISTENT,
       AVAIL_PWR_LIMIT,
       controller_capability,
       board_limit,
       dynamic_budget
   )
4. Power is applied immediately
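
As a sketch of the force-on path in Python: detection only gates whether the port may be powered, while the limit still comes from the same min(). Both helper names and the cap values in the example are hypothetical.

```python
def port_may_power(detection_ignore: bool, pd_detected: bool) -> bool:
    """With DETECTION_IGNORE set, a missing PD signature no longer blocks power."""
    return detection_ignore or pd_detected


def forced_limit_mw(override_persistent_mw, avail_pwr_limit_mw, hw_and_budget_mw):
    """Use Case 5: classification is skipped, so the override is the base value."""
    return min(override_persistent_mw, avail_pwr_limit_mw, hw_and_budget_mw)


# Passive device with no 802.3 signature, forced on by the admin at 5000 mW.
if port_may_power(detection_ignore=True, pd_detected=False):
    print(forced_limit_mw(5000, 15000, 30000))  # 5000
```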
## Proposed kernel UAPI
### SET attributes (configuration input)
| Attribute | Type | Lifetime | Owner | Description |
|-------------------------------------------|----------|------------------------|------------------|-------------|
| `ETHTOOL_A_PSE_CLASS_OVERRIDE_PERSISTENT` | u32 (mW) | Until cleared | Admin | Persistent classification override |
| `ETHTOOL_A_PSE_CLASS_OVERRIDE_TEMPORARY` | u32 (mW) | Cleared on detection failure / PD replug | LLDP daemon / test tool | Temporary override of classification |
| `ETHTOOL_A_PSE_DETECTION_IGNORE` | bool | Until cleared | Admin | Ignore detection phase |
| `ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT` | u32 (mW) | Until changed | Admin | Static admin-defined max power cap (non-override) |
### GET attributes (status and diagnostics)
| Attribute | Type | Description |
|--------------------------------------------|----------|-------------|
| `ETHTOOL_A_PSE_EFFECTIVE_PWR_LIMIT` | u32 (mW) | Final power limit applied by kernel |
| `ETHTOOL_A_PSE_CLASS_OVERRIDE_PERSISTENT` | u32 (mW) | Current persistent override (if set) |
| `ETHTOOL_A_PSE_CLASS_OVERRIDE_TEMPORARY` | u32 (mW) | Current temporary override (if active) |
| `ETHTOOL_A_PSE_DETECTION_IGNORE` | bool | Current detection ignore state |
### Power Limit Priority
Since we now have multiple sources that can influence how much power is
delivered to a PD, we need to define a clear and deterministic priority
order for all these values. This avoids confusion and ensures that the kernel
behaves consistently, even when different actors (e.g. admin, LLDP daemon,
hardware limits) are active at the same time.
Below is the proposed priority list — values higher in the list take precedence
over those below:
| Priority | Source / Field | Description |
|----------|------------------------------------------|-------------|
| 1 | Hardware/board-specific limit | Maximum allowed by controller or board design (e.g. via device tree or driver constraints) |
| 2 | Dynamic power budget | Current system-level or PSE-level power availability (shared with other ports) |
| 3 | `ETHTOOL_A_C33_PSE_AVAIL_PWR_LIMIT` | Admin-configured upper bound — applies even when classification or override is used |
| 4 | `ETHTOOL_A_PSE_CLASS_OVERRIDE_TEMPORARY` | Temporary override, e.g. set by LLDP daemon, cleared on PD disconnect or detection loss |
| 5 | `ETHTOOL_A_PSE_CLASS_OVERRIDE_PERSISTENT` | Admin override that persists until cleared |
| 6 | `ETHTOOL_A_PSE_CLASSIFICATION_RESULT` | Result of PD classification, used when no override is present |
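The effective power limit used by the kernel is the minimum of the caps in rows
1-3 and one base value: the highest-priority entry of rows 4-6 that is currently
set. As in Use Cases 3 and 4, an active override replaces the classification
result instead of being combined with it.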
This way, even if the LLDP daemon requests more power, or classification result
is high, power delivery will still be constrained by admin policies, hardware
limits, and current budget.
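
Putting the table together with Use Cases 3 and 4 (where an active override takes the place of the classification result in the min()), the resolution could look like the Python sketch below. The function name, the returned source strings, and the "0 when no base value is set" behaviour are assumptions. Rows 1 and 2 are passed as plain numbers, and ties are attributed to the higher-priority row.

```python
from typing import Optional, Tuple


def resolve_limit(
    hw_limit_mw: int,
    budget_mw: int,
    avail_pwr_limit_mw: Optional[int] = None,
    override_temporary_mw: Optional[int] = None,
    override_persistent_mw: Optional[int] = None,
    classification_mw: Optional[int] = None,
) -> Tuple[int, str]:
    """Return (effective limit in mW, name of the source that set it)."""
    # Rows 4-6: the first value that is set becomes the base.
    base_candidates = [
        ("class_override_temporary", override_temporary_mw),
        ("class_override_persistent", override_persistent_mw),
        ("classification_result", classification_mw),
    ]
    base = next(((n, v) for n, v in base_candidates if v is not None), None)
    if base is None:
        return 0, "no_base_value"

    # Rows 1-3 always cap the base, whatever its origin.
    terms = [("hw_limit", hw_limit_mw), ("dynamic_budget", budget_mw)]
    if avail_pwr_limit_mw is not None:
        terms.append(("avail_pwr_limit", avail_pwr_limit_mw))
    terms.append(base)

    name, value = min(terms, key=lambda t: t[1])
    return value, name


# LLDP asked for 25 W on a PD classified at 7 W; nothing else is tighter.
print(resolve_limit(30000, 60000, override_temporary_mw=25000,
                    classification_mw=7000))
# (25000, 'class_override_temporary')
```

Returning the limiting source next to the value is only a suggestion; it might be handy for diagnostics alongside `ETHTOOL_A_PSE_EFFECTIVE_PWR_LIMIT`.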
Best regards,
Oleksij
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |