netdev - Re: [Bonding Draft Proposal] Add lacp_prio Support for ad

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2946319.1751389736@famine>
Date: Tue, 01 Jul 2025 10:08:56 -0700
From: Jay Vosburgh <jv@...sburgh.net>
To: Hangbin Liu <liuhangbin@...il.com>
cc: Andrew Lunn <andrew+netdev@...n.ch>,
    "David S. Miller" <davem@...emloft.net>,
    Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
    Paolo Abeni <pabeni@...hat.com>, Jiri Pirko <jiri@...nulli.us>,
    netdev@...r.kernel.org
Subject: Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?

Hangbin Liu <liuhangbin@...il.com> wrote:

>On Thu, Jun 26, 2025 at 04:28:35PM -0700, Jay Vosburgh wrote:
>> Hangbin Liu <liuhangbin@...il.com> wrote:
>> 
>> >Hi Jay,
>> >
>> >We have a customer setup involving two separate switches with identical
>> >L2/VLAN configurations. Each switch forms an independent aggregator
>> >(port-channel), and the end host connects to both with the same number of
>> >links and equivalent bandwidth.
>> >
>> >As a result, the host ends up with two aggregators under a single bond
>> >interface. Since the user cannot arbitrarily override port count or
>> >bandwidth, they are asking for a new mechanism, lacp_prio, to influence
>> >aggregator selection via ad_select.
>> >
>> >Do you think this is a reasonable addition?
>> 
>> 	In principle, I don't see a reason not to use the system
>> priority, et al, to influence the aggregator selection when bonding ends
>> up with multiple aggregators.  I'm undecided as to whether it should be
>> a separate ad_select policy or a "tiebreaker," but a separate policy is
>> probably simpler to deal with.
>
>There is only one system priority in the bond, which means all aggregators
>share the same system priority — right?
>
>Or do you mean we should also take the partner's system priority into account?

	Ok, this is what I get for reading the standard and not checking
the code carefully while I do it.

	That said, I still think that:

	(a) the system priority is the logical fit for this purpose, at
least from what the standard seems to intend, given that "System" there
means "Aggregation System," and,

	(b) the implementation we have (as well as few switches I
checked) is totally impractical for using system priority in this
manner, as all of them implement it as a global, or at least not on a
per-Aggregation System basis.

	Granted, the switches I have are pretty old, but even those that
require ports to be explicitly configured for particular channel groups
don't permit setting the system priority on a per-channel group (i.e.,
Aggregation System) basis.

>> >If yes, what would be the best way to compare priorities?
>> >
>> >1. Port Priority Only. Currently initialized to 0xff. We could add a parameter
>> >   allowing users to configure it.
>> >   a) Use the highest port priority within each aggregator for comparison
>> >   b) Sum all port priorities in each aggregator and compare the totals
>> 
>> 	I'm not a fan of this, as explained below.
>> 
>> 	Also, note that in LACP-land, when comparing priorities, the
>> higher priority is numerically smaller, which makes "add them up and
>> compare" a little counter intuitive to me.
>
>Yeah..
>
>> 
>> >2. Full LACP Info Comparison. Compare fields such as system_priority, system,
>> >   port_priority, port_id, etc.
>> 
>> 	I think it makes more sense to use the System ID (system
>> priority and aggregator MAC address) from the LAG ID of the local
>> aggregator.  In the bonding implementation, an aggregator is assigned a
>> MAC when an interface is added, so the only aggregators lacking a MAC
>> are ones that have no ports (which can't be active).
>
>Same question, the system priority and aggregator MAC address are all same
>in the same bonding interface. So how can we prioritize between two
>aggregators within the same bond?
>
>Unless we take the partner's System ID into account. Which looks like, if
>we want to choose a better aggregator in bond, we need to config the switch side...

	Again, moving on from my lack of paying complete attention,
looking at the teamd implementation, I think that's what it does:

static void get_lacp_port_prio_info(struct lacp_port *lacp_port,
				    struct lacpdu_info *prio_info)
{
	int prio_diff;
	int system_diff;

	prio_diff = ntohs(lacp_port->actor.system_priority) -
		    ntohs(lacp_port->partner.system_priority);
	system_diff = memcmp(lacp_port->actor.system,
			     lacp_port->partner.system, ETH_ALEN);
	if (prio_diff < 0 || (prio_diff == 0 && system_diff < 0))
		*prio_info = lacp_port->actor;
	if (prio_diff > 0 || (prio_diff == 0 && system_diff >= 0))
		*prio_info = lacp_port->partner;

	Right here, it chooses between returning the actor or partner
data based on comparisons of the actor and partner System Identifier,
returning the data for whichever is higher priority after comparing the
System Identifiers (which is the System Priority plus Partner System,
the latter of which is the MAC address, 802.1AX-2014 6.3.2).

	I would hazard to guess that Jiri did it this way for the same
reason we're having this conversation: there's not really a better way
without rearranging a lot of the innards of how configuration of this
stuff is done.

	It looks like lacp_find_new_agg_lead() runs though all of the
ports in all of the aggregators and chooses the aggregator with the
"best" port of all.

	One downside if we were to adapt this logic or something similar
to bonding is that there's currently no way to set the Port Priority of
interfaces in the bond.  There is a "prio" that can be set via ip set
... type bond_slave prio X, which is IFLA_BOND_SLAVE_PRIO, but that's a
failover priority, not the LACP Port Priority.

	So right now, if the above logic were put into bonding, the
local selection criteria would end up based entirely on the port number,
which isn't configurable, and so doesn't seem especially better than
what we have now.

>> 	If we want to use the partner System ID, that's a little more
>> complicated.  If aggregators in question both have LACP partners, then
>> the System IDs will be unique, since the MAC addresses will differ.  If
>> the aggregators don't have LACP partners, then they'll be individual
>> ports, and the partner information won't be available.
>
>Can we active a aggregator that don't have LACP partner? If not, then
>we don't need to compare that aggregator.

	Yes, we can.  If no ports have a LACP partner, then all of the
ports are "individual," which act like an aggregator with just one port.
If all ports in the bond are individual, then one is chosen to become
active.

	The rationale behind this is to permit LACP-unaware hosts or
things like PXE to be able to communicate when LACP is not up and
running on the host system.

>> 	Modulo the fact that bonding assigns a MAC to an aggregator
>> before the standard does (for the System ID), this is approximately
>> what's described in 802.1AX-2014 6.7.1, although the context there is
>> criteria for prioritizing between ports during selection for aggregation
>> when limited capabilities exist (i.e., 6 ports but only the ability to
>> accomodate 4 in an aggregrator).
>> 
>> 	FWIW, the 802.1AX standard is pretty quiet on this whole
>> situation.  It recognises that "A System may contain multiple
>> Aggregators, serving multiple Aggregator Clients" (802.1AX-2014 6.2.1)
>> but doesn't have any verbiage that I can find on requirements for
>> choosing between multiple such Aggregators if only one can be active.  I
>> think the presumption in the standard is that the multiple aggregators
>> would or could be active simultaneously as independent entities.
>> 
>> 	Anyway, the upshot is that we can pretty much choose as we see
>> fit for this particular case.
>
>Yes

	From the above, I suspect we'll have to add some additional
configuration parameters somewhere.  It would be nice if the System
Priority were configurable on a per-aggregator basis, but that seems
complicated from a UI perspective (other than something like a mapping
of agg ID to system prio).

	-J

---
	-Jay Vosburgh, jv@...sburgh.net