[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2627546.1750980515@famine>
Date: Thu, 26 Jun 2025 16:28:35 -0700
From: Jay Vosburgh <jv@...sburgh.net>
To: Hangbin Liu <liuhangbin@...il.com>
cc: Andrew Lunn <andrew+netdev@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Jiri Pirko <jiri@...nulli.us>,
netdev@...r.kernel.org
Subject: Re: [Bonding Draft Proposal] Add lacp_prio Support for ad_select?
Hangbin Liu <liuhangbin@...il.com> wrote:
>Hi Jay,
>
>We have a customer setup involving two separate switches with identical
>L2/VLAN configurations. Each switch forms an independent aggregator
>(port-channel), and the end host connects to both with the same number of
>links and equivalent bandwidth.
>
>As a result, the host ends up with two aggregators under a single bond
>interface. Since the user cannot arbitrarily override port count or
>bandwidth, they are asking for a new mechanism, lacp_prio, to influence
>aggregator selection via ad_select.
>
>Do you think this is a reasonable addition?
In principle, I don't see a reason not to use the system
priority, et al, to influence the aggregator selection when bonding ends
up with multiple aggregators. I'm undecided as to whether it should be
a separate ad_select policy or a "tiebreaker," but a separate policy is
probably simpler to deal with.
>If yes, what would be the best way to compare priorities?
>
>1. Port Priority Only. Currently initialized to 0xff. We could add a parameter
> allowing users to configure it.
> a) Use the highest port priority within each aggregator for comparison
> b) Sum all port priorities in each aggregator and compare the totals
I'm not a fan of this, as explained below.
Also, note that in LACP-land, when comparing priorities, the
higher priority is numerically smaller, which makes "add them up and
compare" a little counter intuitive to me.
>2. Full LACP Info Comparison. Compare fields such as system_priority, system,
> port_priority, port_id, etc.
I think it makes more sense to use the System ID (system
priority and aggregator MAC address) from the LAG ID of the local
aggregator. In the bonding implementation, an aggregator is assigned a
MAC when an interface is added, so the only aggregators lacking a MAC
are ones that have no ports (which can't be active).
If we want to use the partner System ID, that's a little more
complicated. If aggregators in question both have LACP partners, then
the System IDs will be unique, since the MAC addresses will differ. If
the aggregators don't have LACP partners, then they'll be individual
ports, and the partner information won't be available.
Modulo the fact that bonding assigns a MAC to an aggregator
before the standard does (for the System ID), this is approximately
what's described in 802.1AX-2014 6.7.1, although the context there is
criteria for prioritizing between ports during selection for aggregation
when limited capabilities exist (i.e., 6 ports but only the ability to
accomodate 4 in an aggregrator).
FWIW, the 802.1AX standard is pretty quiet on this whole
situation. It recognises that "A System may contain multiple
Aggregators, serving multiple Aggregator Clients" (802.1AX-2014 6.2.1)
but doesn't have any verbiage that I can find on requirements for
choosing between multiple such Aggregators if only one can be active. I
think the presumption in the standard is that the multiple aggregators
would or could be active simultaneously as independent entities.
Anyway, the upshot is that we can pretty much choose as we see
fit for this particular case.
>At present, the libteam code has implemented an approach that selects the
>aggregator based on the highest system_priority/system from both local and
>partner data.[1]
Just looking at libteam code, it's more complicated than that.
The documentation says "Aggregator with highest priority according to
LACP standard will be selected," and the code looks to be doing memcmp()
on
struct lacpdu_info {
uint16_t system_priority;
uint8_t system[ETH_ALEN]; /* ID */
uint16_t key;
uint16_t port_priority;
uint16_t port; /* ID */
uint8_t state;
} __attribute__((__packed__));
which is essentially the local portion of a LAG ID per
802.1AX-2014 6.3.6.1, with the key and state set to zero for the
comparison. Also, the function you reference, get_lacp_port_prio_info,
is choosing between the actor and partner information for reasons that
aren't explained in the code, but I suspect might to comply with the
statement in 6.3.6.1:
"To simplify comparison of LAG IDs it is conventional to order
these such that S is the numerically smaller of S and T."
where S and T are System Identifiers (comprised of System
Priority and the MAC address), or perhaps 6.7.1, as described above.
Anyway, without knowing exactly why the team function is doing
what it does, I'm not sure if that's the proper algorithm to use. Jiri
is on Cc, maybe he remembers.
-J
>Looking forward to your thoughts.
>
>Best regards,
>Hangbin
>
>[1] https://github.com/jpirko/libteam/blob/master/teamd/teamd_runner_lacp.c#L402
---
-Jay Vosburgh, jv@...sburgh.net
Powered by blists - more mailing lists