Date: Thu, 11 May 2023 18:38:48 -0700
From: Jay Vosburgh <jay.vosburgh@...onical.com>
To: "Andrew J. Schorr" <aschorr@...emetry-investments.com>
cc: Hangbin Liu <liuhangbin@...il.com>, netdev@...r.kernel.org
Subject: Re: [Issue] Bonding can't show correct speed if lower interface is bond 802.3ad

Andrew J. Schorr <aschorr@...emetry-investments.com> wrote:

>Sorry -- resending from a different email address to fix a problem
>with gmail rejecting it.
>
>On Wed, May 10, 2023 at 12:57:38PM -0400, Andrew J. Schorr wrote:
>> Hi Hangbin & Jay,
>> 
>> On Wed, May 10, 2023 at 03:50:34PM +0800, Hangbin Liu wrote:
>> > On Mon, May 08, 2023 at 11:32:16AM -0700, Jay Vosburgh wrote:
>> > > 	That case should work fine without the active-backup.  LACP has
>> > > a concept of an "individual" port, which (in this context) would be the
>> > > "normal NIC," presuming that that means its link peer isn't running
>> > > LACP.
>> > > 
>> > > 	If all of the ports (N that are LACP to a single switch, plus 1
>> > > that's the non-LACP "normal NIC") were attached to a single bond, it
>> > > would create one aggregator with the LACP enabled ports, and then a
>> > > separate aggregator for the individual port that's not.  The aggregator
>> > > selection logic prefers the LACP enabled aggregator over the individual
>> > > port aggregator.  The precise criteria are in the commentary within
>> > > ad_agg_selection_test().
>> > > 
>> > 
>> > cc Andrew. He added the active-backup bond over the LACP bond because he
>> > wants to use arp_ip_target to ensure that the target network is reachable...
>> 
>> That's correct. I prefer the ARP monitoring to ensure that the needed
>> connectivity is actually there instead of relying on MII monitoring.
>> 
>> I also confess that I was unaware of the possibility of using an individual
>> port inside an 802.3ad bond without having to stick that individual port into a
>> port-channel group with LACP enabled. I want to avoid enabling LACP on that
>> link because I'd like to be able to PXE boot over it, not to mention the switch
>> configuration hassle.  Is that individual port configuration without LACP
>> detected automatically by the kernel, or do I need to configure something to do
>> that? I see the logic in drivers/net/bonding/bond_3ad.c to set is_individual,
>> but it appears to depend on whether duplex is enabled. At that point, I got
>> lost, since I see duplex mentioned only in ad_user_port_key, and that seems to
>> be a property of the bond master, not the slaves. Is there any documentation of
>> how this configuration works?

	The individual port behavior is part of the LACP standard (IEEE
802.1AX, recent editions call this "Solitary"), and is done
automatically by the kernel.  One of the reasons for it is to permit
exactly the situation you mention: to enable PXE or "fallback"
communication to work even if LACP negotiation fails or is not
configured or implemented at one end.  This is called out explicitly in
802.1AX, 6.1.1.j.
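
To make the fallback concrete, here is a minimal C sketch of the preference described in the quoted text: an aggregator formed by LACP negotiation is chosen over an individual one, and an individual aggregator is used only when no negotiated aggregator has any ports (for example, while PXE booting against a switch port that is not running LACP). The struct and function names are hypothetical, and the real selection in ad_agg_selection_test() applies further criteria that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified aggregator record; not bonding's struct aggregator. */
struct agg_model {
    int  id;
    int  num_ports;      /* ports currently attached */
    bool is_individual;  /* true if no LACP partner was negotiated */
};

/* Return the first negotiated (non-individual) aggregator that has ports;
 * if there is none, fall back to the first individual aggregator with ports.
 * Returns NULL if no aggregator has any ports at all. */
static const struct agg_model *pick_aggregator(const struct agg_model *aggs,
                                               size_t n)
{
    const struct agg_model *fallback = NULL;

    assert(n == 0 || aggs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (aggs[i].num_ports == 0)
            continue;                 /* empty aggregators are never chosen */
        if (!aggs[i].is_individual)
            return &aggs[i];          /* a negotiated LACP aggregator wins */
        if (fallback == NULL)
            fallback = &aggs[i];      /* remember the first individual one */
    }
    return fallback;
}
```

With one individual aggregator and one negotiated aggregator holding ports, the negotiated one is returned; if the negotiated aggregator is empty (LACP never came up), the individual port still carries traffic.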

	The duplex test is only part of the "individual" logic; it comes
up because LACP negotiation requires the peers to be point-to-point
links, i.e., full duplex (IEEE 802.1AX-2014, 6.4.8).  That's the norm
for most everything now, but historically a port in half duplex could be
on a multiple access topology, e.g., 802.3 CSMA/CD 10BASE2 on a coax
cable, which is incompatible with LACP aggregation.  This situation
doesn't come up a lot these days.
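
One way to picture the duplex test is as a bit in the port's key: the key combines a user-configured part with per-port duplex and speed information, and a port whose key lacks the full-duplex bit can never aggregate. The bit layout and constant names below are illustrative assumptions for this sketch, not values copied from bond_3ad.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative key layout (hypothetical constants): bit 0 = full duplex,
 * bits 1-5 = speed code, bits 6-15 = administrator-chosen user key. */
#define KEY_DUPLEX_MASK 0x0001u
#define KEY_SPEED_MASK  0x003Eu
#define KEY_USER_MASK   0xFFC0u

/* Pack a port key from its user, speed, and duplex parts. */
static uint16_t make_port_key(uint16_t user_key, uint16_t speed_code,
                              bool full_duplex)
{
    uint16_t key = (uint16_t)((user_key << 6) & KEY_USER_MASK);

    assert(speed_code <= 0x1F);   /* must fit in the 5-bit speed field */
    key |= (uint16_t)((speed_code << 1) & KEY_SPEED_MASK);
    if (full_duplex)
        key |= KEY_DUPLEX_MASK;
    return key;
}

/* LACP needs a point-to-point (full-duplex) link, so a port whose key lacks
 * the duplex bit can only ever be an individual port. */
static bool key_allows_aggregation(uint16_t key)
{
    return (key & KEY_DUPLEX_MASK) != 0;
}
```

Because speed is also part of the key in this model, two ports at different speeds get different keys and therefore cannot share an aggregator either.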

	The important part of the "individual" logic is whether or not
the port successfully completes LACP negotiation with a link partner.
If not, the port is an individual port, which acts essentially like an
aggregator with just one port in it.  This is separate from
"is_individual" in the bonding code, and happens in
ad_port_selection_logic(), after the comment "check if current
aggregator suits us".  "is_individual" is one element of this test; the
remaining tests compare the various keys and whether the partner MAC
address has been populated.
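
The "check if current aggregator suits us" test can be sketched as a single predicate. This is a simplified model of the conditions listed above (matching keys, a populated partner MAC address, and is_individual), using hypothetical types and names; it is not the kernel's code, which compares additional fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed-down views of a port and an aggregator; the real
 * structures in bond_3ad.c carry many more fields. */
struct port_view {
    uint16_t actor_key;       /* this port's operational key */
    uint16_t partner_key;     /* key advertised by the link partner */
    uint8_t  partner_mac[6];  /* all zeros until a partner has answered */
};

struct agg_view {
    uint16_t actor_key;
    uint16_t partner_key;
    uint8_t  partner_mac[6];
    bool     is_individual;   /* single-port ("individual") aggregator */
};

static bool mac_is_zero(const uint8_t *mac)
{
    static const uint8_t zero[6];  /* static storage: all bytes are 0 */
    return memcmp(mac, zero, sizeof(zero)) == 0;
}

/* Can this port join this existing aggregator? The keys and partner system
 * must match, the partner must actually have been heard from, and the
 * aggregator must not be an individual one. If no aggregator passes, the
 * port ends up in an aggregator of its own. */
static bool aggregator_suits_port(const struct agg_view *agg,
                                  const struct port_view *port)
{
    assert(agg != NULL && port != NULL);
    return agg->actor_key == port->actor_key &&
           agg->partner_key == port->partner_key &&
           memcmp(agg->partner_mac, port->partner_mac, 6) == 0 &&
           !mac_is_zero(port->partner_mac) &&
           !agg->is_individual;
}
```

A port whose partner never answers keeps an all-zero partner MAC, fails this test against every aggregator, and so behaves as the one-port "individual" aggregator described above.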

	As far as documentation goes, the bonding docs[0] describe some
of the parameters, but don't describe the specifics of bonding's
ability to manage multiple aggregators; I should write that up, since
this comes up periodically.  The IEEE standard (to which the bonding
implementation conforms) describes how the whole system works, but
doesn't really have a simple overview.

[0] https://www.kernel.org/doc/Documentation/networking/bonding.rst

>> But in any case, I still prefer active-backup on top of 802.3ad so that I can
>> have the ARP monitoring.
>> 
>> If it's too much trouble to get the top-level bond to report duplex/speed
>> correctly when the underlying bond speed changes, then I think it would
>> be an improvement to set duplex/speed to N/A (or -1) for a bond of
>> bonds configuration instead of potentially having incorrect information.
>> I imagine such a fix might be much easier than updating dynamically
>> when the lower-level 802.3ad bond changes speed.
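
For illustration only, Andrew's interim suggestion might look roughly like the C sketch below: the upper bond reports its active slave's speed and duplex, but reports both as unknown when that slave is itself a bond, whose speed can change as its own ports come and go. All names here are hypothetical; the -1 speed follows Andrew's suggestion, while the 0xff "unknown" duplex value and the 0/1 duplex encoding are assumptions made for this sketch. This is not how bonding currently behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SPEED_UNKNOWN  (-1)   /* Andrew's "N/A (or -1)" */
#define MODEL_DUPLEX_UNKNOWN 0xff   /* assumed "unknown" duplex value */

/* Hypothetical view of the upper bond's active slave. */
struct link_model {
    bool is_bond;     /* true if this slave is itself a bonding master */
    int  speed_mbps;  /* last value the upper bond cached for this slave */
    int  duplex;      /* 0 = half, 1 = full in this model */
};

static void report_link(const struct link_model *active, int *speed, int *duplex)
{
    assert(speed != NULL && duplex != NULL);
    if (active == NULL || active->is_bond) {
        /* A nested bond's speed may have changed since it was cached, so
         * report "unknown" rather than possibly incorrect information. */
        *speed = MODEL_SPEED_UNKNOWN;
        *duplex = MODEL_DUPLEX_UNKNOWN;
        return;
    }
    *speed = active->speed_mbps;
    *duplex = active->duplex;
}
```

The trade-off is the one Andrew describes: a plain NIC still reports its real speed, while a nested bond reports nothing rather than a stale value, which avoids having to propagate speed changes up from the lower bond.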

	I'll have to give this some thought.  The best long term
solution would be to decouple the link monitoring stuff from the mode,
and thus allow ARP and MII in a wider variety of modes.  I've prototyped
that out in the past, along with changing the MII monitor to respond to
carrier state changes in real time instead of polling, and it's fairly
complicated.

	In any event, this does sound like a valid use case for nesting
the bonds, so simply disabling that facility seems to be off the table.

	-J

---
	-Jay Vosburgh, jay.vosburgh@...onical.com
