Open Source and information security mailing list archives
 
Message-ID: <Zy18yA6kNmlCl6eQ@fedora>
Date: Fri, 8 Nov 2024 02:51:52 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: netdev@...r.kernel.org
Subject: Re: [Question]: should we consider arp missed max during
 bond_ab_arp_probe()?

On Thu, Nov 07, 2024 at 05:32:29PM -0800, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@...il.com> wrote:
> 
> >Hi Jay,
> >
> >Our QE reported that, when there is no active slave during
> >bond_ab_arp_probe(), the slaves send ARP probes one by one. This
> >flaps the switch's MAC table quickly and sometimes even makes the switch stop
> >learning MAC addresses. So should we consider the arp missed max during
> >bond_ab_arp_probe()? i.e., give each slave more chances to send probe messages
> >before switching to another slave. What do you think?
> 
> 	Well, "quickly" here depends entirely on what the value of
> arp_interval is.  It's been quite a while since I looked into the
> details of this particular behavior, but at the time I didn't see the
> switches I had issue flap warnings.  If memory serves, I usually tested
> with arp_interval in the realm of 100ms, with anywhere from 2 to 6
> interfaces in the bond.
> 
> 	What settings are you using for the bond, and what model of
> switch exhibits the behavior you describe?
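A simplified model of the rotation described in the question may help. This is a sketch under assumptions, not the bonding driver's code: while no slave is active, the probing slave is assumed to advance round-robin once every `probes_per_slave` arp_interval ticks (1 approximates the current behavior, larger values model the proposal), and every slave is assumed to transmit with the bond's MAC from a different switch port, so each change of sending slave looks like one MAC move to the switch. `probe_schedule` and `count_mac_moves` are hypothetical helpers.

```python
from typing import List


def probe_schedule(num_slaves: int, ticks: int, probes_per_slave: int = 1) -> List[int]:
    """Return the index of the slave sending the ARP probe at each tick.

    Simplified model (assumption): the probe moves to the next slave in
    round-robin order after `probes_per_slave` arp_interval ticks.
    """
    return [(t // probes_per_slave) % num_slaves for t in range(ticks)]


def count_mac_moves(schedule: List[int]) -> int:
    """Count hand-offs between different slaves.

    Assumes all slaves share the bond MAC and sit on different switch
    ports, so every hand-off is seen by the switch as a MAC move.
    """
    return sum(1 for prev, cur in zip(schedule, schedule[1:]) if prev != cur)


# Two slaves, arp_interval=100 ms, observed for 30 s -> 300 ticks.
print(count_mac_moves(probe_schedule(2, 300)))                      # 299
print(count_mac_moves(probe_schedule(2, 300, probes_per_slave=2)))  # 149
```

In this model, repeating the probe on each slave divides the move rate by the repeat count but does not remove the moves.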

In our network we have a Cisco 9364 switch, which by default disables MAC
learning for 120 seconds if it sees 6 MAC moves within 30 seconds [1].
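The switch-side limit can be approximated as a sliding-window counter. This is only a rough sketch: how NX-OS actually counts moves is not described here, so the half-open window and the hypothetical `first_trip_time` helper are assumptions.

```python
from collections import deque
from typing import Iterable, Optional


def first_trip_time(move_times: Iterable[float], max_moves: int = 6,
                    window_s: float = 30.0) -> Optional[float]:
    """Return the time at which `max_moves` MAC moves fall inside one
    window of `window_s` seconds, or None if that never happens.

    Assumed model: a move at time m is counted at time t if t - m < window_s.
    """
    recent = deque()
    for t in move_times:
        recent.append(t)
        # Drop moves that have aged out of the window ending at t.
        while t - recent[0] >= window_s:
            recent.popleft()
        if len(recent) >= max_moves:
            return t
    return None


print(first_trip_time([5.0 * i for i in range(1, 25)]))  # 30.0: one move every 5 s trips the limit
print(first_trip_time([6.0 * i for i in range(1, 21)]))  # None: one move every 6 s stays at 5 per window
```

Under this model, a hand-off every 100 ms would reach the limit after about 0.6 seconds.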

> 
> 	That said, the intent of the current implementation is to cycle
> through the interfaces in the bond relatively quickly when no interfaces
> are up, under the theory that such behavior finds an available interface
> in the minimum time.
> 
> 	I'm not necessarily opposed to having each probe "step," so to
> speak, perform multiple ARP probe checks.  However, I wonder if this is
> a complicated workaround for not wanting to change a configuration
> setting on a switch, and it would only make things better by chance
> (i.e., that the probes just happen to now take long enough to not run
> afoul of the switch's time limit for some flap parameter).
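The trade-off described here can be put in rough numbers. Under the same simplified model (each slave keeps the probe for `probes_per_slave` arp_interval ticks, and each hand-off counts as one MAC move), trying every slave once takes longer as the per-slave dwell time grows, while the move rate falls by the same factor. `rotation_tradeoff` is a hypothetical helper; it ignores ARP reply latency and assumes at least two slaves.

```python
from typing import Tuple


def rotation_tradeoff(num_slaves: int, arp_interval_ms: int,
                      probes_per_slave: int = 1,
                      window_ms: int = 30_000) -> Tuple[int, int]:
    """Return (full_rotation_ms, approx_moves_per_window) for the model."""
    dwell_ms = arp_interval_ms * probes_per_slave  # time each slave holds the probe
    full_rotation_ms = num_slaves * dwell_ms       # time to try every slave once
    moves_per_window = window_ms // dwell_ms       # approximate hand-offs per window
    return full_rotation_ms, moves_per_window


for k in (1, 2, 5):
    print(k, rotation_tradeoff(num_slaves=4, arp_interval_ms=100, probes_per_slave=k))
# 1 (400, 300)
# 2 (800, 150)
# 5 (2000, 60)
```

In this model, even five probes per slave at a 100 ms interval still produce far more than 6 moves per 30 seconds, so extra probes alone would not keep such a setup under the switch limit.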

For Cisco Nexus 9300-X switches, the `mac-move policy` is supported starting
with Cisco NX-OS Release 10.3(1)F, which was released on August 19, 2022.

So there is an option to disable or modify the MAC move policy. However,
switches that cannot be upgraded to this release will still be affected
unless the user changes arp_interval to a large value.
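How large arp_interval would need to be can be estimated from the default limit of 6 moves in 30 seconds. Assuming evenly spaced moves, one every arp_interval * probes_per_slave milliseconds, and a half-open 30-second window, the window holds ceil(30000 / dwell) moves, so staying below 6 requires a dwell of at least 30000 / 5 = 6000 ms. The hypothetical `min_arp_interval_ms` helper below is a sketch of that arithmetic, not a recommended setting.

```python
import math


def min_arp_interval_ms(max_moves: int = 6, window_ms: int = 30_000,
                        probes_per_slave: int = 1) -> int:
    """Smallest arp_interval (ms) keeping MAC moves below `max_moves` in any
    `window_ms` window, assuming one hand-off every
    arp_interval * probes_per_slave ms (simplified model).

    ceil(window / dwell) <= max_moves - 1  <=>  dwell >= window / (max_moves - 1)
    """
    if max_moves < 2 or probes_per_slave < 1:
        raise ValueError("need max_moves >= 2 and probes_per_slave >= 1")
    min_dwell_ms = math.ceil(window_ms / (max_moves - 1))
    return math.ceil(min_dwell_ms / probes_per_slave)


print(min_arp_interval_ms())                    # 6000 with one probe per slave
print(min_arp_interval_ms(probes_per_slave=2))  # 3000 if each slave probed twice
```

Both values are far above the roughly 100 ms arp_interval mentioned earlier in the thread.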

As there are workarounds (changing either the switch configuration or
arp_interval), I don't have a strong intention to change the bonding behavior.
I will work on it or drop it depending on your decision.

[1] https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/104x/config-guides/cisco-nexus-9000-series-nx-os-system-management-configuration-guide-release-104x/m-configuring-mac-move.html

Thanks
Hangbin
