Message-ID: <Zy18yA6kNmlCl6eQ@fedora>
Date: Fri, 8 Nov 2024 02:51:52 +0000
From: Hangbin Liu <liuhangbin@...il.com>
To: Jay Vosburgh <jv@...sburgh.net>
Cc: netdev@...r.kernel.org
Subject: Re: [Question]: should we consider arp missed max during
bond_ab_arp_probe()?
On Thu, Nov 07, 2024 at 05:32:29PM -0800, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@...il.com> wrote:
>
> >Hi Jay,
> >
> >Our QE reported that, when there is no active slave during
> >bond_ab_arp_probe(), the slaves send the arp probe message one by one. This
> >will flap the switch's mac table quickly, sometimes even make the switch stop
> >learning mac address. So should we consider the arp missed max during
> >bond_ab_arp_probe()? i.e. each slave has more chances to send probe messages
> >before switch to another slave. What do you think?
>
> Well, "quickly" here depends entirely on what the value of
> arp_interval is. It's been quite a while since I looked into the
> details of this particular behavior, but at the time I didn't see the
> switches I had issue flap warnings. If memory serves, I usually tested
> with arp_interval in the realm of 100ms, with anywhere from 2 to 6
> interfaces in the bond.
>
> What settings are you using for the bond, and what model of
> switch exhibits the behavior you describe?
In our network, we have a Cisco 9364 switch, which by default disables MAC
learning for 120 seconds if it sees 6 MAC moves within 30 seconds [1].
>
> That said, the intent of the current implementation is to cycle
> through the interfaces in the bond relatively quickly when no interfaces
> are up, under the theory that such behavior finds an available interface
> in the minimum time.
>
> I'm not necessarily opposed to having each probe "step," so to
> speak, perform multiple ARP probe checks. However, I wonder if this is
> a complicated workaround for not wanting to change a configuration
> setting on a switch, and it would only make things better by chance
> (i.e., that the probes just happen to now take long enough to not run
> afoul of the switch's time limit for some flap parameter).
For Cisco Nexus 9300-X switches, the `mac-move policy` has been supported since
Cisco NX-OS Release 10.3(1)F, which was released on August 19, 2022.
So there is an option to disable or modify the MAC move policy. But switches
that can't be updated to this version will be affected, unless the user changes
arp_interval to a large number.
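To give a rough idea of how large arp_interval would need to be, here is a
minimal Python sketch of the arithmetic. It assumes the worst case where no
slave is usable and each probe step makes the switch see the bond MAC on a
different port, i.e. one MAC move per arp_interval. That one-move-per-interval
model and the function names are assumptions for illustration; only the
6 moves / 30 seconds threshold comes from the numbers above.

```python
def mac_moves_in_window(arp_interval_ms: int, window_ms: int) -> int:
    """Number of probe steps (assumed: one MAC move each) that fit in the window."""
    return window_ms // arp_interval_ms


def min_safe_arp_interval_ms(move_limit: int, window_ms: int) -> int:
    """Smallest integer arp_interval (ms) that keeps the move count
    strictly below move_limit within the window, under the same assumption."""
    return window_ms // move_limit + 1


WINDOW_MS = 30 * 1000   # switch counts moves over 30 seconds
MOVE_LIMIT = 6          # 6 moves in the window disables MAC learning

# arp_interval=100ms, as in the test setup quoted earlier in this thread
print(mac_moves_in_window(100, WINDOW_MS))
# Smallest interval that stays under the limit in this simplified model
print(min_safe_arp_interval_ms(MOVE_LIMIT, WINDOW_MS))
```

Under these assumptions, an arp_interval of 100ms is far above the threshold,
and the interval would have to be somewhat over 5 seconds to stay below it.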
As there is a workaround (either change the switch configuration or
arp_interval), I don't have a strong intention to change the bonding behavior.
I will do it or ignore it based on your decision.
[1] https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus9000/sw/104x/config-guides/cisco-nexus-9000-series-nx-os-system-management-configuration-guide-release-104x/m-configuring-mac-move.html
Thanks
Hangbin