Message-ID: <20241025092855.52212-1-chengyechun1@huawei.com>
Date: Fri, 25 Oct 2024 17:28:55 +0800
From: chengyechun <chengyechun1@...wei.com>
To: <netdev@...r.kernel.org>, <andy@...yhouse.net>
CC: <j.vosburgh@...il.com>
Subject: [Discuss]Questions about active slave select in bonding 8023ad
Hi all,
Recently I have been hitting an occasional problem when starting a bond.
After the slaves and the bond are configured, restarting the network sometimes fails. The reported cause of the failure is:
“/etc/sysconfig/network-scripts/ifup-eth[2747129]: Error, some other host () already uses address 1.1.1.39.”
When the network scripts use arping to check for an IP address conflict, the check reports an error even though no address conflict actually exists. This is very strange.
The kernel version is 5.10. The bond configuration is as follows:
BONDING_OPTS='mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4'
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=static
NM_CONTROLLED=no
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond0
DEVICE=bond0
ONBOOT=yes
IPADDR=1.1.1.38
NETMASK=255.255.0.0
IPV6ADDR=1:1:1::39/64
The slave configuration is as follows. I have four similar slaves; the others are enp13s0, enp14s0 and enp15s0:
NAME=enp12s0
DEVICE=enp12s0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
After I discovered this problem, I restarted the network many times, and the failure always showed up once or twice.
After some debugging, I found that the bond interface has no available slave at the moment the arping packet is sent, so the packet cannot be sent.
When the problem occurs, the active slave is switched from enp12s0 to enp13s0. However, the backup flag of enp13s0 is not changed from 1 to 0 immediately after the switchover completes. Is this by design, or is it a bug?
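One way to watch this window from user space is to poll the per-slave state that the bonding driver exports. The sketch below is only an illustration and is not what the ifup scripts do: it assumes the slave attribute /sys/class/net/<ifname>/bonding_slave/state exists on this kernel and reads "active" or "backup"; the helper names read_sysfs_line() and slave_is_active() are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a sysfs-style file into buf
 * and strip the trailing newline. Returns 0 on success, -1 on failure. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(len > 0);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: returns 1 if the slave reports "active",
 * 0 if it reports anything else, -1 if the state cannot be read.
 * The sysfs path is an assumption about the running kernel. */
static int slave_is_active(const char *ifname)
{
    char path[128];
    char state[32];

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/bonding_slave/state", ifname);
    if (read_sysfs_line(path, state, sizeof(state)) != 0)
        return -1;
    return strcmp(state, "active") == 0;
}
```

Calling slave_is_active("enp13s0") in a loop right after the switchover would show how long the slave keeps reporting "backup" before the arping check runs.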
Thinking about it further, I have a question about how the active slave is selected. In the ad_agg_selection_test function, if condition 3a is met, that is, if (__agg_has_partner(curr) && !__agg_has_partner(best)), and the active slave is then switched successfully, why is the best slave not enabled (enable_port) in ad_agg_selection_logic?
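To make the question concrete, here is a stripped-down model of the comparison as I understand it, not the kernel code itself. struct agg_model and agg_selection_model() are hypothetical names; has_partner stands in for the result of __agg_has_partner(), and the final policy step (bandwidth or port count, depending on ad_select) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct aggregator. */
struct agg_model {
    bool is_individual; /* aggregator cannot aggregate with other ports */
    bool has_partner;   /* models what __agg_has_partner() would return */
};

/* Models the early steps of the selection test: returns whichever of
 * best/curr should be preferred. The ad_select policy step is not
 * modelled; in that case the current best is simply kept. */
static struct agg_model *agg_selection_model(struct agg_model *best,
                                             struct agg_model *curr)
{
    assert(curr != NULL);

    if (!best)                                       /* no best yet */
        return curr;
    if (!curr->is_individual && best->is_individual) /* prefer non-individual */
        return curr;
    if (curr->is_individual && !best->is_individual)
        return best;
    if (curr->has_partner && !best->has_partner)     /* condition 3a */
        return curr;
    if (!curr->has_partner && best->has_partner)     /* mirror case, 3b */
        return best;
    return best;                                     /* policy step omitted */
}
```

In this model, an aggregator whose partner has replied always wins over one whose partner has not, which is the case where I see the switch from enp12s0 to enp13s0. The model only picks the winner; it says nothing about when the ports of the winning aggregator are enabled, which is the part I am asking about.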