Message-ID: <20241025092855.52212-1-chengyechun1@huawei.com>
Date: Fri, 25 Oct 2024 17:28:55 +0800
From: chengyechun <chengyechun1@...wei.com>
To: <netdev@...r.kernel.org>, <andy@...yhouse.net>
CC: <j.vosburgh@...il.com>
Subject: [Discuss]Questions about active slave select in bonding 8023ad

Hi all,
Recently I have been seeing an intermittent problem when bringing up a bond.
After the slaves and the bond are configured, restarting the network sometimes fails with the following error:
“/etc/sysconfig/network-scripts/ifup-eth[2747129]: Error, some other host () already uses address 1.1.1.39.”
The network script uses arping to check for an IP address conflict, and that check reports an error even though no conflict actually exists, which is very strange.
The kernel version is 5.10. The bond configuration is as follows:

BONDING_OPTS='mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4'
TYPE=Bond
BONDING_MASTER=yes
BOOTPROTO=static
NM_CONTROLLED=no
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond0
DEVICE=bond0
ONBOOT=yes
IPADDR=1.1.1.38
NETMASK=255.255.0.0
IPV6ADDR=1:1:1::39/64

The slave configuration is as follows. There are four slaves in total; the other three (enp13s0, enp14s0, enp15s0) are configured the same way:

NAME=enp12s0
DEVICE=enp12s0
BOOTPROTO=none
ONBOOT=yes
USERCTL=no
NM_CONTROLLED=no
MASTER=bond0
SLAVE=yes
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no

After noticing this problem, I restarted the network many times, and the failure occurred once or twice among those restarts.
After some debugging, I found that the bond interface has no usable slave at the moment the arping packet is sent, so the arping packet cannot be sent.
When the problem occurs, the active slave is switched from enp12s0 to enp13s0. However, the backup flag of enp13s0 does not change from 1 to 0 immediately after the switchover completes. Is this the intended mechanism or a bug?
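
To make the window with no usable slave visible from user space, a polling script along these lines could be run while the network is restarted. This is only a minimal sketch: it assumes the running kernel exposes /sys/class/net/<slave>/bonding_slave/state and that the file reads "active" or "backup". The helper names (read_slave_states, has_active_slave, watch) are hypothetical and exist only for this illustration.

```python
import time
from pathlib import Path


def read_slave_states(slaves, sysfs_root="/sys/class/net"):
    """Return {slave: state}; state is None when the sysfs file cannot be read."""
    states = {}
    for name in slaves:
        # Assumed sysfs layout: <root>/<slave>/bonding_slave/state
        path = Path(sysfs_root) / name / "bonding_slave" / "state"
        try:
            states[name] = path.read_text().strip()
        except OSError:
            states[name] = None
    return states


def has_active_slave(states):
    """True if at least one slave currently reports the state "active"."""
    return any(state == "active" for state in states.values())


def watch(slaves, samples=50, interval=0.1, sysfs_root="/sys/class/net"):
    """Print one line per sample so a gap with no active slave stands out."""
    for _ in range(samples):
        states = read_slave_states(slaves, sysfs_root)
        flag = "ok" if has_active_slave(states) else "NO ACTIVE SLAVE"
        print(f"{time.monotonic():.3f} {flag} {states}")
        time.sleep(interval)
```

Calling watch(["enp12s0", "enp13s0", "enp14s0", "enp15s0"]) during the restart would show whether enp13s0 stays in the backup state for some samples after enp12s0 stops being active.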

After thinking about it, I have a question about how the active slave is selected. In the ad_agg_selection_test function, if condition 3a is met, that is, if (__agg_has_partner(curr) && !__agg_has_partner(best)), and the active slave switch succeeds, why does ad_agg_selection_logic not call enable_port on the best slave?
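
For readers who do not have the source open, the ordering of checks that leads to condition 3a can be modelled roughly as below. This is a simplified Python model, not the kernel code: the Agg class and select function are hypothetical names, the aggregator is reduced to two booleans, and the final policy-based comparison (step 4 in the kernel comment) is omitted and replaced by "keep best".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agg:
    """Hypothetical, reduced stand-in for a bonding aggregator."""
    name: str
    is_individual: bool
    has_partner: bool  # stands in for __agg_has_partner()


def select(best: Optional[Agg], curr: Agg) -> Agg:
    """Model of the early checks in ad_agg_selection_test (steps 0 to 3b)."""
    if best is None:                                   # 0: no best yet
        return curr
    if not curr.is_individual and best.is_individual:  # 1
        return curr
    if curr.is_individual and not best.is_individual:  # 2
        return best
    if curr.has_partner and not best.has_partner:      # 3a: the case asked about
        return curr
    if not curr.has_partner and best.has_partner:      # 3b
        return best
    # 4: the kernel applies its selection policy here; omitted in this model.
    return best
```

With this model, select(Agg("old", False, False), Agg("new", False, True)) returns the "new" aggregator through condition 3a, which corresponds to the switch from enp12s0 to enp13s0 described above.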






