[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4270C010E5516F3B+20250513094750.23387-2-tonghao@bamaicloud.com>
Date: Tue, 13 May 2025 17:47:47 +0800
From: Tonghao Zhang <tonghao@...aicloud.com>
To: netdev@...r.kernel.org
Cc: Tonghao Zhang <tonghao@...aicloud.com>,
Jay Vosburgh <jv@...sburgh.net>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>,
Jonathan Corbet <corbet@....net>,
Andrew Lunn <andrew+netdev@...n.ch>,
Zengbing Tu <tuzengbing@...iglobal.com>
Subject: [PATCH net-next v2 1/4] net: bonding: add broadcast_neighbor option for 802.3ad
Stacking technology is a type of technology used to expand ports on
Ethernet switches. It is widely used as a common access method in
large-scale Internet data center architectures. Years of practice
have proved that stacking technology has advantages and disadvantages
in high-reliability network architecture scenarios. For instance,
in stacking networking arch, conventional switch system upgrades
require multiple stacked devices to restart at the same time.
Therefore, it is inevitable that the business will be interrupted
for a while. It is for this reason that "no-stacking" in data centers
has become a trend. Additionally, when the stacking link connecting
the switches fails or is abnormal, the stack will split. Although it is
not common, it still happens in actual operation. The problem is that
after the split, it is equivalent to two switches with the same configuration
appearing in the network, causing network configuration conflicts and
ultimately interrupting the services carried by the stacking system.
To improve network stability, "non-stacking" solutions have been increasingly
adopted, particularly by public cloud providers and tech companies
like Alibaba, Tencent, and Didi. "non-stacking" is a method of mimicing switch
stacking that convinces a LACP peer, bonding in this case, connected to a set of
"non-stacked" switches that all of its ports are connected to a single
switch (i.e., LACP aggregator), as if those switches were stacked. This
enables the LACP peer's ports to aggregate together, and requires (a)
special switch configuration, described in the linked article, and (b)
modifications to the bonding 802.3ad (LACP) mode to send all ARP / ND
packets across all ports of the active aggregator.
----------- -----------
| switch1 | | switch2 |
----------- -----------
^ ^
| |
---------------
| bond4 lacp |
---------------
| NIC1 | NIC2
---------------------
| server |
---------------------
- https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/
Cc: Jay Vosburgh <jv@...sburgh.net>
Cc: "David S. Miller" <davem@...emloft.net>
Cc: Eric Dumazet <edumazet@...gle.com>
Cc: Jakub Kicinski <kuba@...nel.org>
Cc: Paolo Abeni <pabeni@...hat.com>
Cc: Simon Horman <horms@...nel.org>
Cc: Jonathan Corbet <corbet@....net>
Cc: Andrew Lunn <andrew+netdev@...n.ch>
Signed-off-by: Tonghao Zhang <tonghao@...aicloud.com>
Signed-off-by: Zengbing Tu <tuzengbing@...iglobal.com>
---
Documentation/networking/bonding.rst | 6 +++++
drivers/net/bonding/bond_main.c | 39 ++++++++++++++++++++++++++++
drivers/net/bonding/bond_options.c | 34 ++++++++++++++++++++++++
drivers/net/bonding/bond_sysfs.c | 18 +++++++++++++
include/net/bond_options.h | 1 +
include/net/bonding.h | 3 +++
6 files changed, 101 insertions(+)
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
index a4c1291d2561..14f7593d888d 100644
--- a/Documentation/networking/bonding.rst
+++ b/Documentation/networking/bonding.rst
@@ -562,6 +562,12 @@ lacp_rate
The default is slow.
+broadcast_neighbor
+
+ Option specifying whether to broadcast ARP/ND packets to all
+ active slaves. This option has no effect in modes other than
+ 802.3ad mode. The default is off (0).
+
max_bonds
Specifies the number of bonding devices to create for this
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index d05226484c64..8ee26ddddbc8 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -212,6 +212,9 @@ atomic_t netpoll_block_tx = ATOMIC_INIT(0);
unsigned int bond_net_id __read_mostly;
+DEFINE_STATIC_KEY_FALSE(bond_bcast_neigh_enabled);
+EXPORT_SYMBOL_GPL(bond_bcast_neigh_enabled);
+
static const struct flow_dissector_key flow_keys_bonding_keys[] = {
{
.key_id = FLOW_DISSECTOR_KEY_CONTROL,
@@ -4480,6 +4483,9 @@ static int bond_close(struct net_device *bond_dev)
bond_alb_deinitialize(bond);
bond->recv_probe = NULL;
+ if (bond->params.broadcast_neighbor)
+ static_branch_dec(&bond_bcast_neigh_enabled);
+
if (bond_uses_primary(bond)) {
rcu_read_lock();
slave = rcu_dereference(bond->curr_active_slave);
@@ -5316,6 +5322,35 @@ static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
return slaves->arr[hash % count];
}
+static inline bool bond_should_broadcast_neighbor(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct bonding *bond = netdev_priv(dev);
+
+ if (!static_branch_unlikely(&bond_bcast_neigh_enabled))
+ return false;
+
+ if (!bond->params.broadcast_neighbor)
+ return false;
+
+ if (skb->protocol == htons(ETH_P_ARP))
+ return true;
+
+ if (skb->protocol == htons(ETH_P_IPV6) &&
+ pskb_may_pull(skb,
+ sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr))) {
+ if (ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
+ struct icmp6hdr *icmph = icmp6_hdr(skb);
+
+ if ((icmph->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) ||
+ (icmph->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT))
+ return true;
+ }
+ }
+
+ return false;
+}
+
/* Use this Xmit function for 3AD as well as XOR modes. The current
* usable slave array is formed in the control path. The xmit function
* just calculates hash and sends the packet out.
@@ -5583,6 +5618,9 @@ static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev
case BOND_MODE_ACTIVEBACKUP:
return bond_xmit_activebackup(skb, dev);
case BOND_MODE_8023AD:
+ if (bond_should_broadcast_neighbor(skb, dev))
+ return bond_xmit_broadcast(skb, dev);
+ fallthrough;
case BOND_MODE_XOR:
return bond_3ad_xor_xmit(skb, dev);
case BOND_MODE_BROADCAST:
@@ -6462,6 +6500,7 @@ static int __init bond_check_params(struct bond_params *params)
eth_zero_addr(params->ad_actor_system);
params->ad_user_port_key = ad_user_port_key;
params->coupled_control = 1;
+ params->broadcast_neighbor = 0;
if (packets_per_slave > 0) {
params->reciprocal_packets_per_slave =
reciprocal_value(packets_per_slave);
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 91893c29b899..dca52d93f513 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -87,6 +87,8 @@ static int bond_option_missed_max_set(struct bonding *bond,
const struct bond_opt_value *newval);
static int bond_option_coupled_control_set(struct bonding *bond,
const struct bond_opt_value *newval);
+static int bond_option_broadcast_neigh_set(struct bonding *bond,
+ const struct bond_opt_value *newval);
static const struct bond_opt_value bond_mode_tbl[] = {
{ "balance-rr", BOND_MODE_ROUNDROBIN, BOND_VALFLAG_DEFAULT},
@@ -240,6 +242,12 @@ static const struct bond_opt_value bond_coupled_control_tbl[] = {
{ NULL, -1, 0},
};
+static const struct bond_opt_value bond_broadcast_neigh_tbl[] = {
+ { "on", 1, 0},
+ { "off", 0, BOND_VALFLAG_DEFAULT},
+ { NULL, -1, 0}
+};
+
static const struct bond_option bond_opts[BOND_OPT_LAST] = {
[BOND_OPT_MODE] = {
.id = BOND_OPT_MODE,
@@ -513,6 +521,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
.flags = BOND_OPTFLAG_IFDOWN,
.values = bond_coupled_control_tbl,
.set = bond_option_coupled_control_set,
+ },
+ [BOND_OPT_BROADCAST_NEIGH] = {
+ .id = BOND_OPT_BROADCAST_NEIGH,
+ .name = "broadcast_neighbor",
+ .desc = "Broadcast neighbor packets to all slaves",
+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
+ .values = bond_broadcast_neigh_tbl,
+ .set = bond_option_broadcast_neigh_set,
}
};
@@ -1840,3 +1856,21 @@ static int bond_option_coupled_control_set(struct bonding *bond,
bond->params.coupled_control = newval->value;
return 0;
}
+
+static int bond_option_broadcast_neigh_set(struct bonding *bond,
+ const struct bond_opt_value *newval)
+{
+ netdev_dbg(bond->dev, "Setting broadcast_neighbor to %llu\n",
+ newval->value);
+
+ if (bond->params.broadcast_neighbor == newval->value)
+ return 0;
+
+ bond->params.broadcast_neighbor = newval->value;
+ if (bond->params.broadcast_neighbor)
+ static_branch_inc(&bond_bcast_neigh_enabled);
+ else
+ static_branch_dec(&bond_bcast_neigh_enabled);
+
+ return 0;
+}
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 1e13bb170515..4a53850b2c68 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -752,6 +752,23 @@ static ssize_t bonding_show_ad_user_port_key(struct device *d,
static DEVICE_ATTR(ad_user_port_key, 0644,
bonding_show_ad_user_port_key, bonding_sysfs_store_option);
+static ssize_t bonding_show_broadcast_neighbor(struct device *d,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct bonding *bond = to_bond(d);
+ const struct bond_opt_value *val;
+
+ val = bond_opt_get_val(BOND_OPT_BROADCAST_NEIGH,
+ bond->params.broadcast_neighbor);
+
+ return sysfs_emit(buf, "%s %d\n", val->string,
+ bond->params.broadcast_neighbor);
+}
+
+static DEVICE_ATTR(broadcast_neighbor, 0644,
+ bonding_show_broadcast_neighbor, bonding_sysfs_store_option);
+
static struct attribute *per_bond_attrs[] = {
&dev_attr_slaves.attr,
&dev_attr_mode.attr,
@@ -791,6 +808,7 @@ static struct attribute *per_bond_attrs[] = {
&dev_attr_ad_actor_system.attr,
&dev_attr_ad_user_port_key.attr,
&dev_attr_arp_missed_max.attr,
+ &dev_attr_broadcast_neighbor.attr,
NULL,
};
diff --git a/include/net/bond_options.h b/include/net/bond_options.h
index 18687ccf0638..022b122a9fb6 100644
--- a/include/net/bond_options.h
+++ b/include/net/bond_options.h
@@ -77,6 +77,7 @@ enum {
BOND_OPT_NS_TARGETS,
BOND_OPT_PRIO,
BOND_OPT_COUPLED_CONTROL,
+ BOND_OPT_BROADCAST_NEIGH,
BOND_OPT_LAST
};
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 95f67b308c19..e06f0d63b2c1 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -115,6 +115,8 @@ static inline int is_netpoll_tx_blocked(struct net_device *dev)
#define is_netpoll_tx_blocked(dev) (0)
#endif
+DECLARE_STATIC_KEY_FALSE(bond_bcast_neigh_enabled);
+
struct bond_params {
int mode;
int xmit_policy;
@@ -149,6 +151,7 @@ struct bond_params {
struct in6_addr ns_targets[BOND_MAX_NS_TARGETS];
#endif
int coupled_control;
+ int broadcast_neighbor;
/* 2 bytes of padding : see ether_addr_equal_64bits() */
u8 ad_actor_system[ETH_ALEN + 2];
--
2.34.1
Powered by blists - more mailing lists