[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1133230.1746881077@vermin>
Date: Sat, 10 May 2025 14:44:37 +0200
From: Jay Vosburgh <jv@...sburgh.net>
To: tonghao@...aicloud.com
cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
Jonathan Corbet <corbet@....net>,
Andrew Lunn <andrew+netdev@...n.ch>
Subject: Re: [PATCH net-next 1/4] net: bonding: add broadcast_neighbor option for 802.3ad
tonghao@...aicloud.com wrote:
>From: Tonghao Zhang <tonghao@...aicloud.com>
>
>Stacking technology provides technical benefits but has inherent drawbacks.
>For instance, switch software or system upgrades require simultaneous reboots
>of all stacked switches. Additionally, stacking link failures may cause
>stack splitting.
>
>To improve network stability, non-stacking solutions have been increasingly
>adopted, particularly by public cloud providers and technology companies
>like Alibaba, Tencent, and Didi. The server still uses dual network cards and
>dual uplinks to two switches, and the network card mode is set to
>bond mode 4 (IEEE 802.3ad). As aggregation ports transmit ARP/ND data
>exclusively through one physical port, both switches in non-stacking
>deployments must receive server ARP/ND requests. This requires bonding driver
>modifications to broadcast ARP/ND packets through all active slave links.
>
>- https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/
I didn't really follow the explanation here without reading the
linked article. I think it would be better to explain the basics of
what this "non-stacking" architecture is, then describe the change to
bonding necessary to support it. This description need not go into
great detail; assuming I understand correctly, perhaps something along
the lines of:
"non-stacking" is a method of mimicing switch stacking that
convinces a LACP peer, bonding in this case, connected to a set of
"non-stacked" switches that all of its ports are connected to a single
switch (i.e., LACP aggregator), as if those switches were stacked. This
enables the LACP peer's ports to aggregate together, and requires (a)
special switch configuration, described in the linked article, and (b)
modifications to the bonding 802.3ad (LACP) mode to send all ARP / ND
packets across all ports of the active aggregator.
Is that a fair summary?
Regardless, the commit message should stand on its own, even if
the linked article is gone at some point in the future.
>Cc: Jay Vosburgh <jv@...sburgh.net>
>Cc: "David S. Miller" <davem@...emloft.net>
>Cc: Eric Dumazet <edumazet@...gle.com>
>Cc: Jakub Kicinski <kuba@...nel.org>
>Cc: Paolo Abeni <pabeni@...hat.com>
>Cc: Simon Horman <horms@...nel.org>
>Cc: Jonathan Corbet <corbet@....net>
>Cc: Andrew Lunn <andrew+netdev@...n.ch>
>Signed-off-by: Tonghao Zhang <tonghao@...aicloud.com>
>---
> Documentation/networking/bonding.rst | 5 +++
> drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++++-------
> drivers/net/bonding/bond_options.c | 25 ++++++++++++
> drivers/net/bonding/bond_sysfs.c | 18 +++++++++
> include/net/bond_options.h | 1 +
> include/net/bonding.h | 1 +
> 6 files changed, 94 insertions(+), 14 deletions(-)
>
>diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst
>index a4c1291d2561..0aca6e7599db 100644
>--- a/Documentation/networking/bonding.rst
>+++ b/Documentation/networking/bonding.rst
>@@ -562,6 +562,11 @@ lacp_rate
>
> The default is slow.
>
>+broadcast_neighbor
>+
>+ Option specifying whether to broadcast ARP/ND packets to all
>+ active slaves. The default is off (0).
>+
I'm happy to see documentation updates; however, please add the
caveat that this option has no effect in modes other than 802.3ad mode.
Also, the text is not formatted consistently with the options around it
(block indented one tab).
> max_bonds
>
> Specifies the number of bonding devices to create for this
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index d05226484c64..c54bfba10688 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -5316,23 +5316,31 @@ static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
> return slaves->arr[hash % count];
> }
>
>-/* Use this Xmit function for 3AD as well as XOR modes. The current
>- * usable slave array is formed in the control path. The xmit function
>- * just calculates hash and sends the packet out.
>- */
>-static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>- struct net_device *dev)
>+static bool bond_should_broadcast_neighbor(struct bonding *bond,
>+ struct sk_buff *skb)
> {
>- struct bonding *bond = netdev_priv(dev);
>- struct bond_up_slave *slaves;
>- struct slave *slave;
>+ if (BOND_MODE(bond) != BOND_MODE_8023AD)
>+ return false;
>
>- slaves = rcu_dereference(bond->usable_slaves);
>- slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
>- if (likely(slave))
>- return bond_dev_queue_xmit(bond, skb, slave->dev);
>+ if (!bond->params.broadcast_neighbor)
>+ return false;
>
>- return bond_tx_drop(dev, skb);
>+ if (skb->protocol == htons(ETH_P_ARP))
>+ return true;
>+
>+ if (skb->protocol == htons(ETH_P_IPV6) &&
>+ pskb_may_pull(skb,
>+ sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr))) {
>+ if (ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) {
>+ struct icmp6hdr *icmph = icmp6_hdr(skb);
>+
>+ if ((icmph->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) ||
>+ (icmph->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT))
>+ return true;
>+ }
>+ }
>+
>+ return false;
> }
>
> /* in broadcast mode, we send everything to all usable interfaces. */
>@@ -5377,6 +5385,28 @@ static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb,
> return NET_XMIT_DROP;
> }
>
>+/* Use this Xmit function for 3AD as well as XOR modes. The current
>+ * usable slave array is formed in the control path. The xmit function
>+ * just calculates hash and sends the packet out.
>+ */
>+static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>+ struct net_device *dev)
>+{
>+ struct bonding *bond = netdev_priv(dev);
>+ struct bond_up_slave *slaves;
>+ struct slave *slave;
>+
>+ if (bond_should_broadcast_neighbor(bond, skb))
>+ return bond_xmit_broadcast(skb, dev);
I feel like there has to be a way to implement this that won't
add two or three branches to the transmit fast path for every packet
when this option is not enabled (which will be the vast majority of
cases). I'm not sure what that would look like, needs some thought.
Is the call to bond_should_broadcast_neighbor inlined at compile
time?
-J
>+
>+ slaves = rcu_dereference(bond->usable_slaves);
>+ slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves);
>+ if (likely(slave))
>+ return bond_dev_queue_xmit(bond, skb, slave->dev);
>+
>+ return bond_tx_drop(dev, skb);
>+}
>+
> /*------------------------- Device initialization ---------------------------*/
>
> /* Lookup the slave that corresponds to a qid */
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>index 91893c29b899..38e8f03d1707 100644
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -87,6 +87,8 @@ static int bond_option_missed_max_set(struct bonding *bond,
> const struct bond_opt_value *newval);
> static int bond_option_coupled_control_set(struct bonding *bond,
> const struct bond_opt_value *newval);
>+static int bond_option_broadcast_neigh_set(struct bonding *bond,
>+ const struct bond_opt_value *newval);
>
> static const struct bond_opt_value bond_mode_tbl[] = {
> { "balance-rr", BOND_MODE_ROUNDROBIN, BOND_VALFLAG_DEFAULT},
>@@ -240,6 +242,12 @@ static const struct bond_opt_value bond_coupled_control_tbl[] = {
> { NULL, -1, 0},
> };
>
>+static const struct bond_opt_value bond_broadcast_neigh_tbl[] = {
>+ { "on", 1, 0},
>+ { "off", 0, BOND_VALFLAG_DEFAULT},
>+ { NULL, -1, 0}
>+};
>+
> static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> [BOND_OPT_MODE] = {
> .id = BOND_OPT_MODE,
>@@ -513,6 +521,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
> .flags = BOND_OPTFLAG_IFDOWN,
> .values = bond_coupled_control_tbl,
> .set = bond_option_coupled_control_set,
>+ },
>+ [BOND_OPT_BROADCAST_NEIGH] = {
>+ .id = BOND_OPT_BROADCAST_NEIGH,
>+ .name = "broadcast_neighbor",
>+ .desc = "Broadcast neighbor packets to all slaves",
>+ .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)),
>+ .values = bond_broadcast_neigh_tbl,
>+ .set = bond_option_broadcast_neigh_set,
> }
> };
>
>@@ -1840,3 +1856,12 @@ static int bond_option_coupled_control_set(struct bonding *bond,
> bond->params.coupled_control = newval->value;
> return 0;
> }
>+
>+static int bond_option_broadcast_neigh_set(struct bonding *bond,
>+ const struct bond_opt_value *newval)
>+{
>+ netdev_dbg(bond->dev, "Setting broadcast_neighbor to %llu\n",
>+ newval->value);
>+ bond->params.broadcast_neighbor = newval->value;
>+ return 0;
>+}
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 1e13bb170515..76f2a1bf57c2 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -752,6 +752,23 @@ static ssize_t bonding_show_ad_user_port_key(struct device *d,
> static DEVICE_ATTR(ad_user_port_key, 0644,
> bonding_show_ad_user_port_key, bonding_sysfs_store_option);
>
>+static ssize_t bonding_show_broadcast_neighbor(struct device *d,
>+ struct device_attribute *attr,
>+ char *buf)
>+{
>+ struct bonding *bond = to_bond(d);
>+ const struct bond_opt_value *val;
>+
>+ val = bond_opt_get_val(BOND_OPT_BROADCAST_NEIGH,
>+ bond->params.broadcast_neighbor);
>+
>+ return sysfs_emit(buf, "%s %d\n", val->string,
>+ bond->params.broadcast_neighbor);
>+}
>+
>+static DEVICE_ATTR(broadcast_neighbor, 0644,
>+ bonding_show_broadcast_neighbor, bonding_sysfs_store_option);
>+
> static struct attribute *per_bond_attrs[] = {
> &dev_attr_slaves.attr,
> &dev_attr_mode.attr,
>@@ -791,6 +808,7 @@ static struct attribute *per_bond_attrs[] = {
> &dev_attr_ad_actor_system.attr,
> &dev_attr_ad_user_port_key.attr,
> &dev_attr_arp_missed_max.attr,
>+ &dev_attr_broadcast_neighbor.attr,
> NULL,
> };
>
>diff --git a/include/net/bond_options.h b/include/net/bond_options.h
>index 18687ccf0638..022b122a9fb6 100644
>--- a/include/net/bond_options.h
>+++ b/include/net/bond_options.h
>@@ -77,6 +77,7 @@ enum {
> BOND_OPT_NS_TARGETS,
> BOND_OPT_PRIO,
> BOND_OPT_COUPLED_CONTROL,
>+ BOND_OPT_BROADCAST_NEIGH,
> BOND_OPT_LAST
> };
>
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 95f67b308c19..1eafd15eaad9 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -149,6 +149,7 @@ struct bond_params {
> struct in6_addr ns_targets[BOND_MAX_NS_TARGETS];
> #endif
> int coupled_control;
>+ int broadcast_neighbor;
>
> /* 2 bytes of padding : see ether_addr_equal_64bits() */
> u8 ad_actor_system[ETH_ALEN + 2];
>--
>2.34.1
>
---
-Jay Vosburgh, jv@...sburgh.net
Powered by blists - more mailing lists