lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-Id: <CE4DB782-91EB-4DBD-9C26-CA4C4612D58C@bamaicloud.com> Date: Sun, 11 May 2025 22:07:46 +0800 From: Tonghao Zhang <tonghao@...aicloud.com> To: Jay Vosburgh <jv@...sburgh.net> Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>, Andrew Lunn <andrew+netdev@...n.ch> Subject: Re: [PATCH net-next 1/4] net: bonding: add broadcast_neighbor option for 802.3ad > 2025年5月10日 下午8:44,Jay Vosburgh <jv@...sburgh.net> 写道: > > tonghao@...aicloud.com wrote: > >> From: Tonghao Zhang <tonghao@...aicloud.com> >> >> Stacking technology provides technical benefits but has inherent drawbacks. >> For instance, switch software or system upgrades require simultaneous reboots >> of all stacked switches. Additionally, stacking link failures may cause >> stack splitting. >> >> To improve network stability, non-stacking solutions have been increasingly >> adopted, particularly by public cloud providers and technology companies >> like Alibaba, Tencent, and Didi. The server still uses dual network cards and >> dual uplinks to two switches, and the network card mode is set to >> bond mode 4 (IEEE 802.3ad). As aggregation ports transmit ARP/ND data >> exclusively through one physical port, both switches in non-stacking >> deployments must receive server ARP/ND requests. This requires bonding driver >> modifications to broadcast ARP/ND packets through all active slave links. >> >> - https://www.ruijie.com/fr-fr/support/tech-gallery/de-stack-data-center-network-architecture/ > > I didn't really follow the explanation here without reading the > linked article. I think it would be better to explain the basics of > what this "non-stacking" architecture is, then describe the change to > bonding necessary to support it. This description need not go into > great detail; assuming I understand correctly, perhaps something along > the lines of: > > "non-stacking" is a method of mimicing switch stacking that > convinces a LACP peer, bonding in this case, connected to a set of > "non-stacked" switches that all of its ports are connected to a single > switch (i.e., LACP aggregator), as if those switches were stacked. This > enables the LACP peer's ports to aggregate together, and requires (a) > special switch configuration, described in the linked article, and (b) > modifications to the bonding 802.3ad (LACP) mode to send all ARP / ND > packets across all ports of the active aggregator. > > Is that a fair summary? Yes, pretty good. I will add more info of “no-stacking” arch and your summary. > Regardless, the commit message should stand on its own, even if > the linked article is gone at some point in the future. Yes > >> Cc: Jay Vosburgh <jv@...sburgh.net> >> Cc: "David S. Miller" <davem@...emloft.net> >> Cc: Eric Dumazet <edumazet@...gle.com> >> Cc: Jakub Kicinski <kuba@...nel.org> >> Cc: Paolo Abeni <pabeni@...hat.com> >> Cc: Simon Horman <horms@...nel.org> >> Cc: Jonathan Corbet <corbet@....net> >> Cc: Andrew Lunn <andrew+netdev@...n.ch> >> Signed-off-by: Tonghao Zhang <tonghao@...aicloud.com> >> --- >> Documentation/networking/bonding.rst | 5 +++ >> drivers/net/bonding/bond_main.c | 58 +++++++++++++++++++++------- >> drivers/net/bonding/bond_options.c | 25 ++++++++++++ >> drivers/net/bonding/bond_sysfs.c | 18 +++++++++ >> include/net/bond_options.h | 1 + >> include/net/bonding.h | 1 + >> 6 files changed, 94 insertions(+), 14 deletions(-) >> >> diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst >> index a4c1291d2561..0aca6e7599db 100644 >> --- a/Documentation/networking/bonding.rst >> +++ b/Documentation/networking/bonding.rst >> @@ -562,6 +562,11 @@ lacp_rate >> >> The default is slow. >> >> +broadcast_neighbor >> + >> + Option specifying whether to broadcast ARP/ND packets to all >> + active slaves. The default is off (0). >> + > > I'm happy to see documentation updates; however, please add the > caveat that this option has no effect in modes other than 802.3ad mode. > Also, the text is not formatted consistently with the options around it > (block indented one tab). Ok, update it in next patch version. >> max_bonds >> >> Specifies the number of bonding devices to create for this >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >> index d05226484c64..c54bfba10688 100644 >> --- a/drivers/net/bonding/bond_main.c >> +++ b/drivers/net/bonding/bond_main.c >> @@ -5316,23 +5316,31 @@ static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond, >> return slaves->arr[hash % count]; >> } >> >> -/* Use this Xmit function for 3AD as well as XOR modes. The current >> - * usable slave array is formed in the control path. The xmit function >> - * just calculates hash and sends the packet out. >> - */ >> -static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb, >> - struct net_device *dev) >> +static bool bond_should_broadcast_neighbor(struct bonding *bond, >> + struct sk_buff *skb) >> { >> - struct bonding *bond = netdev_priv(dev); >> - struct bond_up_slave *slaves; >> - struct slave *slave; >> + if (BOND_MODE(bond) != BOND_MODE_8023AD) >> + return false; >> >> - slaves = rcu_dereference(bond->usable_slaves); >> - slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves); >> - if (likely(slave)) >> - return bond_dev_queue_xmit(bond, skb, slave->dev); >> + if (!bond->params.broadcast_neighbor) >> + return false; >> >> - return bond_tx_drop(dev, skb); >> + if (skb->protocol == htons(ETH_P_ARP)) >> + return true; >> + >> + if (skb->protocol == htons(ETH_P_IPV6) && >> + pskb_may_pull(skb, >> + sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr))) { >> + if (ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) { >> + struct icmp6hdr *icmph = icmp6_hdr(skb); >> + >> + if ((icmph->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) || >> + (icmph->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT)) >> + return true; >> + } >> + } >> + >> + return false; >> } >> >> /* in broadcast mode, we send everything to all usable interfaces. */ >> @@ -5377,6 +5385,28 @@ static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb, >> return NET_XMIT_DROP; >> } >> >> +/* Use this Xmit function for 3AD as well as XOR modes. The current >> + * usable slave array is formed in the control path. The xmit function >> + * just calculates hash and sends the packet out. >> + */ >> +static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb, >> + struct net_device *dev) >> +{ >> + struct bonding *bond = netdev_priv(dev); >> + struct bond_up_slave *slaves; >> + struct slave *slave; >> + >> + if (bond_should_broadcast_neighbor(bond, skb)) >> + return bond_xmit_broadcast(skb, dev); > > I feel like there has to be a way to implement this that won't > add two or three branches to the transmit fast path for every packet > when this option is not enabled (which will be the vast majority of > cases). I'm not sure what that would look like, needs some thought. The patch should check the .broadcast_neighbor and bond mode firstly, and ipv4 packets quickly, and then ipv6. Parsing network packets is unavoidable, especially for IPv6 packets. static inline bool bond_should_broadcast_neighbor(struct bonding *bond, struct sk_buff *skb) { if (!bond->params.broadcast_neighbor || BOND_MODE(bond) != BOND_MODE_8023AD) return false; if (skb->protocol == htons(ETH_P_ARP)) return true; if (skb->protocol == htons(ETH_P_IPV6) && pskb_may_pull(skb, sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr))) { if (ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) { struct icmp6hdr *icmph = icmp6_hdr(skb); if ((icmph->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) || (icmph->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT)) return true; } } return false; } > >> max_bonds >> >> Specifies the number of bonding devices to create for this >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >> index d05226484c64..c54bfba10688 100644 >> --- a/drivers/net/bonding/bond_main.c >> +++ b/drivers/net/bonding/bond_main.c >> @@ -5316,23 +5316,31 @@ static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond, >> return slaves->arr[hash % count]; >> } >> >> -/* Use this Xmit function for 3AD as well as XOR modes. The current >> - * usable slave array is formed in the control path. The xmit function >> - * just calculates hash and sends the packet out. >> - */ >> -static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb, >> - struct net_device *dev) >> +static bool bond_should_broadcast_neighbor(struct bonding *bond, >> + struct sk_buff *skb) >> { >> - struct bonding *bond = netdev_priv(dev); >> - struct bond_up_slave *slaves; >> - struct slave *slave; >> + if (BOND_MODE(bond) != BOND_MODE_8023AD) >> + return false; >> >> - slaves = rcu_dereference(bond->usable_slaves); >> - slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves); >> - if (likely(slave)) >> - return bond_dev_queue_xmit(bond, skb, slave->dev); >> + if (!bond->params.broadcast_neighbor) >> + return false; >> >> - return bond_tx_drop(dev, skb); >> + if (skb->protocol == htons(ETH_P_ARP)) >> + return true; >> + >> + if (skb->protocol == htons(ETH_P_IPV6) && >> + pskb_may_pull(skb, >> + sizeof(struct ipv6hdr) + sizeof(struct icmp6hdr))) { >> + if (ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6) { >> + struct icmp6hdr *icmph = icmp6_hdr(skb); >> + >> + if ((icmph->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION) || >> + (icmph->icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT)) >> + return true; >> + } >> + } >> + >> + return false; >> } >> >> /* in broadcast mode, we send everything to all usable interfaces. */ >> @@ -5377,6 +5385,28 @@ static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb, >> return NET_XMIT_DROP; >> } >> >> +/* Use this Xmit function for 3AD as well as XOR modes. The current >> + * usable slave array is formed in the control path. The xmit function >> + * just calculates hash and sends the packet out. >> + */ >> +static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb, >> + struct net_device *dev) >> +{ >> + struct bonding *bond = netdev_priv(dev); >> + struct bond_up_slave *slaves; >> + struct slave *slave; >> + >> + if (bond_should_broadcast_neighbor(bond, skb)) >> + return bond_xmit_broadcast(skb, dev); > > I feel like there has to be a way to implement this that won't > add two or three branches to the transmit fast path for every packet > when this option is not enabled (which will be the vast majority of > cases). I'm not sure what that would look like, needs some thought. > > Is the call to bond_should_broadcast_neighbor inlined at compile > time? > > -J > >> + >> + slaves = rcu_dereference(bond->usable_slaves); >> + slave = bond_xmit_3ad_xor_slave_get(bond, skb, slaves); >> + if (likely(slave)) >> + return bond_dev_queue_xmit(bond, skb, slave->dev); >> + >> + return bond_tx_drop(dev, skb); >> +} >> + >> /*------------------------- Device initialization ---------------------------*/ >> >> /* Lookup the slave that corresponds to a qid */ >> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c >> index 91893c29b899..38e8f03d1707 100644 >> --- a/drivers/net/bonding/bond_options.c >> +++ b/drivers/net/bonding/bond_options.c >> @@ -87,6 +87,8 @@ static int bond_option_missed_max_set(struct bonding *bond, >> const struct bond_opt_value *newval); >> static int bond_option_coupled_control_set(struct bonding *bond, >> const struct bond_opt_value *newval); >> +static int bond_option_broadcast_neigh_set(struct bonding *bond, >> + const struct bond_opt_value *newval); >> >> static const struct bond_opt_value bond_mode_tbl[] = { >> { "balance-rr", BOND_MODE_ROUNDROBIN, BOND_VALFLAG_DEFAULT}, >> @@ -240,6 +242,12 @@ static const struct bond_opt_value bond_coupled_control_tbl[] = { >> { NULL, -1, 0}, >> }; >> >> +static const struct bond_opt_value bond_broadcast_neigh_tbl[] = { >> + { "on", 1, 0}, >> + { "off", 0, BOND_VALFLAG_DEFAULT}, >> + { NULL, -1, 0} >> +}; >> + >> static const struct bond_option bond_opts[BOND_OPT_LAST] = { >> [BOND_OPT_MODE] = { >> .id = BOND_OPT_MODE, >> @@ -513,6 +521,14 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = { >> .flags = BOND_OPTFLAG_IFDOWN, >> .values = bond_coupled_control_tbl, >> .set = bond_option_coupled_control_set, >> + }, >> + [BOND_OPT_BROADCAST_NEIGH] = { >> + .id = BOND_OPT_BROADCAST_NEIGH, >> + .name = "broadcast_neighbor", >> + .desc = "Broadcast neighbor packets to all slaves", >> + .unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_8023AD)), >> + .values = bond_broadcast_neigh_tbl, >> + .set = bond_option_broadcast_neigh_set, >> } >> }; >> >> @@ -1840,3 +1856,12 @@ static int bond_option_coupled_control_set(struct bonding *bond, >> bond->params.coupled_control = newval->value; >> return 0; >> } >> + >> +static int bond_option_broadcast_neigh_set(struct bonding *bond, >> + const struct bond_opt_value *newval) >> +{ >> + netdev_dbg(bond->dev, "Setting broadcast_neighbor to %llu\n", >> + newval->value); >> + bond->params.broadcast_neighbor = newval->value; >> + return 0; >> +} >> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c >> index 1e13bb170515..76f2a1bf57c2 100644 >> --- a/drivers/net/bonding/bond_sysfs.c >> +++ b/drivers/net/bonding/bond_sysfs.c >> @@ -752,6 +752,23 @@ static ssize_t bonding_show_ad_user_port_key(struct device *d, >> static DEVICE_ATTR(ad_user_port_key, 0644, >> bonding_show_ad_user_port_key, bonding_sysfs_store_option); >> >> +static ssize_t bonding_show_broadcast_neighbor(struct device *d, >> + struct device_attribute *attr, >> + char *buf) >> +{ >> + struct bonding *bond = to_bond(d); >> + const struct bond_opt_value *val; >> + >> + val = bond_opt_get_val(BOND_OPT_BROADCAST_NEIGH, >> + bond->params.broadcast_neighbor); >> + >> + return sysfs_emit(buf, "%s %d\n", val->string, >> + bond->params.broadcast_neighbor); >> +} >> + >> +static DEVICE_ATTR(broadcast_neighbor, 0644, >> + bonding_show_broadcast_neighbor, bonding_sysfs_store_option); >> + >> static struct attribute *per_bond_attrs[] = { >> &dev_attr_slaves.attr, >> &dev_attr_mode.attr, >> @@ -791,6 +808,7 @@ static struct attribute *per_bond_attrs[] = { >> &dev_attr_ad_actor_system.attr, >> &dev_attr_ad_user_port_key.attr, >> &dev_attr_arp_missed_max.attr, >> + &dev_attr_broadcast_neighbor.attr, >> NULL, >> }; >> >> diff --git a/include/net/bond_options.h b/include/net/bond_options.h >> index 18687ccf0638..022b122a9fb6 100644 >> --- a/include/net/bond_options.h >> +++ b/include/net/bond_options.h >> @@ -77,6 +77,7 @@ enum { >> BOND_OPT_NS_TARGETS, >> BOND_OPT_PRIO, >> BOND_OPT_COUPLED_CONTROL, >> + BOND_OPT_BROADCAST_NEIGH, >> BOND_OPT_LAST >> }; >> >> diff --git a/include/net/bonding.h b/include/net/bonding.h >> index 95f67b308c19..1eafd15eaad9 100644 >> --- a/include/net/bonding.h >> +++ b/include/net/bonding.h >> @@ -149,6 +149,7 @@ struct bond_params { >> struct in6_addr ns_targets[BOND_MAX_NS_TARGETS]; >> #endif >> int coupled_control; >> + int broadcast_neighbor; >> >> /* 2 bytes of padding : see ether_addr_equal_64bits() */ >> u8 ad_actor_system[ETH_ALEN + 2]; >> -- >> 2.34.1 >> > > --- > -Jay Vosburgh, jv@...sburgh.net
Powered by blists - more mailing lists