[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e999e9a0-19d3-4519-8094-268a67f5da63@redhat.com>
Date: Tue, 27 May 2025 16:00:44 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Tonghao Zhang <tonghao@...aicloud.com>, netdev@...r.kernel.org
Cc: Jay Vosburgh <jv@...sburgh.net>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Jonathan Corbet <corbet@....net>,
Andrew Lunn <andrew+netdev@...n.ch>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Nikolay Aleksandrov <razor@...ckwall.org>,
Zengbing Tu <tuzengbing@...iglobal.com>
Subject: Re: [PATCH RESEND net-next v5 1/4] net: bonding: add
broadcast_neighbor option for 802.3ad
On 5/22/25 10:55 AM, Tonghao Zhang wrote:
> Stacking technology is a type of technology used to expand ports on
> Ethernet switches. It is widely used as a common access method in
> large-scale Internet data center architectures. Years of practice
> have proved that stacking technology has advantages and disadvantages
> in high-reliability network architecture scenarios. For instance,
> in stacking networking arch, conventional switch system upgrades
> require multiple stacked devices to restart at the same time.
> Therefore, it is inevitable that the business will be interrupted
> for a while. It is for this reason that "no-stacking" in data centers
> has become a trend. Additionally, when the stacking link connecting
> the switches fails or is abnormal, the stack will split. Although it is
> not common, it still happens in actual operation. The problem is that
> after the split, it is equivalent to two switches with the same
> configuration appearing in the network, causing network configuration
> conflicts and ultimately interrupting the services carried by the
> stacking system.
>
> To improve network stability, "non-stacking" solutions have been
> increasingly adopted, particularly by public cloud providers and
> tech companies like Alibaba, Tencent, and Didi. "non-stacking" is
> a method of mimicing switch stacking that convinces a LACP peer,
> bonding in this case, connected to a set of "non-stacked" switches
> that all of its ports are connected to a single switch
> (i.e., LACP aggregator), as if those switches were stacked. This
> enables the LACP peer's ports to aggregate together, and requires
> (a) special switch configuration, described in the linked article,
> and (b) modifications to the bonding 802.3ad (LACP) mode to send
> all ARP/ND packets across all ports of the active aggregator.
>
> Note that, with multiple aggregators, the current broadcast mode
> logic will send only packets to the selected aggregator(s).
>
> +-----------+ +-----------+
> | switch1 | | switch2 |
> +-----------+ +-----------+
> ^ ^
> | |
> +-----------------+
> | bond4 lacp |
> +-----------------+
> | |
> | NIC1 | NIC2
> +-----------------+
> | server |
> +-----------------+
IHMO all the above wording should be placed in the cover letter. This
commit message should instead be more related to the actual code.
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index d05226484c64..b5c34d7f126c 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -212,6 +212,8 @@ atomic_t netpoll_block_tx = ATOMIC_INIT(0);
>
> unsigned int bond_net_id __read_mostly;
>
> +DEFINE_STATIC_KEY_FALSE(bond_bcast_neigh_enabled);
> +
> static const struct flow_dissector_key flow_keys_bonding_keys[] = {
> {
> .key_id = FLOW_DISSECTOR_KEY_CONTROL,
> @@ -4461,6 +4463,9 @@ static int bond_open(struct net_device *bond_dev)
>
> bond_for_each_slave(bond, slave, iter)
> dev_mc_add(slave->dev, lacpdu_mcast_addr);
> +
> + if (bond->params.broadcast_neighbor)
> + static_branch_inc(&bond_bcast_neigh_enabled);
> }
>
> if (bond_mode_can_use_xmit_hash(bond))
> @@ -4480,6 +4485,10 @@ static int bond_close(struct net_device *bond_dev)
> bond_alb_deinitialize(bond);
> bond->recv_probe = NULL;
>
> + if (BOND_MODE(bond) == BOND_MODE_8023AD &&
> + bond->params.broadcast_neighbor)
> + static_branch_dec(&bond_bcast_neigh_enabled);
What if the user enables BOND_OPT_BROADCAST_NEIGH and later switch the
bond mode? it looks like there will be bad accounting for static branch.
> +
> if (bond_uses_primary(bond)) {
> rcu_read_lock();
> slave = rcu_dereference(bond->curr_active_slave);
> @@ -5316,6 +5325,37 @@ static struct slave *bond_xdp_xmit_3ad_xor_slave_get(struct bonding *bond,
> return slaves->arr[hash % count];
> }
>
> +static bool bond_should_broadcast_neighbor(struct sk_buff *skb,
> + struct net_device *dev)
> +{
> + struct bonding *bond = netdev_priv(dev);
> + struct {
> + struct ipv6hdr ip6;
> + struct icmp6hdr icmp6;
> + } *combined, _combined;
> +
> + if (!static_branch_unlikely(&bond_bcast_neigh_enabled))
> + return false;
> +
> + if (!bond->params.broadcast_neighbor)
> + return false;
> +
> + if (skb->protocol == htons(ETH_P_ARP))
> + return true;
> +
> + if (skb->protocol == htons(ETH_P_IPV6)) {
> + combined = skb_header_pointer(skb, skb_mac_header_len(skb),
> + sizeof(_combined),
> + &_combined);
> + if (combined && combined->ip6.nexthdr == NEXTHDR_ICMP &&
> + (combined->icmp6.icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
> + combined->icmp6.icmp6_type == NDISC_NEIGHBOUR_ADVERTISEMENT))
> + return true;
> + }
> +
> + return false;
> +}
> +
> /* Use this Xmit function for 3AD as well as XOR modes. The current
> * usable slave array is formed in the control path. The xmit function
> * just calculates hash and sends the packet out.
> @@ -5337,15 +5377,25 @@ static netdev_tx_t bond_3ad_xor_xmit(struct sk_buff *skb,
>
> /* in broadcast mode, we send everything to all usable interfaces. */
> static netdev_tx_t bond_xmit_broadcast(struct sk_buff *skb,
> - struct net_device *bond_dev)
> + struct net_device *bond_dev,
> + bool all_slaves)
> {
> struct bonding *bond = netdev_priv(bond_dev);
> - struct slave *slave = NULL;
> - struct list_head *iter;
> + struct bond_up_slave *slaves;
> bool xmit_suc = false;
> bool skb_used = false;
> + int slaves_count, i;
>
> - bond_for_each_slave_rcu(bond, slave, iter) {
> + rcu_read_lock();
Why this the RCU lock is now needed? AFAICS the caller scope does not
change. This is either not needed, or deserves a separate patch, with a
suitable fixes tag.
/P
Powered by blists - more mailing lists