Message-Id: <1604303803-30660-1-git-send-email-i@liuyulong.me>
Date: Mon, 2 Nov 2020 15:56:43 +0800
From: LIU Yulong <liuyulong.xa@...il.com>
To: netdev@...r.kernel.org
Cc: LIU Yulong <i@...yulong.me>
Subject: [PATCH v2] net: bonding: alb disable balance for IPv6 multicast related mac
According to RFC 2464 [1], the prefix "33:33:xx:xx:xx:xx" is used to
construct the destination MAC address for IPv6 multicast traffic.
NDP (the Neighbor Discovery Protocol for IPv6) [2] follows this rule.
Address resolution works as follows [6]:
*) Let's assume a destination address of 2001:db8:1:1::1.
*) This is mapped into the "Solicited Node Multicast Address" (SNMA)
format of ff02::1:ffXX:XXXX.
*) The XX:XXXX represent the last 24 bits of the SNMA, and are derived
directly from the last 24 bits of the destination address.
*) Resulting in a SNMA ff02::1:ff00:0001, or ff02::1:ff00:1.
*) This, being a multicast address, can be mapped to a multicast MAC
address, using the format 33-33-XX-XX-XX-XX
*) Resulting in 33-33-ff-00-00-01.
*) This is a MAC address that is only being listened for by nodes
sharing the same last 24 bits.
*) In other words, while there is a chance of an "address collision",
it is a vast improvement over ARP's guaranteed "collision".
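The mapping steps above can be sketched in user-space C. This is only an
illustration under stated assumptions: the helper names snma_from_ipv6()
and mcast_mac_from_ipv6() are hypothetical and are not taken from the
kernel sources referenced in [3][4][5].

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the solicited-node multicast address
 * ff02::1:ffXX:XXXX from the last 24 bits of a 16-byte IPv6 address.
 */
static void snma_from_ipv6(const uint8_t target[16], uint8_t snma[16])
{
	/* ff02:0:0:0:0:1:ff00::/104 prefix, 13 bytes */
	static const uint8_t prefix[13] = {
		0xff, 0x02,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
		0x00, 0x01,
		0xff
	};

	memcpy(snma, prefix, sizeof(prefix));
	memcpy(snma + 13, target + 13, 3);	/* last 24 bits of target */
}

/* Hypothetical helper: map an IPv6 multicast address to its Ethernet
 * MAC, 33:33 followed by the last 32 bits of the address (RFC 2464).
 */
static void mcast_mac_from_ipv6(const uint8_t group[16], uint8_t mac[6])
{
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(mac + 2, group + 12, 4);
}
```

Feeding 2001:db8:1:1::1 through both helpers gives the SNMA
ff02::1:ff00:1 and then the MAC 33:33:ff:00:00:01, matching the steps
listed above.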
Kernel related code can be found at [3][4][5].
The current bond alb code does not cover this whole MAC range, which
prevents the physical network from determining the return tunnel for
reply packets in a Spine-and-Leaf data center architecture.
The basic topology looks like this:
                +-------------+
            +---| Border Leaf |-----+
    tunnel-1|   +-------------+     | tunnel-2
            |                       |
        +---+----+           +------+-+
        | Leaf1  +-----X-----+  Leaf2 |  tunnel-3 has loop avoidance
        +--------+ tunnel-3  +-+------+
                 |             |
                 +----+   +----+
              +--+nic1+---+nic2+---+
              |  +----+   +----+   |
              |       bond6        |
              |        HOST        |
              +--------------------+
While nic1 is sending normal IPv6 traffic to the gateway on the Border
Leaf, nic2 (slave) also sends NS packets out periodically, automatically
and implicitly. This is an example packet sent from slave nic2 that
breaks the traffic:
ac:1f:6b:90:5c:eb > 33:33:ff:00:00:01, ethertype 802.1Q (0x8100),
length 90: vlan 205, p 0, ethertype IPv6, (hlim 255,
next-header ICMPv6 (58) payload length: 32)
fe80::f816:3eff:feba:2d8c > ff02::1:ff00:1:
[icmp6 sum ok] ICMP6, neighbor solicitation, length 32,
who has 240e:980:2f00:4000::1
source link-address option (1), length 8 (1): fa:16:3e:ba:2d:8c
The packet source MAC "ac:1f:6b:90:5c:eb" is the nic2 MAC. The original
source MAC was "fa:16:3e:ba:2d:8c", but it was rewritten by the alb MAC
address handling [8].
MAC "fa:16:3e:ba:2d:8c" is the MAC of a virtual device from a cloud
service inside a kernel network namespace; the topology is shown in [7].
MAC "fa:16:3e:ba:2d:8c" was first learnt at Leaf1 through the underlay
mechanism (BGP EVPN). When this example packet is sent to the Border Leaf
and answered with dst_mac "fa:16:3e:ba:2d:8c", Leaf2 tries to send the
reply back through tunnel-3, where it is dropped by the loop avoidance.
All the original normal IPv6 traffic is then led to tunnel-2 and
dropped. The link is now broken.
This patch addresses the issue by checking the entire MAC range defined
by RFC 2464. It adds a new helper that checks whether the first two
octets are 0x33 0x33. If the destination MAC matches, tx balancing is
disabled.
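To illustrate the difference between the old exact-match check and the
new range check, here is a small user-space sketch. It is not the kernel
code: old_check_matches() and new_check_matches() are hypothetical names,
and memcmp() stands in for the kernel's address comparison.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Old behaviour, modelled in user space: only the all-nodes multicast
 * MAC 33:33:00:00:00:01 was excluded from tx balancing.
 */
static bool old_check_matches(const uint8_t *addr)
{
	static const uint8_t all_nodes[6] = {
		0x33, 0x33, 0x00, 0x00, 0x00, 0x01
	};

	return memcmp(addr, all_nodes, sizeof(all_nodes)) == 0;
}

/* New behaviour: any destination in 33:33:xx:xx:xx:xx matches. */
static bool new_check_matches(const uint8_t *addr)
{
	return addr[0] == 0x33 && addr[1] == 0x33;
}
```

The NS destination 33:33:ff:00:00:01 from the example packet is missed
by the old check but caught by the new one, while a unicast MAC such as
fa:16:3e:ba:2d:8c matches neither.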
[1] https://tools.ietf.org/html/rfc2464#section-7
[2] https://tools.ietf.org/html/rfc4861
[3] linux.git/tree/include/net/if_inet6.h#n209-n221
[4] linux.git/tree/net/ipv6/ndisc.c#n291
[5] linux.git/tree/net/ipv6/ndisc.c#n346-n348
[6] https://en.citizendium.org/wiki/Neighbor_Discovery
[7] https://docs.openstack.org/neutron/latest/admin/deploy-ovs-selfservice.html#architecture
[8] linux.git/tree/drivers/net/bonding/bond_alb.c#n1320
Signed-off-by: LIU Yulong <i@...yulong.me>
---
drivers/net/bonding/bond_alb.c | 8 ++------
include/linux/etherdevice.h | 12 ++++++++++++
2 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c3091e0..eda9046 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -24,9 +24,6 @@
 #include <net/bonding.h>
 #include <net/bond_alb.h>
 
-static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
-	0x33, 0x33, 0x00, 0x00, 0x00, 0x01
-};
 static const int alb_delta_in_ticks = HZ / ALB_TIMER_TICKS_PER_SEC;
 
 #pragma pack(1)
@@ -1425,10 +1422,9 @@ struct slave *bond_xmit_alb_slave_get(struct bonding *bond,
 			break;
 		}
 
-		/* IPv6 uses all-nodes multicast as an equivalent to
-		 * broadcasts in IPv4.
+		/* IPv6 multicast destinations should not be tx-balanced.
 		 */
-		if (ether_addr_equal_64bits(eth_data->h_dest, mac_v6_allmcast)) {
+		if (is_ipv6_multicast_ether_addr(eth_data->h_dest)) {
 			do_tx_balance = false;
 			break;
 		}
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 2e5debc..ac74a99 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -178,6 +178,18 @@ static inline bool is_unicast_ether_addr(const u8 *addr)
 }
 
 /**
+ * is_ipv6_multicast_ether_addr - Determine if the Ethernet address is for
+ * IPv6 multicast (rfc2464).
+ * @addr: Pointer to a six-byte array containing the Ethernet address
+ *
+ * Return true if the address is a multicast for IPv6.
+ */
+static inline bool is_ipv6_multicast_ether_addr(const u8 *addr)
+{
+	return (addr[0] == 0x33) && (addr[1] == 0x33);
+}
+
+/**
  * is_valid_ether_addr - Determine if the given Ethernet address is valid
  * @addr: Pointer to a six-byte array containing the Ethernet address
  *
--
1.8.3.1