[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <056c8f4765a52179630b904e95fc4e3f26c02f2a.1739548836.git.petrm@nvidia.com>
Date: Fri, 14 Feb 2025 17:18:21 +0100
From: Petr Machata <petrm@...dia.com>
To: "David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
<netdev@...r.kernel.org>
CC: Ido Schimmel <idosch@...dia.com>, Petr Machata <petrm@...dia.com>,
<mlxsw@...dia.com>, Andrew Lunn <andrew+netdev@...n.ch>, Nikolay Aleksandrov
<razor@...ckwall.org>, Roopa Prabhu <roopa@...dia.com>, Menglong Dong
<menglong8.dong@...il.com>, Guillaume Nault <gnault@...hat.com>
Subject: [PATCH net-next v2 2/5] vxlan: Join / leave MC group after remote changes
When a vxlan netdevice is brought up, if its default remote is a multicast
address, the device joins the indicated group.
Therefore when the multicast remote address changes, the device should
leave the current group and subscribe to the new one. Similarly when the
interface used for endpoint communication is changed in a situation when
multicast remote is configured. This is currently not done.
Both vxlan_igmp_join() and vxlan_igmp_leave() can however fail. So it is
possible that with such fix, the netdevice will end up in an inconsistent
situation where the old group is not joined anymore, but joining the new
group fails. Should we join the new group first, and leave the old one
second, we might end up in the opposite situation, where both groups are
joined. Undoing any of this during rollback is going to be similarly
problematic.
One solution would be to just forbid the change when the netdevice is up.
However in vnifilter mode, changing the group address is allowed, and these
problems are simply ignored (see vxlan_vni_update_group()):
# ip link add name br up type bridge vlan_filtering 1
# ip link add vx1 up master br type vxlan external vnifilter local 192.0.2.1 dev lo dstport 4789
# bridge vni add dev vx1 vni 200 group 224.0.0.1
# tcpdump -i lo &
# bridge vni add dev vx1 vni 200 group 224.0.0.2
18:55:46.523438 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
18:55:46.943447 IP 0.0.0.0 > 224.0.0.22: igmp v3 report, 1 group record(s)
# bridge vni
dev vni group/remote
vx1 200 224.0.0.2
Having two different modes of operation for conceptually the same interface
is silly, so in this patch, just do what the vnifilter code does and deal
with the errors by crossing fingers real hard.
The vnifilter code leaves old before joining new, and in case of join /
leave failures does not roll back the configuration changes that have
already been applied, but bails out of joining if it could not leave. Do
the same here: leave before join, apply changes unconditionally and do not
attempt to join if we couldn't leave.
Signed-off-by: Petr Machata <petrm@...dia.com>
Reviewed-by: Ido Schimmel <idosch@...dia.com>
---
Notes:
v2:
- Adjust the code so that it is closer to vnifilter.
Expand the commit message the explain in detail
which aspects of vnifilter code were emulated.
---
CC: Andrew Lunn <andrew+netdev@...n.ch>
CC: Nikolay Aleksandrov <razor@...ckwall.org>
CC: Roopa Prabhu <roopa@...dia.com>
CC: Menglong Dong <menglong8.dong@...il.com>
CC: Guillaume Nault <gnault@...hat.com>
drivers/net/vxlan/vxlan_core.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index ec0aee1d5b91..588ab2c16c67 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -4419,6 +4419,7 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
struct netlink_ext_ack *extack)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
+ bool rem_ip_changed, change_igmp;
struct net_device *lowerdev;
struct vxlan_config conf;
struct vxlan_rdst *dst;
@@ -4442,8 +4443,13 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
if (err)
return err;
+ rem_ip_changed = !vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip);
+ change_igmp = vxlan->dev->flags & IFF_UP &&
+ (rem_ip_changed ||
+ dst->remote_ifindex != conf.remote_ifindex);
+
/* handle default dst entry */
- if (!vxlan_addr_equal(&conf.remote_ip, &dst->remote_ip)) {
+ if (rem_ip_changed) {
u32 hash_index = fdb_head_index(vxlan, all_zeros_mac, conf.vni);
spin_lock_bh(&vxlan->hash_lock[hash_index]);
@@ -4487,6 +4493,9 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
}
}
+ if (change_igmp && vxlan_addr_multicast(&dst->remote_ip))
+ err = vxlan_multicast_leave(vxlan);
+
if (conf.age_interval != vxlan->cfg.age_interval)
mod_timer(&vxlan->age_timer, jiffies);
@@ -4494,7 +4503,12 @@ static int vxlan_changelink(struct net_device *dev, struct nlattr *tb[],
if (lowerdev && lowerdev != dst->remote_dev)
dst->remote_dev = lowerdev;
vxlan_config_apply(dev, &conf, lowerdev, vxlan->net, true);
- return 0;
+
+ if (!err && change_igmp &&
+ vxlan_addr_multicast(&dst->remote_ip))
+ err = vxlan_multicast_join(vxlan);
+
+ return err;
}
static void vxlan_dellink(struct net_device *dev, struct list_head *head)
--
2.47.0
Powered by blists - more mailing lists