[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231013153605.487f5a74@kernel.org>
Date: Fri, 13 Oct 2023 15:36:05 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Daniel Gröber <dxld@...kboxed.org>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, Richard Weinberger <richard@....at>, Serge Hallyn
<serge.hallyn@...onical.com>, "Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
On Tue, 10 Oct 2023 14:10:03 +0200 Daniel Gröber wrote:
> Changing a device's netns and renaming it with one RTM_NEWLINK call causes
> a rogue MOVE uevent to be delivered to the new netns in addition to the
> expected ADD uevent.
>
> iproute2 reproducer:
>
> $ ip netns add test
> $ ip link add dev eth0 netns test type dummy
> $ ip link add dev eth0 type dummy
>
> $ ip -netns test link set dev eth0 netns 1 name eth123
>
> With the last command, which renames the device and moves it out of the
> netns, we get the following:
>
> $ udevadm monitor -k
> KERNEL[230953.424834] add /devices/virtual/net/eth0 (net)
> KERNEL[230953.424914] move /devices/virtual/net/eth0 (net)
> KERNEL[230953.425066] move /devices/virtual/net/eth123 (net)
FTR I don't see the move on current net-next, I see two adds
and one move.
[ ~]# udevadm monitor -k &
monitor will print the received events for:
KERNEL - the kernel uevent
[ ~]# ip netns add test
[ ~]# ip link add dev eth0 netns test type dummy
KERNEL[115.393650] add /module/dummy (module)
[ ~]# ip link add dev eth0 type dummy
KERNEL[121.702300] add /devices/virtual/net/eth0 (net)
KERNEL[121.704608] add /devices/virtual/net/eth0/queues/rx-0 (queues)
KERNEL[121.704733] add /devices/virtual/net/eth0/queues/tx-0 (queues)
[ ~]# ip -netns test link set dev eth0 netns 1 name eth123
KERNEL[135.598907] add /devices/virtual/net/eth0 (net)
KERNEL[135.600425] move /devices/virtual/net/eth123 (net)
I don't think it matters for the problem you're describing, tho.
> The problem is the MOVE event hooribly confuses userspace. The particular
> symptom we're seing is that systemd will bring down the ifup@...0.service
> on the host as it handles the MOVE of eth0->eth123 as a stop for the
> BoundTo sys-subsystem-net-devices-eth0.device unit.
>
> I also create a clashing eth0 device on the host in the repro to
> demonstrate that the RTM_NETLINK move+rename call is atomic and so the MOVE
> event is entirely nonsensical.
>
> Looking at the code in __rtnl_newlink I see do_setlink first calls
> __dev_change_net_namespace and then dev_change_name. My guess is the order
> is just wrong here.
Interesting. My analysis is slightly different but only in low level
aspects.. tell me if I'm wrong:
1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname
2. Because of #1, __dev_change_net_namespace() gets called with
new name provide (pat = eth123)
3. It will do netdev_name_in_use(), which returns true.
4. It will then call dev_get_valid_name() which, (confusingly?) already
sets the new name on the netdevice itself.
5. It then calls device shutdown from the source netns.
This results in Bug#1, the netlink notification carries
the new name in the old netns.
[ ~]# ip -netns test link add dev eth0 netns test type dummy
[ ~]# ip -netns test link add dev eth1 netns test type dummy
[ ~]# ip -netns test link set dev eth0 netns 1 name eth1
ip monitor inside netns:
5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff
6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff
Deleted inet eth0
Deleted inet6 eth0
Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7
tells us we deleted eth1, ifindex 5, which is not true. It was eth0.
Small sidebar - altnames are completely broken when it comes to netns,
too:
[ ~]# ip link add dev eth0 type dummy
[ ~]# ip link add dev eth1 type dummy
[ ~]# ip link property add dev eth1 altname eth0
RTNETLINK answers: File exists
^ it's not letting us use eth0 because device eth0 exists, but
[ ~]# ip netns add test
[ ~]# ip -netns test link add dev ethX netns test type dummy
[ ~]# ip -netns test link property add dev ethX altname eth0
[ ~]# ip -netns test link set dev ethX netns 1
[ ~]# ip li
...
3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff
...
5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff
altname eth0
and we have eth0 altname. So that's Bug#2.
Picking back up after the shutdown in old netns.
6. We call:
kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
dev_net_set(dev, net);
kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
Those are the calls you see in udev, recall that device core has
its own naming, so both of those calls will use the _old_ name.
REMOVE in the source netns and ADD in the destination netns.
The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
curiously it states:
v2: also send KOBJ_ADD to new netns. There will then be a
_MOVE event from the device_rename() call, but that should
be innocuous.
Which was based on this conversation:
https://lore.kernel.org/all/20121012031328.GA5472@sergelap/
7. Now we finally call:
err = device_rename(&dev->dev, dev->name);
Which tells device core that the name has changed, and gives you
the (second) MOVE event. This time with the correct name.
Which is what you're seeing, Bug#3, the ADD event should be after
the call to device_rename()...
Bug#1 and Bug#2 we can fix in networking. Bug#3 is a bit more tricky,
because what we want is a "silent" rename, without generating the MOVE.
This email is a bit long, so let me cut off here..
Powered by blists - more mailing lists