[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231017012024.pp2riikutr54ini4@House.clients.dxld.at>
Date: Tue, 17 Oct 2023 03:20:24 +0200
From: Daniel Gröber <dxld@...kboxed.org>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jakub Kicinski <kuba@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
netdev@...r.kernel.org, Richard Weinberger <richard@....at>,
Serge Hallyn <serge.hallyn@...onical.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change
Hi Greg, Jackub,
On Mon, Oct 16, 2023 at 07:20:26PM +0200, Greg Kroah-Hartman wrote:
> > IIUC what happens is:
> >
> > - systemd controls "real" eth0
> > - we move a "to be renamed" eth0 from a container into main ns
> > - we rename "to be renamed" eth0 to something else
> > - seeing the rename of eth0 system thinks it's the "real" one
> > that is being renamed, ergo there's no eth0 any more,
> > so it shuts down its "unit" for eth0
> >
> > I don't think anything changed. Sounds more like someone finally tried
> > to use this in anger.
>
> Then they get to keep the broken pieces that they created here.
> "moving" a network connection to a container needs to either be added to
> systemd if it is going to manage the network connections, or just stop
> using systemd to handle the connection entirely as they want to do
> something that systemd doesn't support.
I think you've not entirely understood the problem, let me try to explain
it better. The buggy scenario is this: we have a netdev "eth0" in main-ns
(call it eth0-main to avoid confusion), managed by a systemd service (with
BoundTo=*-eth0.device) but also a different netdev called eth0 in a
container (call it eth0-netns).
Now the container is being shut down so the container manager dutifully
moves eth0-netns back to the main-ns while simultaniously renaming it so as
to not cash with eth0-main.
Systemd in main-ns now sees a MOVE event (eth0 -> eth123) and handles this
as the removal of eth0-main. This stops the corresponding device unit and
subsequently the service managing eth0-main via BoundTo.
Problem is the real eth0-main wasn't actually moved, removed or
otherwise. eth0-netns being rename+moved just caused a nonsense MOVE event
to be sent to main-ns (and an incorrect ADD with the old name).
You can imagine how this bug might bring down an entire container
hypervisor's network uplink.
On Fri, Oct 13, 2023 at 03:36:05PM -0700, Jakub Kicinski wrote:
> FTR I don't see the move on current net-next, I see two adds
> and one move.
Where's the second add? I see only one. Note: I only look at what happens
after the move+rename ip link set call.
> [ ~]# ip -netns test link set dev eth0 netns 1 name eth123
> KERNEL[135.598907] add /devices/virtual/net/eth0 (net)
> KERNEL[135.600425] move /devices/virtual/net/eth123 (net)
Testing again now my output matches yours above. Odd, but I probably did a
kernel upgrade since then. Let me know if it matters and I'll go digging
thought apt logs.
Now I have uname -a:
Linux 6.5.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1 (2023-10-07) x86_64
> I don't think it matters for the problem you're describing, tho.
I thought the two move lines I was seeing were one move event being printed
as two lines, with the old/new device names. If what's being printed is the
new name the bug root cause stays the same.
> > Looking at the code in __rtnl_newlink I see do_setlink first calls
> > __dev_change_net_namespace and then dev_change_name. My guess is the order
> > is just wrong here.
>
> Interesting. My analysis is slightly different but only in low level
> aspects.. tell me if I'm wrong:
I only looked at the high level flow of the code.
> 1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname
>
> 2. Because of #1, __dev_change_net_namespace() gets called with
> new name provide (pat = eth123)
>
> 3. It will do netdev_name_in_use(), which returns true.
At this point we're still looking at the old netns, right?
> 4. It will then call dev_get_valid_name() which, (confusingly?) already
> sets the new name on the netdevice itself.
I wouldn't have guessed a "get" function will actually set something, no,
but then again I know not of the kernel's arcane ways :)
> Picking back up after the shutdown in old netns.
>
> 6. We call:
>
> kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
> dev_net_set(dev, net);
> kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
>
> Those are the calls you see in udev, recall that device core has
> its own naming, so both of those calls will use the _old_ name.
> REMOVE in the source netns and ADD in the destination netns.
>
> The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
> curiously it states:
>
> v2: also send KOBJ_ADD to new netns. There will then be a
> _MOVE event from the device_rename() call, but that should
> be innocuous.
>
> Which was based on this conversation:
> https://lore.kernel.org/all/20121012031328.GA5472@sergelap/
Seems to me what main-ns might do with this MOVE event was simply not
considered in detail, innocuous it is not ;)
> 7. Now we finally call:
>
> err = device_rename(&dev->dev, dev->name);
>
> Which tells device core that the name has changed, and gives you
> the (second) MOVE event. This time with the correct name.
I don't like loose ends. Any idea why we only see the one MOVE now?
> Which is what you're seeing, Bug#3, the ADD event should be after
> the call to device_rename()...
All tracks AFAICT.
Thanks,
--Daniel
Powered by blists - more mailing lists