netdev - Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231017012024.pp2riikutr54ini4@House.clients.dxld.at>
Date: Tue, 17 Oct 2023 03:20:24 +0200
From: Daniel Gröber <dxld@...kboxed.org>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Jakub Kicinski <kuba@...nel.org>
Cc: "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	netdev@...r.kernel.org, Richard Weinberger <richard@....at>,
	Serge Hallyn <serge.hallyn@...onical.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: [BUG] rtnl_newlink: Rogue MOVE event delivered on netns change

Hi Greg, Jackub,

On Mon, Oct 16, 2023 at 07:20:26PM +0200, Greg Kroah-Hartman wrote:
> > IIUC what happens is:
> > 
> >  - systemd controls "real" eth0
> >  - we move a "to be renamed" eth0 from a container into main ns
> >  - we rename "to be renamed" eth0 to something else
> >  - seeing the rename of eth0 system thinks it's the "real" one
> >    that is being renamed, ergo there's no eth0 any more,
> >    so it shuts down its "unit" for eth0
> > 
> > I don't think anything changed. Sounds more like someone finally tried
> > to use this in anger.
> 
> Then they get to keep the broken pieces that they created here.
> "moving" a network connection to a container needs to either be added to
> systemd if it is going to manage the network connections, or just stop
> using systemd to handle the connection entirely as they want to do
> something that systemd doesn't support.

I think you've not entirely understood the problem, let me try to explain
it better. The buggy scenario is this: we have a netdev "eth0" in main-ns
(call it eth0-main to avoid confusion), managed by a systemd service (with
BoundTo=*-eth0.device) but also a different netdev called eth0 in a
container (call it eth0-netns).

Now the container is being shut down so the container manager dutifully
moves eth0-netns back to the main-ns while simultaniously renaming it so as
to not cash with eth0-main.

Systemd in main-ns now sees a MOVE event (eth0 -> eth123) and handles this
as the removal of eth0-main. This stops the corresponding device unit and
subsequently the service managing eth0-main via BoundTo.

Problem is the real eth0-main wasn't actually moved, removed or
otherwise. eth0-netns being rename+moved just caused a nonsense MOVE event
to be sent to main-ns (and an incorrect ADD with the old name).

You can imagine how this bug might bring down an entire container
hypervisor's network uplink.



On Fri, Oct 13, 2023 at 03:36:05PM -0700, Jakub Kicinski wrote:
> FTR I don't see the move on current net-next, I see two adds 
> and one move.

Where's the second add? I see only one. Note: I only look at what happens
after the move+rename ip link set call.

>  [ ~]# ip -netns test link set dev eth0 netns 1 name eth123
>  KERNEL[135.598907] add      /devices/virtual/net/eth0 (net)
>  KERNEL[135.600425] move     /devices/virtual/net/eth123 (net)

Testing again now my output matches yours above. Odd, but I probably did a
kernel upgrade since then. Let me know if it matters and I'll go digging
thought apt logs.

Now I have uname -a:
Linux 6.5.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.6-1 (2023-10-07) x86_64

> I don't think it matters for the problem you're describing, tho.

I thought the two move lines I was seeing were one move event being printed
as two lines, with the old/new device names. If what's being printed is the
new name the bug root cause stays the same.

> > Looking at the code in __rtnl_newlink I see do_setlink first calls
> > __dev_change_net_namespace and then dev_change_name. My guess is the order
> > is just wrong here.
> 
> Interesting. My analysis is slightly different but only in low level
> aspects.. tell me if I'm wrong:

I only looked at the high level flow of the code.

> 1. we have tb[IFLA_IFNAME] set, so do_setlink() will populate ifname
> 
> 2. Because of #1, __dev_change_net_namespace() gets called with 
>    new name provide (pat = eth123)
> 
> 3. It will do netdev_name_in_use(), which returns true.

At this point we're still looking at the old netns, right?

> 4. It will then call dev_get_valid_name() which, (confusingly?) already
>    sets the new name on the netdevice itself.

I wouldn't have guessed a "get" function will actually set something, no,
but then again I know not of the kernel's arcane ways :)

> Picking back up after the shutdown in old netns.
> 
> 6. We call:
> 
>    kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
>    dev_net_set(dev, net);
>    kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
> 
> Those are the calls you see in udev, recall that device core has
> its own naming, so both of those calls will use the _old_ name.
> REMOVE in the source netns and ADD in the destination netns.
>
> The kobject calls were added by Serge in 4e66ae2ea371c, in 2012,
> curiously it states:
> 
>     v2: also send KOBJ_ADD to new netns.  There will then be a
>     _MOVE event from the device_rename() call, but that should
>     be innocuous.
>
> Which was based on this conversation:
> https://lore.kernel.org/all/20121012031328.GA5472@sergelap/

Seems to me what main-ns might do with this MOVE event was simply not
considered in detail, innocuous it is not ;)

> 7. Now we finally call:
> 
>    err = device_rename(&dev->dev, dev->name);
> 
> Which tells device core that the name has changed, and gives you 
> the (second) MOVE event. This time with the correct name.

I don't like loose ends. Any idea why we only see the one MOVE now?

> Which is what you're seeing, Bug#3, the ADD event should be after
> the call to device_rename()...

All tracks AFAICT.

Thanks,
--Daniel