netdev - Re: opening tap devices that are created in a container

Date:   Tue, 10 Jul 2018 09:46:14 +0100
From:   Daniel P. Berrangé <berrange@...hat.com>
To:     Jason Baron <jbaron@...mai.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>, libvir-list@...hat.com,
        rmohr@...hat.com, Fabian Deutsch <fdeutsch@...hat.com>,
        "Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: opening tap devices that are created in a container

On Mon, Jul 09, 2018 at 04:56:04PM -0400, Jason Baron wrote:
> 
> 
> On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> >> Hi,
> >>
> >> Opening tap devices, such as macvtap, that are created in containers is
> >> problematic because the interface for opening tap devices is via
> >> /dev/tapNN, and devtmpfs is not typically mounted inside a container as
> >> it's not namespace-aware. It is possible to do a mknod() in the
> >> container once the tap devices are created; however, since the tap
> >> devices are created dynamically, it's not possible to allow access to
> >> certain major/minor numbers a priori, since we don't know what these
> >> are going to be. In addition, it's desirable to not allow the mknod
> >> capability in containers. This behavior, I think, is somewhat
> >> inconsistent with the tuntap driver, where one can create tuntap devices
> >> inside a container by first opening /dev/net/tun and then using them by
> >> supplying the tuntap device name via ioctl(TUNSETIFF). And since
> >> TUNSETIFF validates the network namespace, one is limited to opening
> >> network devices that belong to the current network namespace.
> >>
> >> Here are some options for this issue that I wanted to get feedback
> >> on; I'm also wondering if anybody else has run into this.
> >>
> >> 1)
> >>
> >> Don't create the tap device (such as macvtap) in the container. Instead,
> >> create the tap device outside of the container and then move it into the
> >> desired container network namespace. In addition, do a mknod() for the
> >> corresponding /dev/tapNN device from outside the container before doing
> >> chroot().
> >>
> >> This solution still doesn't allow tap devices to be created inside the
> >> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >> a container, it would mean changing libvirtd to open existing tap
> >> devices (as opposed to the current behavior of creating new ones). This
> >> would not require any kernel changes, but as mentioned seems
> >> inconsistent with the tuntap interface.
> > 
> > Presumably the /dev/tapNN device name also changes when you rename
> > the tap device interface using SIOCSIFNAME ?
> > 
> 
> I don't think so. The NN is the ifindex of the device; changing the
> device name does not affect the ifindex.

Ah, right, that makes sense.

> > eg if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
> > when moving it into the container, it would be /dev/eth0 inside the
> > container ?
> > 
> 
> When moving it into the container the ifindex can change since the
> ifindex range is per-namespace (not global).

Oh, that's interesting; I hadn't realized that.

> > Anyway, given that this /dev/tapNN approach is what exists today,
> > libvirt will likely want to implement support for this regardless
> > in order to support existing kernels.
> 
> OK, in this case whatever created the tap device outside of the
> container would pass the name of the device to libvirt and make sure
> that the /dev/tapNN device was set up correctly in the container. I
> believe this differs from how libvirt works today in that libvirt would
> need to be modified to open an existing device (I think it currently
> always creates new ones).

Libvirt can use a pre-created TAP device today, but not a pre-created
MACVTAP, so supporting the latter is new code for us no matter what.

> > One slight complication with either of the solutions above is that
> > libvirt won't know whether it is given a TAP or a MACVTAP device.
> > It'll only be given the device name. So with code today we would
> > probably have to first try /dev/tapNNN and if that doesn't exist
> > then try /dev/net/tun with TUNSETIFF.
> >
> 
> Hmm. Doesn't libvirt make this distinction today?

No need to make the distinction yet, since we only support pre-created
TAP devices right now. In cases where we create the devices ourselves,
we already know what is what.

> > If we add a new /dev/net/tap, something that could seamlessly accept
> > either a TAP or a MACVTAP NIC name would be nice.
> > 
> >
> 
> I think if we added a new ioctl() as I proposed it could accept either
> type of nic.

OK, that would be nice.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|