Message-ID: <20180710084614.GA1612@redhat.com>
Date: Tue, 10 Jul 2018 09:46:14 +0100
From: Daniel P. Berrangé <berrange@...hat.com>
To: Jason Baron <jbaron@...mai.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>, libvir-list@...hat.com,
rmohr@...hat.com, Fabian Deutsch <fdeutsch@...hat.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>
Subject: Re: opening tap devices that are created in a container

On Mon, Jul 09, 2018 at 04:56:04PM -0400, Jason Baron wrote:
>
>
> On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> >> Hi,
> >>
> >> Opening tap devices, such as macvtap, that are created in containers is
> >> problematic because the interface for opening tap devices is via
> >> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> >> it's not namespace aware. It is possible to do a mknod() in the
> >> container once the tap devices are created. However, since the tap
> >> devices are created dynamically, it's not possible to allow access a
> >> priori to certain major/minor numbers, since we don't know what these
> >> are going to be. In addition, it's desirable to not allow the mknod
> >> capability in containers. This behavior, I think, is somewhat
> >> inconsistent with the
> >> tuntap driver where one can create tuntap devices inside a container by
> >> first opening /dev/net/tun and then using them by supplying the tuntap
> >> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> >> network namespace, one is limited to opening network devices that belong
> >> to your current network namespace.
> >>
> >> Here are some options for this issue that I wanted to get feedback
> >> on. I'm also wondering whether anybody else has run into this.
> >>
> >> 1)
> >>
> >> Don't create the tap device, such as macvtap, in the container. Instead,
> >> create the tap device outside of the container and then move it into the
> >> desired container network namespace. In addition, do a mknod() for the
> >> corresponding /dev/tapNN device from outside the container before doing
> >> chroot().
> >>
> >> This solution still doesn't allow tap devices to be created inside the
> >> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >> a container, it would mean changing libvirtd to open existing tap
> >> devices (as opposed to the current behavior of creating new ones). This
> >> would not require any kernel changes, but as mentioned seems
> >> inconsistent with the tuntap interface.
> >
> > Presumably the /dev/tapNN device name also changes when you rename
> > the tap device interface using SIOCSIFNAME ?
> >
>
> I don't think so. The NN is the ifindex of the device; changing the
> device name does not affect the ifindex.

Ah right, that makes sense.
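
For illustration, a minimal Python sketch of that lookup, assuming the
node is always named tap<ifindex> as described above. The helper names
(tap_node_path, tap_node_for_name) and the dev_dir parameter are made up
for this example and are not libvirt or kernel APIs:

```python
import socket


def tap_node_path(ifindex, dev_dir="/dev"):
    """Build the macvtap character-device path for a given ifindex."""
    if ifindex <= 0:
        raise ValueError("ifindex must be a positive integer")
    return "%s/tap%d" % (dev_dir, ifindex)


def tap_node_for_name(ifname, resolve=socket.if_nametoindex):
    """Resolve ifname in the caller's network namespace, then build the path.

    The node name depends only on the ifindex, so renaming the interface
    does not change it. The lookup should run in the namespace where the
    device will be opened, since a cached ifindex may be stale after a move.
    """
    return tap_node_path(resolve(ifname))
```

The resolve argument is only there so the lookup can be stubbed out; by
default it uses socket.if_nametoindex(), which raises OSError if the name
does not exist in the current namespace.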

> > e.g. if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
> > when moving it into the container, it would be /dev/eth0 inside the
> > container?
> >
>
> When moving it into the container the ifindex can change since the
> ifindex range is per-namespace (not global).

Oh, that's interesting, I hadn't realized that.

> > Anyway, given that this /dev/tapNN approach is what exists today,
> > libvirt will likely want to implement support for this regardless
> > in order to support existing kernels.
>
> Ok, in this case whatever created the tap device outside of the
> container would pass the name of the device to libvirt and make sure
> that the /dev/tapNN device was set up correctly in the container. I
> believe this differs from how libvirt works today in that libvirt would
> need to be modified to open an existing device (I think it currently
> always creates new ones).

Libvirt can use a pre-created TAP device today, but not a pre-created
MACVTAP, so supporting the latter is new code for us no matter what.

> > One slight complication with either of the solutions above is that
> > libvirt won't know whether it is given a TAP or a MACVTAP device.
> > It'll only be given the device name. So with code today we would
> > probably have to first try /dev/tapNNN and if that doesn't exist
> > then try /dev/net/tun with TUNSETIFF.
> >
>
> Hmm, doesn't libvirt make this distinction today?

No need to make the distinction yet, since we only support pre-created
TAP devices right now. In cases where we create the devices ourselves,
we already know what is what.
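
The fallback order quoted above (try /dev/tapNNN, then /dev/net/tun with
TUNSETIFF) could look roughly like the Python sketch below. It is
illustrative only and is not how libvirt implements anything: open_nic,
pack_ifreq and the dev_dir parameter are hypothetical names, and the
TUNSETIFF value assumes the _IOW encoding used on x86 and arm.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h> and <linux/if.h>; TUNSETIFF assumes the
# _IOW('T', 202, int) encoding used on x86 and arm.
TUNSETIFF = 0x400454CA
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def pack_ifreq(ifname, flags):
    """Pack the leading fields of struct ifreq: name, then short flags."""
    name = ifname.encode()
    if len(name) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    # 16-byte name, 2-byte flags, zero padding up to 40 bytes.
    return struct.pack("16sH22x", name, flags)


def open_nic(ifname, ifindex, dev_dir="/dev"):
    """Return (fd, kind), trying the macvtap node first, then tuntap."""
    try:
        path = os.path.join(dev_dir, "tap%d" % ifindex)
        return os.open(path, os.O_RDWR), "macvtap"
    except FileNotFoundError:
        pass  # no per-device node, so assume a plain TAP device

    fd = os.open(os.path.join(dev_dir, "net/tun"), os.O_RDWR)
    try:
        # Attach the fd to the existing TAP device with this name.
        fcntl.ioctl(fd, TUNSETIFF, pack_ifreq(ifname, IFF_TAP | IFF_NO_PI))
    except OSError:
        os.close(fd)
        raise
    return fd, "tap"
```

The caller still has to supply an ifindex that is valid in the current
network namespace, and the fallback only triggers when the /dev/tapNNN
node is missing, not when opening it is denied.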

> > If adding a new /dev/net/tap, something that could seamlessly accept
> > either a TAP or MACVTAP NIC name would be nice.
> >
> >
>
> I think if we added a new ioctl() as I proposed it could accept either
> type of nic.

OK, that would be nice.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|