netdev - Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

Message-ID: <CADGSJ22VUgJzi6B=Bh4M6Bado1CQEEJvRR1VJ=oC47G2SJ0DEA@mail.gmail.com>
Date:   Fri, 2 Mar 2018 15:56:31 -0800
From:   Siwei Liu <loseweigh@...il.com>
To:     "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Sridhar Samudrala <sridhar.samudrala@...el.com>,
        Stephen Hemminger <stephen@...workplumber.org>,
        David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>, Jiri Pirko <jiri@...nulli.us>,
        virtio-dev@...ts.oasis-open.org,
        "Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
        Alexander Duyck <alexander.h.duyck@...el.com>,
        Jakub Kicinski <kubakici@...pl>
Subject: Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin <mst@...hat.com> wrote:
> On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
>> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>> <sridhar.samudrala@...el.com> wrote:
>> > This patch enables virtio_net to switch over to a VF datapath when a VF
>> > netdev is present with the same MAC address. It allows live migration
>> > of a VM with a direct attached VF without the need to set up a bond/team
>> > between a VF and virtio net device in the guest.
>> >
>> > The hypervisor needs to enable only one datapath at any time so that
>> > packets don't get looped back to the VM over the other datapath. When a VF
>> > is plugged, the virtio datapath link state can be marked as down. The
>> > hypervisor needs to unplug the VF device from the guest on the source host
>> > and reset the MAC filter of the VF to initiate failover of datapath to
>> > virtio before starting the migration. After the migration is completed,
>> > the destination hypervisor sets the MAC filter on the VF and plugs it back
>> > to the guest to switch over to VF datapath.
>> >
>> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>> > created that acts as a master device and tracks the state of the 2 lower
>> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>> > passthru device with the same MAC is registered as 'active' netdev.
>> >
>> > This patch is based on the discussion initiated by Jesse on this thread.
>> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>> >
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@...el.com>
>> > Signed-off-by: Alexander Duyck <alexander.h.duyck@...el.com>
>> > Reviewed-by: Jesse Brandeburg <jesse.brandeburg@...el.com>
>> > ---
>> >  drivers/net/virtio_net.c | 683 ++++++++++++++++++++++++++++++++++++++++++++++-
>> >  1 file changed, 682 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index bcd13fe906ca..f2860d86c952 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -30,6 +30,8 @@
>> >  #include <linux/cpu.h>
>> >  #include <linux/average.h>
>> >  #include <linux/filter.h>
>> > +#include <linux/netdevice.h>
>> > +#include <linux/pci.h>
>> >  #include <net/route.h>
>> >  #include <net/xdp.h>
>> >
>> > @@ -206,6 +208,9 @@ struct virtnet_info {
>> >         u32 speed;
>> >
>> >         unsigned long guest_offloads;
>> > +
>> > +       /* upper netdev created when BACKUP feature enabled */
>> > +       struct net_device *bypass_netdev;
>> >  };
>> >
>> >  struct padded_vnet_hdr {
>> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>> >         }
>> >  }
>> >
>> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> > +                                     size_t len)
>> > +{
>> > +       struct virtnet_info *vi = netdev_priv(dev);
>> > +       int ret;
>> > +
>> > +       if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>> > +               return -EOPNOTSUPP;
>> > +
>> > +       ret = snprintf(buf, len, "_bkup");
>> > +       if (ret >= len)
>> > +               return -EOPNOTSUPP;
>> > +
>> > +       return 0;
>> > +}
>> > +
>>
>> What if the systemd/udevd is not new enough to enforce the
>> n<phys_port_name> naming? Would virtio_bypass get a different name
>> than the original virtio_net?
>
> You mean people using ethX names? Any hardware config change breaks
> these, I don't think that can be helped.

I don't like relying on .ndo_get_phys_port_name - it's fragile
and it does not completely solve the problem it tries to address.
Imagine what you can end up with given an old udevd, or users who already
have existing explicit udev rules around phys_port_name. It does not
give you an ack saying "yes, I know you're the bypass and
you're the backup, please continue and I will give you both correct
names", or a nack saying "no, I don't know what these
extra interfaces are, please go back and leave the VF device alone".
We need a new udev API for both feature negotiation and naming, or may
even completely hide the lower interfaces.
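
To make the naming concern concrete, here is a minimal userspace sketch,
not kernel or udev code. get_phys_port_name() mirrors the callback in the
patch above, with a hypothetical has_backup flag standing in for the
VIRTIO_NET_F_BACKUP check. compose_ifname() is an assumed, simplified
model of a udev policy that appends n<phys_port_name> to a base name; the
base name "ens3" used in the usage note is only an example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the patch's callback: report "_bkup" only when
 * the hypothetical has_backup flag is set. */
int get_phys_port_name(int has_backup, char *buf, size_t len)
{
    int ret;

    assert(buf != NULL);
    if (!has_backup)
        return -EOPNOTSUPP;

    ret = snprintf(buf, len, "_bkup");
    if (ret < 0 || (size_t)ret >= len)
        return -EOPNOTSUPP;    /* the name would have been truncated */

    return 0;
}

/* Assumed model of the naming policy: the n<phys_port_name> suffix is
 * appended only when udev understands phys_port_name. */
int compose_ifname(const char *base, const char *port_name,
                   int udev_knows_port_name, char *out, size_t len)
{
    int ret;

    if (udev_knows_port_name && port_name != NULL && port_name[0] != '\0')
        ret = snprintf(out, len, "%sn%s", base, port_name);
    else
        ret = snprintf(out, len, "%s", base);

    return (ret < 0 || (size_t)ret >= len) ? -ENAMETOOLONG : 0;
}
```

Under this model, a udev that knows phys_port_name names the backup
"ens3n_bkup", while one that does not produces plain "ens3", the same
name the bypass netdev would want. Nothing reports that back to the
driver, which is the missing ack/nack described above.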

>
>> Should we detect this earlier and fall
>> back to legacy mode without creating the bypass netdev and enslaving
>> the VF?
>
> I don't think we can do this with existing kernel/userspace APIs.

That's why I said earlier that we should make udev aware of this new type
of combined device instead of doing hacks here and there.

Regards,
-Siwei

>
> --
> MST