netdev - Re: [PATCH net] failover: eliminate callback hell

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180608170235.7f345b58@xeon-e3>
Date:   Fri, 8 Jun 2018 17:02:35 -0700
From:   Stephen Hemminger <stephen@...workplumber.org>
To:     Siwei Liu <loseweigh@...il.com>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        Jiri Pirko <jiri@...nulli.us>, kys@...rosoft.com,
        haiyangz@...rosoft.com, David Miller <davem@...emloft.net>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Netdev <netdev@...r.kernel.org>,
        Stephen Hemminger <sthemmin@...rosoft.com>
Subject: Re: [PATCH net] failover: eliminate callback hell

On Fri, 8 Jun 2018 16:44:12 -0700
Siwei Liu <loseweigh@...il.com> wrote:

> On Fri, Jun 8, 2018 at 4:18 PM, Stephen Hemminger
> <stephen@...workplumber.org> wrote:
> > On Fri, 8 Jun 2018 15:25:59 -0700
> > Siwei Liu <loseweigh@...il.com> wrote:
> >  
> >> On Wed, Jun 6, 2018 at 2:24 PM, Stephen Hemminger
> >> <stephen@...workplumber.org> wrote:  
> >> > On Wed, 6 Jun 2018 15:30:27 +0300
> >> > "Michael S. Tsirkin" <mst@...hat.com> wrote:
> >> >  
> >> >> On Wed, Jun 06, 2018 at 09:25:12AM +0200, Jiri Pirko wrote:  
> >> >> > Tue, Jun 05, 2018 at 05:42:31AM CEST, stephen@...workplumber.org wrote:  
> >> >> > >The net failover should be a simple library, not a virtual
> >> >> > >object with function callbacks (see callback hell).  
> >> >> >
> >> >> > Why just a library? It should do a common things. I think it should be a
> >> >> > virtual object. Looks like your patch again splits the common
> >> >> > functionality into multiple drivers. That is kind of backwards attitude.
> >> >> > I don't get it. We should rather focus on fixing the mess the
> >> >> > introduction of netvsc-bonding caused and switch netvsc to 3-netdev
> >> >> > model.  
> >> >>
> >> >> So it seems that at least one benefit for netvsc would be better
> >> >> handling of renames.
> >> >>
> >> >> Question is how can this change to 3-netdev happen?  Stephen is
> >> >> concerned about risk of breaking some userspace.
> >> >>
> >> >> Stephen, this seems to be the usecase that IFF_HIDDEN was trying to
> >> >> address, and you said then "why not use existing network namespaces
> >> >> rather than inventing a new abstraction". So how about it then? Do you
> >> >> want to find a way to use namespaces to hide the PV device for netvsc
> >> >> compatibility?
> >> >>  
> >> >
> >> > Netvsc can't work with 3 dev model. MS has worked with enough distro's and
> >> > startups that all demand eth0 always be present. And VF may come and go.
> >> > After this history, there is a strong motivation not to change how kernel
> >> > behaves. Switching to 3 device model would be perceived as breaking
> >> > existing userspace.
> >> >
> >> > With virtio you can  work it out with the distro's yourself.
> >> > There is no pre-existing semantics to deal with.
> >> >
> >> > For the virtio, I don't see the need for IFF_HIDDEN.  
> >>
> >> I have a somewhat different view regarding IFF_HIDDEN. The purpose of
> >> that flag, as well as the 1-netdev model, is to have a means to
> >> inherit the interface name from the VF, and to eliminate playing hacks
> >> around renaming devices, customizing udev rules and et al. Why
> >> inheriting VF's name important? To allow existing config/setup around
> >> VF continues to work across kernel feature upgrade. Most of network
> >> config files in all distros are based on interface names. Few are MAC
> >> address based but making lower slaves hidden would cover the rest. And
> >> most importantly, preserving the same level of user experience as
> >> using raw VF interface once getting all ndo_ops and ethtool_ops
> >> exposed. This is essential to realize transparent live migration that
> >> users dont have to learn and be aware of the undertaken.  
> >
> > Inheriting the VF name will fail in the migration scenario.
> > It is perfectly reasonable to migrate a guest to another machine where
> > the VF PCI address is different. And since current udev/systemd model
> > is to base network device name off of PCI address, the device will change
> > name when guest is migrated.
> >  
> The scenario of having VF on a different PCI address on post migration
> is essentially equal to plugging in a new NIC. Why it has to pair with
> the original PV? A sepearte PV device should be in place to pair the
> new VF.

The host only guarantees that the PV device will be on the same network.
It does not make any PCI guarantees. The way Windows works is to find
the device based on "serial number" which is an Hyper-V specific attribute
of PCI devices.

I considered naming off of serial number but that won't work for the
case where PV device is present first and VF arrives later. The serial
number is attribute of VF, not the PV which is there first.

Your ideas about having the PCI information of the VF form the name
of the failover device have the same problem. The PV device may
be the only one present on boot.


> > On Azure, the VF maybe removed (by host) at any time and then later
> > reattached. There is no guarantee that VF will show back up at
> > the same synthetic PCI address. It will likely have a different
> > PCI domain value.  
> 
> This is something QEMU can do and make sure the PCI address is
> consistent after migration.
> 
> -Siwei