lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADGSJ20sq8AmW34Z7w0JrTg6=YnKaM3Hq5R3TfhAH5BkuR2pqA@mail.gmail.com>
Date:   Fri, 8 Jun 2018 16:44:12 -0700
From:   Siwei Liu <loseweigh@...il.com>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        Jiri Pirko <jiri@...nulli.us>, kys@...rosoft.com,
        haiyangz@...rosoft.com, David Miller <davem@...emloft.net>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Netdev <netdev@...r.kernel.org>,
        Stephen Hemminger <sthemmin@...rosoft.com>
Subject: Re: [PATCH net] failover: eliminate callback hell

On Fri, Jun 8, 2018 at 4:18 PM, Stephen Hemminger
<stephen@...workplumber.org> wrote:
> On Fri, 8 Jun 2018 15:25:59 -0700
> Siwei Liu <loseweigh@...il.com> wrote:
>
>> On Wed, Jun 6, 2018 at 2:24 PM, Stephen Hemminger
>> <stephen@...workplumber.org> wrote:
>> > On Wed, 6 Jun 2018 15:30:27 +0300
>> > "Michael S. Tsirkin" <mst@...hat.com> wrote:
>> >
>> >> On Wed, Jun 06, 2018 at 09:25:12AM +0200, Jiri Pirko wrote:
>> >> > Tue, Jun 05, 2018 at 05:42:31AM CEST, stephen@...workplumber.org wrote:
>> >> > >The net failover should be a simple library, not a virtual
>> >> > >object with function callbacks (see callback hell).
>> >> >
>> >> > Why just a library? It should do a common things. I think it should be a
>> >> > virtual object. Looks like your patch again splits the common
>> >> > functionality into multiple drivers. That is kind of backwards attitude.
>> >> > I don't get it. We should rather focus on fixing the mess the
>> >> > introduction of netvsc-bonding caused and switch netvsc to 3-netdev
>> >> > model.
>> >>
>> >> So it seems that at least one benefit for netvsc would be better
>> >> handling of renames.
>> >>
>> >> Question is how can this change to 3-netdev happen?  Stephen is
>> >> concerned about risk of breaking some userspace.
>> >>
>> >> Stephen, this seems to be the usecase that IFF_HIDDEN was trying to
>> >> address, and you said then "why not use existing network namespaces
>> >> rather than inventing a new abstraction". So how about it then? Do you
>> >> want to find a way to use namespaces to hide the PV device for netvsc
>> >> compatibility?
>> >>
>> >
>> > Netvsc can't work with 3 dev model. MS has worked with enough distro's and
>> > startups that all demand eth0 always be present. And VF may come and go.
>> > After this history, there is a strong motivation not to change how kernel
>> > behaves. Switching to 3 device model would be perceived as breaking
>> > existing userspace.
>> >
>> > With virtio you can  work it out with the distro's yourself.
>> > There is no pre-existing semantics to deal with.
>> >
>> > For the virtio, I don't see the need for IFF_HIDDEN.
>>
>> I have a somewhat different view regarding IFF_HIDDEN. The purpose of
>> that flag, as well as the 1-netdev model, is to have a means to
>> inherit the interface name from the VF, and to eliminate playing hacks
>> around renaming devices, customizing udev rules and et al. Why
>> inheriting VF's name important? To allow existing config/setup around
>> VF continues to work across kernel feature upgrade. Most of network
>> config files in all distros are based on interface names. Few are MAC
>> address based but making lower slaves hidden would cover the rest. And
>> most importantly, preserving the same level of user experience as
>> using raw VF interface once getting all ndo_ops and ethtool_ops
>> exposed. This is essential to realize transparent live migration that
>> users dont have to learn and be aware of the undertaken.
>
> Inheriting the VF name will fail in the migration scenario.
> It is perfectly reasonable to migrate a guest to another machine where
> the VF PCI address is different. And since current udev/systemd model
> is to base network device name off of PCI address, the device will change
> name when guest is migrated.
>
The scenario of having VF on a different PCI address on post migration
is essentially equal to plugging in a new NIC. Why it has to pair with
the original PV? A sepearte PV device should be in place to pair the
new VF.


> On Azure, the VF maybe removed (by host) at any time and then later
> reattached. There is no guarantee that VF will show back up at
> the same synthetic PCI address. It will likely have a different
> PCI domain value.

This is something QEMU can do and make sure the PCI address is
consistent after migration.

-Siwei

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ