lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADGSJ22Fok88EY+vgU-i5HVsyJNdS5g5Oe1W3vvNy7_q+2_2HA@mail.gmail.com>
Date:   Fri, 8 Jun 2018 17:42:21 -0700
From:   Siwei Liu <loseweigh@...il.com>
To:     Stephen Hemminger <stephen@...workplumber.org>
Cc:     "Michael S. Tsirkin" <mst@...hat.com>,
        Jiri Pirko <jiri@...nulli.us>, kys@...rosoft.com,
        haiyangz@...rosoft.com, David Miller <davem@...emloft.net>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Netdev <netdev@...r.kernel.org>,
        Stephen Hemminger <sthemmin@...rosoft.com>
Subject: Re: [PATCH net] failover: eliminate callback hell

On Fri, Jun 8, 2018 at 5:02 PM, Stephen Hemminger
<stephen@...workplumber.org> wrote:
> On Fri, 8 Jun 2018 16:44:12 -0700
> Siwei Liu <loseweigh@...il.com> wrote:
>
>> On Fri, Jun 8, 2018 at 4:18 PM, Stephen Hemminger
>> <stephen@...workplumber.org> wrote:
>> > On Fri, 8 Jun 2018 15:25:59 -0700
>> > Siwei Liu <loseweigh@...il.com> wrote:
>> >
>> >> On Wed, Jun 6, 2018 at 2:24 PM, Stephen Hemminger
>> >> <stephen@...workplumber.org> wrote:
>> >> > On Wed, 6 Jun 2018 15:30:27 +0300
>> >> > "Michael S. Tsirkin" <mst@...hat.com> wrote:
>> >> >
>> >> >> On Wed, Jun 06, 2018 at 09:25:12AM +0200, Jiri Pirko wrote:
>> >> >> > Tue, Jun 05, 2018 at 05:42:31AM CEST, stephen@...workplumber.org wrote:
>> >> >> > >The net failover should be a simple library, not a virtual
>> >> >> > >object with function callbacks (see callback hell).
>> >> >> >
>> >> >> > Why just a library? It should do a common things. I think it should be a
>> >> >> > virtual object. Looks like your patch again splits the common
>> >> >> > functionality into multiple drivers. That is kind of backwards attitude.
>> >> >> > I don't get it. We should rather focus on fixing the mess the
>> >> >> > introduction of netvsc-bonding caused and switch netvsc to 3-netdev
>> >> >> > model.
>> >> >>
>> >> >> So it seems that at least one benefit for netvsc would be better
>> >> >> handling of renames.
>> >> >>
>> >> >> Question is how can this change to 3-netdev happen?  Stephen is
>> >> >> concerned about risk of breaking some userspace.
>> >> >>
>> >> >> Stephen, this seems to be the usecase that IFF_HIDDEN was trying to
>> >> >> address, and you said then "why not use existing network namespaces
>> >> >> rather than inventing a new abstraction". So how about it then? Do you
>> >> >> want to find a way to use namespaces to hide the PV device for netvsc
>> >> >> compatibility?
>> >> >>
>> >> >
>> >> > Netvsc can't work with 3 dev model. MS has worked with enough distro's and
>> >> > startups that all demand eth0 always be present. And VF may come and go.
>> >> > After this history, there is a strong motivation not to change how kernel
>> >> > behaves. Switching to 3 device model would be perceived as breaking
>> >> > existing userspace.
>> >> >
>> >> > With virtio you can  work it out with the distro's yourself.
>> >> > There is no pre-existing semantics to deal with.
>> >> >
>> >> > For the virtio, I don't see the need for IFF_HIDDEN.
>> >>
>> >> I have a somewhat different view regarding IFF_HIDDEN. The purpose of
>> >> that flag, as well as the 1-netdev model, is to have a means to
>> >> inherit the interface name from the VF, and to eliminate playing hacks
>> >> around renaming devices, customizing udev rules and et al. Why
>> >> inheriting VF's name important? To allow existing config/setup around
>> >> VF continues to work across kernel feature upgrade. Most of network
>> >> config files in all distros are based on interface names. Few are MAC
>> >> address based but making lower slaves hidden would cover the rest. And
>> >> most importantly, preserving the same level of user experience as
>> >> using raw VF interface once getting all ndo_ops and ethtool_ops
>> >> exposed. This is essential to realize transparent live migration that
>> >> users dont have to learn and be aware of the undertaken.
>> >
>> > Inheriting the VF name will fail in the migration scenario.
>> > It is perfectly reasonable to migrate a guest to another machine where
>> > the VF PCI address is different. And since current udev/systemd model
>> > is to base network device name off of PCI address, the device will change
>> > name when guest is migrated.
>> >
>> The scenario of having VF on a different PCI address on post migration
>> is essentially equal to plugging in a new NIC. Why it has to pair with
>> the original PV? A sepearte PV device should be in place to pair the
>> new VF.
>
> The host only guarantees that the PV device will be on the same network.
> It does not make any PCI guarantees. The way Windows works is to find
> the device based on "serial number" which is an Hyper-V specific attribute
> of PCI devices.
>
> I considered naming off of serial number but that won't work for the
> case where PV device is present first and VF arrives later. The serial
> number is attribute of VF, not the PV which is there first.

I assume the PV can get that information ahead of time before VF
arrives? Without it how do you match the device when you see a VF
coming with some serial number? Is it possible for PV to get the
matching SN even earlier during probe time? Or it has to depend on the
presence of vPCI bridge to generate this SN?

>
> Your ideas about having the PCI information of the VF form the name
> of the failover device have the same problem. The PV device may
> be the only one present on boot.

Yeah, this is a chicken-egg problem indeed, and that was the reason
why I supply the BDF info for PV to name the master interface.
However, the ACPI PCI slot needs to depend on the PCI bus enumeration
so that can't be predictable.  Would it make sense to only rename when
the first time a matching VF appears and PV interface isn't brought
up, then the failover master would always stick to the name
afterwards? I think it should cover most scenarios as it's usually
during boot time (dracut) the VF first appears and the PV interface at
the time then shouldn't have been configured yet.

-Siwei

>
>
>> > On Azure, the VF maybe removed (by host) at any time and then later
>> > reattached. There is no guarantee that VF will show back up at
>> > the same synthetic PCI address. It will likely have a different
>> > PCI domain value.
>>
>> This is something QEMU can do and make sure the PCI address is
>> consistent after migration.
>>
>> -Siwei
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ