netdev - Re: [summary] virtio network device failover writeup

Message-ID: <106db6b4-1cce-2769-031b-2dae7c0f0e28@oracle.com>
Date:   Tue, 19 Mar 2019 14:55:35 -0700
From:   si-wei liu <si-wei.liu@...cle.com>
To:     Liran Alon <liran.alon@...cle.com>,
        "Michael S. Tsirkin" <mst@...hat.com>
Cc:     Sridhar Samudrala <sridhar.samudrala@...el.com>,
        Alexander Duyck <alexander.duyck@...il.com>,
        Stephen Hemminger <stephen@...workplumber.org>,
        Jakub Kicinski <kubakici@...pl>, Jiri Pirko <jiri@...nulli.us>,
        David Miller <davem@...emloft.net>,
        Netdev <netdev@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org,
        boris.ostrovsky@...cle.com, vijay.balakrishna@...cle.com,
        jfreimann@...hat.com, ogerlitz@...lanox.com, vuhuong@...lanox.com
Subject: Re: [summary] virtio network device failover writeup



On 3/19/2019 5:38 AM, Liran Alon wrote:
> Hi Michael,
>
> Great blog post, which summarises everything very well!
>
> Some comments I have:
>
> 1) I think that when we use the term “1-netdev model” in community discussions, we tend to refer to what you have defined in the blog post as “3-device model with hidden slaves”.
> Therefore, I would suggest just removing the “1-netdev model” section and renaming the “3-device model with hidden slaves” section to “1-netdev model”.
>
> 2) The userspace issues result from both the “2-netdev model” and the “3-netdev model”. However, the blog post describes them as if they exist only in the “3-netdev model”.
> The reason these issues are not seen in the Azure environment is that Microsoft partially handled them for their specific 2-netdev model.
> Which leads me to the next comment.
>
> 3) I suggest that the blog post also elaborate on exactly which userspace issues arise in models other than the “1-netdev model”.
> The issues that I’m aware of are (please tell me if you are aware of others!):
> (a) udev rename race condition: When the net-failover device is opened, it also opens its slaves. However, udev receives the KOBJ_ADD event for the net-failover netdev first and only then for the virtio-net netdev. This means that if userspace responds to the first event by opening the net-failover netdev, any later attempt by userspace to rename the virtio-net netdev in response to the second event will fail, because the virtio-net netdev is already open. Note also that this udev rename rule is useful because we would like to add rules that rename the virtio-net netdev to clearly signal that it is used as the standby interface of another net-failover netdev.
> Microsoft worked around this problem in NetVSC by delaying the open of the slave VF relative to the open of the NetVSC netdev. However, this is still a race and thus a hacky solution. It was accepted by the community only because it is internal to the NetVSC driver; a similar solution was rejected by the community for the net-failover driver.
> The solution we currently propose to address this (patch by Si-Wei) is to change the kernel's rename handling to allow a net-failover slave to be renamed even if it is already open. The patch has not been accepted yet.
> (b) Issues caused by various userspace components running DHCP on the net-failover slaves: DHCP should of course only be done on the net-failover netdev. Attempting DHCP on the net-failover slaves as well will cause networking issues. Therefore, userspace components should be taught to avoid doing DHCP on the net-failover slaves. The various userspace components include:
> b.1) dhclient: If run without parameters, it by default enumerates all netdevs and attempts DHCP on all of them.
> (I don’t think Microsoft has handled this)
> b.2) initramfs / dracut: In order to mount the root file system over iSCSI, these components need networking and therefore run DHCP on all netdevs.
> (Microsoft hasn’t handled (b.2) because they don’t have images which perform iSCSI boot in their Azure setup. Still an open issue.)
> b.3) cloud-init: If configured to perform network configuration, it attempts to configure all available netdevs. However, it should avoid doing so on net-failover slaves.
> (Microsoft has handled this by adding a mechanism in cloud-init to blacklist a netdev from being configured if it is owned by a specific PCI driver. Specifically, they blacklist the Mellanox VF driver. However, this technique doesn’t work for the net-failover mechanism, because both the net-failover netdev and the virtio-net netdev are owned by the virtio-net PCI driver.)
> b.4) Do the network managers of various distros need to be updated to avoid DHCP on net-failover slaves? (Not sure. Asking...)
Adding one more issue that was just uncovered:
b.5) netplan: the 3-netdev model confuses Ubuntu's netplan tool, which
dynamically generates udev rules in /run/udev/rules.d that match netdevs
by MAC address only. I will file an enhancement request on launchpad later.
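
To make the ambiguity concrete, here is a minimal Python sketch. It is not
netplan's actual code: the device names, the MAC address, and the `master`
field are made up for illustration. It only shows why a MAC-only match hits
all three netdevs in the 3-netdev model, and how skipping devices that
have an upper (failover) device would leave just the net-failover netdev.

```python
from typing import List, NamedTuple, Optional


class Netdev(NamedTuple):
    name: str
    mac: str
    master: Optional[str]  # hypothetical: name of the upper failover netdev


def match_by_mac(devs: List[Netdev], mac: str) -> List[str]:
    """Return every netdev whose MAC matches (what a MAC-only rule sees)."""
    return [d.name for d in devs if d.mac.lower() == mac.lower()]


def configurable(devs: List[Netdev]) -> List[str]:
    """Return only netdevs that are not slaves of another netdev."""
    return [d.name for d in devs if d.master is None]


# Hypothetical 3-netdev guest: all three devices share one MAC address.
DEVS = [
    Netdev("ens3", "52:54:00:12:34:56", None),        # net-failover netdev
    Netdev("ens3nsby", "52:54:00:12:34:56", "ens3"),  # virtio-net standby
    Netdev("ens4", "52:54:00:12:34:56", "ens3"),      # VF primary
]

if __name__ == "__main__":
    print(match_by_mac(DEVS, "52:54:00:12:34:56"))  # all three names
    print(configurable(DEVS))                       # only the failover netdev
```

How a real tool would learn about the slave relationship is left open
here; the sketch simply assumes that information is available.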

-Siwei

>
> 4) Another interesting use case where the net-failover mechanism is useful is handling NIC firmware failures or NIC firmware live upgrade.
> In both cases, there is a need to perform a full PCIe reset of the NIC, which loses all the NIC eSwitch configuration of the various VFs.
> To handle these cases gracefully, one could just hot-unplug all VFs from the guests running on the host (which makes all guests use the virtio-net netdev, which is backed by a netdev that is eventually on top of the PF). Networking will therefore be restored to the guests once the PCIe reset is completed and the PF is functional again. To re-accelerate the guests' networking, the hypervisor can just hot-plug new VFs to the guests.
>
> P.S:
> I would very much appreciate this forum's help in closing the pending items listed in (3), which currently prevent using this net-failover mechanism in real production use cases.
>
> Regards,
> -Liran
>
>> On 17 Mar 2019, at 15:55, Michael S. Tsirkin <mst@...hat.com> wrote:
>>
>> Hi all,
>> I've put up a blog post with a summary of where network
>> device failover stands and some open issues.
>> Not sure where best to host it, I just put it up on blogspot:
>> https://mstsirkin.blogspot.com/2019/03/virtio-network-device-failover-support.html
>>
>> Comments, corrections are welcome!
>>
>> -- 
>> MST