Message-ID: <20180221193832.GE1996@nanopsycho>
Date: Wed, 21 Feb 2018 20:38:32 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Jakub Kicinski <kubakici@...pl>,
"Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>,
David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
virtualization@...ts.linux-foundation.org,
virtio-dev@...ts.oasis-open.org,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
Jason Wang <jasowang@...hat.com>,
Siwei Liu <loseweigh@...il.com>
Subject: Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a
passthru device
Wed, Feb 21, 2018 at 06:56:35PM CET, alexander.duyck@...il.com wrote:
>On Wed, Feb 21, 2018 at 8:58 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Wed, Feb 21, 2018 at 05:49:49PM CET, alexander.duyck@...il.com wrote:
>>>On Wed, Feb 21, 2018 at 8:11 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>>>> Wed, Feb 21, 2018 at 04:56:48PM CET, alexander.duyck@...il.com wrote:
>>>>>On Wed, Feb 21, 2018 at 1:51 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>>>>>> Tue, Feb 20, 2018 at 11:33:56PM CET, kubakici@...pl wrote:
>>>>>>>On Tue, 20 Feb 2018 21:14:10 +0100, Jiri Pirko wrote:
>>>>>>>> Yeah, I can see it now :( I guess that the ship has sailed and we are
>>>>>>>> stuck with this ugly thing forever...
>>>>>>>>
>>>>>>>> Could you at least make some common code that is shared in between
>>>>>>>> netvsc and virtio_net so this is handled in exactly the same way in both?
>>>>>>>
>>>>>>>IMHO netvsc is a vendor specific driver which made a mistake on what
>>>>>>>behaviour it provides (or tried to align itself with Windows SR-IOV).
>>>>>>>Let's not make a far, far more commonly deployed and important driver
>>>>>>>(virtio) bug-compatible with netvsc.
>>>>>>
>>>>>> Yeah. The netvsc solution is a dangerous precedent here and in my opinion
>>>>>> it was a huge mistake to merge it. I personally would vote to unmerge it
>>>>>> and make the solution based on team/bond.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>To Jiri's initial comments, I feel the same way, in fact I've talked to
>>>>>>>the NetworkManager guys to get auto-bonding based on MACs handled in
>>>>>>>user space. I think it may very well get done in next versions of NM,
>>>>>>>but isn't done yet. Stephen also raised the point that not everybody is
>>>>>>>using NM.
>>>>>>
>>>>>> Can be done in NM, networkd or other network management tools.
>>>>>> Even easier to do this in teamd and let them all benefit.
>>>>>>
>>>>>> Actually, I took a stab at implementing this in teamd. Took me like an
>>>>>> hour and a half.
>>>>>>
>>>>>> You can just run teamd with config option "kidnap" like this:
>>>>>> # teamd/teamd -c '{"kidnap": true }'
>>>>>>
>>>>>> Whenever teamd sees another netdev appear with the same mac as its own,
>>>>>> or sees another netdev change its mac to match,
>>>>>> it enslaves it.
>>>>>>
>>>>>> Here's the patch (quick and dirty):
>>>>>>
>>>>>> Subject: [patch teamd] teamd: introduce kidnap feature
>>>>>>
>>>>>> Signed-off-by: Jiri Pirko <jiri@...lanox.com>
>>>>>
>>>>>So this doesn't really address the original problem we were trying to
>>>>>solve. You asked earlier why the netdev name mattered and it mostly
>>>>>has to do with configuration. Specifically what our patch is
>>>>>attempting to resolve is the issue of how to allow a cloud provider to
>>>>>upgrade their customer to SR-IOV support and live migration without
>>>>>requiring them to reconfigure their guest. So the general idea with
>>>>>our patch is to take a VM that is running with virtio_net only and
>>>>>allow it to instead spawn a virtio_bypass master using the same netdev
>>>>>name as the original virtio, and then have the virtio_net and VF come
>>>>>up and be enslaved by the bypass interface. Doing it this way we can
>>>>>allow for multi-vendor SR-IOV live migration support using a guest
>>>>>that was originally configured for virtio only.
>>>>>
>>>>>The problem with your solution is we already have teaming and bonding
>>>>>as you said. There is already a write-up from Red Hat on how to do it
>>>>>(https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/virtual_machine_management_guide/sect-migrating_virtual_machines_between_hosts).
>>>>>That is all well and good as long as you are willing to keep around
>>>>>two VM images, one for virtio, and one for SR-IOV with live migration.
>>>>
>>>> You don't need 2 images. You need only one. The one with the team setup.
>>>> That's it. If another netdev with the same mac appears, teamd will
>>>> enslave it and run traffic on it. If not, ok, you'll go only through
>>>> virtio_net.
>>>
>>>Isn't that going to cause the routing table to get messed up when we
>>>rearrange the netdevs? We don't want to have a significant disruption
>>>in traffic when we are adding/removing the VF. It seems like we would
>>>need to invalidate any entries that were configured for the virtio_net
>>>and reestablish them on the new team interface. Part of the criteria
>>>we have been working with is that we should be able to transition from
>>>having a VF to not or vice versa without seeing any significant
>>>disruption in the traffic.
>>
>> What? You have routes on the team netdev. virtio_net and VF are only
>> slaves. What are you talking about? I don't get it :/
>
>So let's walk through this by example. The general idea of the base case
>for all this is somebody starting with virtio_net, we will call the
>interface "ens1" for now. It comes up and is assigned a dhcp address
>and everything works as expected. Now in order to get better
>performance we want to add a VF "ens2", but we don't want a new IP
>address. Now if I understand correctly what will happen is that when
>"ens2" appears on the system teamd will then create a new team
>interface "team0". Before teamd can enslave ens1 it has to down the
No, you don't understand that correctly.
There is always ens1 and team0. ens1 is a slave of team0. team0 is the
interface to use, to set the IP on, etc.
When ens2 appears, it gets enslaved to team0 as well.
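
To make that concrete, a rough sketch (untested; the mac and address
below are made up) of the one-time setup in the guest, using the
"kidnap" option from the patch above:

# teamd -d -c '{"device": "team0", "hwaddr": "52:54:00:12:34:56",
                "kidnap": true, "runner": {"name": "activebackup"}}'
# ip addr add 192.0.2.10/24 dev team0
# ip link set team0 up

teamd then kidnaps ens1 (same mac) as soon as it shows up, and later
ens2 the same way. Addresses and routes live on team0 and never move.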
>interface if I understand things correctly. This means that we have to
>disrupt network traffic in order for this to work.
>
>To give you an idea of where we were before this became about trying
>to do this in the team or bonding driver, we were debating a 2 netdev
>model versus a 3 netdev model. I will call out the model and the
>advantages/disadvantages of those below.
>
>2 Netdev model, "ens1", enslaves "ens2".
>- Requires dropping in-driver XDP in order to work (won't capture VF
>traffic otherwise)
>- VF takes performance hit for extra qdisc/Tx queue lock of virtio_net interface
>- If you ass-u-me (I haven't been a fan of this model if you can't
>tell) that it is okay to rip out in-driver XDP from virtio_net, then
>you could transition between base virtio and virtio w/ the backup bit set.
>- Works for netvsc because they limit their features (no in-driver
>XDP) to guarantee this works.
>
>3 Netdev model, "ens1", enslaves "ens1nbackup" and "ens2"
>- Exposes 2 netdevs "ens1" and "ens1nbackup" when only virtio is present
>- No extra qdisc or locking
>- All virtio_net original functionality still present
>- Not able to transition from virtio to virtio w/ backup without
>disruption (requires hot-plug)
>
>The way I see it, the only way your team setup could work would be
>something closer to the 3 netdev model. Basically we would be
>requiring the user to always have team0 present in order to make
>certain that anything like XDP would be run on the team interface
>instead of assuming that the virtio_net could run by itself. I will
>add it as a third option here to compare to the other 2.
Yes.
>
>3 Netdev "team" model, "team0", enslaves "ens1" and "ens2"
>- Requires guest to configure teamd
>- Exposes "team0" and "ens1" when only virtio is present
>- No extra qdisc or locking
>- Doesn't require "backup" bit in virtio
>
>>>
>>>Also how does this handle any static configuration? I am assuming
>>>that the team will be brought up as soon as it is seen and assigned
>>>a DHCP address.
>>
>> Again. You configure whatever you need on the team netdev.
>
>Just so we are clear, are you then saying that the team0 interface
>will always be present with this configuration? You had made it sound
Of course.
>like it would disappear if you didn't have at least 2 interfaces.
Where did I make it sound like that? No.
>
>>>
>>>The solution as you have proposed seems problematic at best. I don't
>>>see how the team solution works without introducing some sort of
>>>traffic disruption to either add/remove the VF and bring up/tear down
>>>the team interface. At that point we might as well just give up on
>>>this piece of live migration support entirely since the disruption was
>>>what we were trying to avoid. We might as well just hotplug out the VF
>>>and hotplug in a virtio at the same bus device and function number and
>>>just let udev take care of renaming it for us. The idea was supposed
>>>to be a seamless transition between the two interfaces.
>>
>> Alex. What you are trying to do in this patchset and what netvsc does is
>> essentially in-driver bonding. Same mechanism, rx_handler,
>> everything. I don't really understand what you are talking about. With
>> the use of team you will get exactly the same behaviour.
>
>So the goal of the "in-driver bonding" is to make the bonding as
>non-intrusive as possible and require as little user intervention as
>possible. I agree that much of the handling is the same, however the
>control structure and requirements are significantly different. That
>has been what I have been trying to explain. You keep wanting to use
>the existing structures, but they don't really apply cleanly because
>they push control for the interface up into the guest, and that
>doesn't make much sense in the case of virtualization. What is
>happening here is that we are exposing a bond that the guest should
>have no control over, or at least as little as possible. In addition,
>making the user add extra configuration in the guest
>means that there is that much more that can go wrong if they screw it
>up.
>
>The other problem here is that the transition needs to be as seamless
>as possible between just a standard virtio_net setup and this new
>setup. With either the team or bonding setup you end up essentially
>forcing the guest to have the bond/team always there even if they are
>running only a single interface. Only if they "upgrade" the VM by
>adding a VF does it finally get to do anything.
Yeah. There is certainly a dilemma. We have to choose between
1) a weird and hackish in-driver semi-bonding that would be simple
for the user.
2) the standard way that would be perhaps slightly more complicated
for the user.
>
>What this comes down to for us is the following requirements:
>1. The name of the interface cannot change when going from virtio_net,
>to virtio_net being bypassed using a VF. We cannot create an interface
>on top of the interface; if anything we need to push the original
>virtio_net out of the way so that the new team interface takes its
>place in the configuration of the system. Otherwise a VM with a VF w/
>live migration will require a different configuration than one that
>just runs virtio_net.
Team driver netdev is still the same, no name changes.
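And the name is whatever you put in the config, e.g. (the name here is
just an example):

# teamd -d -c '{"device": "team0", "kidnap": true }'

That netdev is created at boot and keeps its name for its whole
lifetime, no matter how many slaves come and go.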
>2. We need some way to signal if this VM should be running in an
>"upgraded" mode or not. We have been using the backup bit in
>virtio_net to do that. If it isn't "upgraded" then we don't need the
>team/bond and we can just run with virtio_net.
I don't see why the team cannot be there always.
>3. We cannot introduce any downtime on the interface when adding a VF
>or removing it. The link must stay up the entire time and be able to
>handle packets.
Sure. That should be handled by the team. Whenever the VF netdev
disappears, traffic would switch over to the virtio_net. The benefit of
your in-driver bonding solution is that qemu can actually signal the
guest driver that the disappearance is about to happen and do the
switch a bit earlier. But that is something that might be implemented
in a different channel where the kernel would get a notification that a
certain pci device is going to disappear so everyone could prepare.
Just an idea.
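
For the unplug case without any such signal, the existing link watch
should do; a sketch (again using the hypothetical "kidnap" option):

# teamd -d -c '{"device": "team0", "kidnap": true,
                "runner": {"name": "activebackup"},
                "link_watch": {"name": "ethtool"}}'

When ens2 is hot-unplugged or loses link, the activebackup runner just
falls back to ens1; nothing configured on team0 itself changes.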