Message-ID: <20180220195305.GE2031@nanopsycho>
Date: Tue, 20 Feb 2018 20:53:05 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Sridhar Samudrala <sridhar.samudrala@...el.com>,
"Michael S. Tsirkin" <mst@...hat.com>,
Stephen Hemminger <stephen@...workplumber.org>,
David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
virtualization@...ts.linux-foundation.org,
virtio-dev@...ts.oasis-open.org,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
Jakub Kicinski <kubakici@...pl>,
Jason Wang <jasowang@...hat.com>,
Siwei Liu <loseweigh@...il.com>
Subject: Re: [RFC PATCH v3 0/3] Enable virtio_net to act as a backup for a
passthru device
Tue, Feb 20, 2018 at 06:23:49PM CET, alexander.duyck@...il.com wrote:
>On Tue, Feb 20, 2018 at 8:29 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>> Tue, Feb 20, 2018 at 05:04:29PM CET, alexander.duyck@...il.com wrote:
>>>On Tue, Feb 20, 2018 at 2:42 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>>>> Fri, Feb 16, 2018 at 07:11:19PM CET, sridhar.samudrala@...el.com wrote:
>>>>>Patch 1 introduces a new feature bit VIRTIO_NET_F_BACKUP that can be
>>>>>used by hypervisor to indicate that virtio_net interface should act as
>>>>>a backup for another device with the same MAC address.
>>>>>
>>>>>Patch 2 is in response to the community request for a 3 netdev
>>>>>solution. However, it creates some issues we'll get into in a moment.
>>>>>It extends virtio_net to use alternate datapath when available and
>>>>>registered. When BACKUP feature is enabled, virtio_net driver creates
>>>>>an additional 'bypass' netdev that acts as a master device and controls
>>>>>2 slave devices. The original virtio_net netdev is registered as
>>>>>'backup' netdev and a passthru/vf device with the same MAC gets
>>>>>registered as 'active' netdev. Both 'bypass' and 'backup' netdevs are
>>>>>associated with the same 'pci' device. The user accesses the network
>>>>>interface via 'bypass' netdev. The 'bypass' netdev chooses 'active' netdev
>>>>>as default for transmits when it is available with link up and running.
>>>>
>>>> Sorry, but this is ridiculous. You are apparently re-implementing part
>>>> of bonding driver as a part of NIC driver. Bond and team drivers
>>>> are mature solutions, well tested, broadly used, with lots of issues
>>>> resolved in the past. What you try to introduce is a weird shortcut
>>>> that already has a couple of issues as you mentioned and will certainly
>>>> have many more. Also, I'm pretty sure that in future, someone comes up
>>>> with ideas like multiple VFs, LACP and similar bonding things.
>>>
>>>The problem with the bond and team drivers is they are too large and
>>>have too many interfaces available for configuration so as a result
>>>they can really screw this interface up.
>>
>> What? Too large in which sense? Why is "too many interfaces" a problem?
>> Also, team has only one interface to userspace team-generic-netlink.
>
>Specifically I was working with bond. I had overlooked team for the
>most part since it required an additional userspace daemon which
>basically broke our requirement of no user-space intervention.
Why? That sounds artificial. Why can't userspace be part of the
solution?
>
>I was trying to focus on just doing an active/backup setup. The
>problem is there are debugfs, sysfs, and procfs interfaces exposed
>that we don't need and/or want. Adding any sort of interface to
>exclude these would just bloat up the bonding driver, and leaving them
>in would just be confusing since they would all need to be ignored. In
>addition the steps needed to get the name to come out the same as the
>original virtio interface would just bloat up bonding.
Why do you care about the "name"? It's a netdev, isn't that all that matters?
The viewpoint of the user inside vm boils down to:
1) I have 2 netdevs
2) One is preferred
3) I set up team on top of them
That should be it. It is the user's responsibility to do it this way.
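
To illustrate the three points above, here is a minimal sketch of the
active/backup choice that a team-style setup makes inside the guest. It is
a toy Python model, not teamd or kernel code: the Port class, the prio
field, and the select_active helper are hypothetical names used only for
illustration, and "higher prio wins" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    """Toy stand-in for a netdev placed under the team device (hypothetical)."""
    name: str
    prio: int       # higher value = more preferred (assumption of this sketch)
    link_up: bool


def select_active(ports: List[Port]) -> Optional[Port]:
    """Return the most preferred port whose link is up, or None if all are down."""
    usable = [p for p in ports if p.link_up]
    if not usable:
        return None
    return max(usable, key=lambda p: p.prio)


# Two netdevs in the guest: the VF is preferred, virtio_net is the fallback.
vf = Port("vf0", prio=100, link_up=True)
virtio = Port("virtio0", prio=10, link_up=True)
ports = [virtio, vf]

assert select_active(ports) is vf       # VF carries traffic while it is up
vf.link_up = False                      # e.g. VF unplugged before migration
assert select_active(ports) is virtio   # traffic falls back to virtio_net
```

The only policy in the model is "one is preferred"; everything else is
link state.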
>
>>>
>>>Essentially this is meant to be a bond that is more-or-less managed by
>>>the host, not the guest. We want the host to be able to configure it
>>
>> How is it managed by the host? In your usecase the guest has 2 netdevs:
>> virtio_net, pci vf.
>> I don't see how host can do any managing of that, other than the
>> obvious. But still, the active/backup decision is done in guest. This is
>> a simple bond/team usecase. As I said, there is something needed to be
>> implemented in userspace in order to handle re-appear of vf netdev.
>> But that should be fairly easy to do in teamd.
>>
>>
>>>and have it automatically kick in on the guest. For now we want to
>>>avoid adding too much complexity as this is meant to be just the first
>>
>> That's what I fear, "for now"..
>
>I used the expression "for now" as I see this being the first stage of
>a multi-stage process.
That is what I fear...
>
>Step 1 is to get a basic virtio-bypass driver added to virtio so that
>it is at least comparable to netvsc in terms of feature set and
>enables basic network live migration.
>
>Step 2 is adding some sort of dirty page tracking, preferably via
>something like a paravirtual iommu interface. Once we have that we can
>defer the eviction of the VF until the very last moment of the live
>migration. For now I need to work on testing a modification to allow
>mapping the entire guest as being pass-through for DMA to the device,
>and requiring dynamic for any DMA that is bidirectional or from the
>device.
That is purely on the host side. Does not really matter if your solution
or standard bond/team is in use, right?
>
>Step 3 will be to start looking at advanced configuration. That is
>where we drop the implementation in step 1 and instead look at
>spawning something that looks more like the team type interface,
>however instead of working with a user-space daemon we would likely
>need to work with some sort of mailbox or message queue coming up from
>the hypervisor. Then we can start looking at doing things like passing
>up blocks of eBPF code to handle Tx port selection or whatever we
>need.
:O
>
>>
>>>step. Trying to go in and implement the whole solution right from the
>>>start based on existing drivers is going to be a massive time sink and
>>>will likely never get completed due to the fact that there is always
>>>going to be some other thing that will interfere.
>>
>> "implement the whole solution right from the start based on existing
>> drivers" - what solution are you talking about? I don't understand this
>> para.
>
>You started mentioning much more complex configurations such as
>multi-VF, LACP, and other such things. I fully own that this cannot
>support that. My understanding is that the netvsc solution that is out
>there cannot support anything like that either. The idea for now is to
>keep this as simple as possible. It makes things like the possibility
>of porting this to other OSes much easier.
The easier solution is team and teamd with minimal modifications in order to
make your usecase work. Btw, do you have the needs for your usecase
written down somewhere, so we are on the same page?
>
>>>
>>>My personal hope is that we can look at doing a virtio-bond sort of
>>>device that will handle all this as well as providing a communication
>>>channel, but that is much further down the road. For now we only have
>>>a single bit so the goal for now is trying to keep this as simple as
>>>possible.
>>
>> Oh. So there is really intention to do re-implementation of bonding
>> in virtio. That is plain-wrong in my opinion.
>>
>> Could you just use bond/team, please, and don't reinvent the wheel with
>> this abomination?
>
>So I have a question for you. Why did you create the team driver? The
>bonding code was already there and does almost exactly the same thing.
Please do go down the git log memory lane. Team was introduced in 2011.
At that time bonding was not in a good shape. I decided to rewrite it
with minimal parts being in kernel to allow the flexibility user needs
to be done in userspace. By the way, the usecase you are trying to
resolve by this patchset is something that can benefit from the team
driver kernel-userspace architecture. Easily.
>I would think it has to do with where things are managed. That is the
>same situation we have with this.
>
>In my mind I don't see this as something where we can just fit it into
>one of these two drivers because of the same reason the bonding and
>team drivers are split. We want to manage this interface somewhere
>else. In my mind what we probably need to do is look at refactoring
>the code since the control paths are in different locations for each
>of these drivers, but much of the datapath is the same. That is where
This is where you try to twist the universe, in my opinion. You want to
move the responsibilities of the user inside the guest to the user
outside it. And you use these weird mechanisms to do so. It feels very
wrong. Look at it from non-virtualized point of view. It's like you
would have HW that would configure the kernel which runs on it. It is
supposed to be the other way around.
>I see things going eventually for this "virtio-bond" interface I
>referenced, but for now this interface is not that since there isn't
>really any communication channel present at all.
That is exactly what I fear. Thanks for assuring me that this is the final
vision :/
>
>>>
>>>> What is the reason for this abomination? According to:
>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>> The reason is quite weak.
>>>> User in the vm sees 2 (or more) netdevices, he puts them in bond/team
>>>> and that's it. This works now! If the vm lacks some userspace features,
>>>> let's fix it there! For example, MAC changes are something that could
>>>> be easily handled in the teamd userspace daemon.
>>>
>>>I think you might have missed the point of this. This is meant to be a
>>>simple interface so the guest should not be able to change the MAC
>>>address, and it shouldn't require any userspace daemon to setup or
>>>tear down. Ideally with this solution the virtio bypass will come up
>>>and be assigned the name of the original virtio, and the "backup"
>>>interface will come up and be assigned the name of the original virtio
>>>with an additional "nbackup" tacked on via the phys_port_name, and
>>>then whenever a VF is added it will automatically be enslaved by the
>>>bypass interface, and it will be removed when the VF is hotplugged
>>>out.
>>>
>>>In my mind the difference between this and bond or team is where the
>>>configuration interface lies. In the case of bond it is in the kernel.
>>>If my understanding is correct team is mostly in user space. With this
>>>the configuration interface is really down in the hypervisor and
>>>requests are communicated up to the guest. I would prefer not to make
>>>virtio_net dependent on the bonding or team drivers, or worse yet a
>>>userspace daemon in the guest. For now I would argue we should keep
>>>this as simple as possible just to support basic live migration. There
>>>have already been discussions of refactoring this after it is in so
>>>that we can start to combine the functionality here with what is there
>>>in bonding/team, but the differences in configuration interface and
>>>the size of the code bases will make it challenging to outright merge
>>>this into something like that.
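
For reference, the automatic behaviour described in the quoted text (a VF
with the same MAC is enslaved when it appears, released when it is
hotplugged out, and used for transmit only while its link is up) can be
modelled roughly as below. This is a toy Python sketch under stated
assumptions, not the code from the patch set: BypassModel and all of its
method names are hypothetical.

```python
from typing import Optional


class BypassModel:
    """Toy model of the 'bypass' master from the quoted text (hypothetical)."""

    def __init__(self, backup_name: str, mac: str) -> None:
        self.mac = mac.lower()
        self.backup = backup_name            # the original virtio_net netdev
        self.active: Optional[str] = None    # the passthru/VF netdev, if present
        self.active_link_up = False

    def netdev_registered(self, name: str, mac: str) -> bool:
        """Enslave a newly appeared device only if its MAC matches ours."""
        if self.active is not None or mac.lower() != self.mac:
            return False
        self.active = name
        return True

    def netdev_unregistered(self, name: str) -> None:
        """Release the VF when it is hotplugged out."""
        if name == self.active:
            self.active = None
            self.active_link_up = False

    def link_change(self, name: str, up: bool) -> None:
        """Record link state changes for the enslaved VF only."""
        if name == self.active:
            self.active_link_up = up

    def tx_dev(self) -> str:
        """Transmit on the VF when present with link up, else on the backup."""
        if self.active is not None and self.active_link_up:
            return self.active
        return self.backup


bypass = BypassModel("virtio0", "52:54:00:12:34:56")
assert bypass.tx_dev() == "virtio0"                     # no VF yet
assert bypass.netdev_registered("vf0", "52:54:00:12:34:56")
bypass.link_change("vf0", True)
assert bypass.tx_dev() == "vf0"                         # VF takes over
bypass.netdev_unregistered("vf0")                       # hot-unplug
assert bypass.tx_dev() == "virtio0"                     # back to virtio_net
```

The MAC address in the usage lines is made up for the example.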