Message-ID: <20201216133309.GI552508@nvidia.com>
Date: Wed, 16 Dec 2020 09:33:09 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: Saeed Mahameed <saeed@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Leon Romanovsky <leonro@...dia.com>,
Netdev <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
David Ahern <dsahern@...nel.org>,
Jacob Keller <jacob.e.keller@...el.com>,
Sridhar Samudrala <sridhar.samudrala@...el.com>,
"Ertman, David M" <david.m.ertman@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Kiran Patil <kiran.patil@...el.com>,
Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
On Tue, Dec 15, 2020 at 08:13:21PM -0800, Alexander Duyck wrote:
> > > Ugh, don't get me started on switchdev. The biggest issue as I see it
> > > with switchdev is that you have to have a true switch in order to
> > > really be able to use it.
> >
> > That cuts both ways, suggesting HW with a true switch model itself
> > with VMDq is equally problematic.
>
> Yes and no. For example the macvlan offload I had setup could be
> configured both ways and it made use of VMDq. I'm not necessarily
> arguing that we need to do VMDq here, however at the same time saying
> that this is only meant to replace SR-IOV becomes problematic since we
> already have SR-IOV so why replace it with something that has many of
> the same limitations?
Why? Because SR-IOV is the *only* option for many use cases. Still. I
said this already, something more generic does not magically eliminate
SR-IOV.
The SIOV ADI model is a small refinement to the existing VF scheme; it
is completely parallel to making more generic things.
It is not "repeating mistakes"; it is accepting the limitations of
SR-IOV because benefits exist and applications need those benefits.
> That said I understand your argument, however I view the elimination
> of SR-IOV to be something we do after we get this interface right and
> can justify doing so.
Elimination of SR-IOV isn't even a goal here!
> Also it might be useful to call out the flavours and planned flavours
> in the cover page. Admittedly the description is somewhat lacking in
> that regard.
This is more of a general switchdev remark though. In the switchdev
model you have the switch and a switch port. Each port has a
switchdev representor on the switch side and a "user port" of some
kind.
It can be a physical thing:
- SFP
- QSFP
- WiFi Antennae
It could be a semi-physical thing outside the view of the kernel:
- SmartNIC VF/SF attached to another CPU
It can be a semi-physical thing in view of this kernel:
- SRIOV VF (struct pci device)
- SF (struct aux device)
It could be a SW construct in this kernel:
- netdev (struct net device)
*all* of these different port types are needed. Probably more down the
road!
Notice I don't have VDPA, VF/SF netdev, or virtio-mdev as a "user
port" type here. Instead creating the user port pci or aux device
allows the user to use the Linux driver model to control what happens
to the pci/aux device next.
> I would argue that is one of the reasons why this keeps being
> compared to either VMDq or VMQ as it is something that SR-IOV has
> yet to fully replace and has many features that would be useful in
> an interface that is a subpartition of an existing interface.
In what sense do switchdev and a VF not fully replace macvlan VMDq?
> The Intel drivers still have the macvlan as the assignable ADI and
> make use of VMDq to enable it.
Is this in-tree or only in the proprietary driver? AFAIK there is no
in-tree way to extract the DMA queue from the macvlan netdev into
userspace.
Remember all this VF/SF/VDPA stuff results in a HW dataplane, not a SW
one. It doesn't really make sense to compare a SW dataplane to a HW
one. HW dataplanes come with limitations and require special driver
code.
> The limitation as I see it is that the macvlan interface doesn't allow
> for much in the way of custom offloads and the Intel hardware doesn't
> support switchdev. As such it is good for a basic interface, but
> doesn't really do well in terms of supporting advanced vendor-specific
> features.
I don't know what it is that prevents Intel from modeling their
selector HW in switchdev, but I think it is on them to work with the
switchdev folks to figure something out.
I'm a bit surprised HW that can do macvlan can't be modeled with
switchdev? What is missing?
> > That is the goal here. This is not about creating just a netdev, this is
> > about the whole kit: rdma, netdev, vdpa virtio-net, virtio-mdev.
>
> One issue is right now we are only seeing the rdma and netdev. It is
> kind of backwards as it is using the ADIs on the host when this was
> really meant to be used for things like mdev.
This is the second 15-patch series on this path already. It is not
possible to pack every single thing into this series. This is the
micro step of introducing the SF idea and using SF==VF to show how the
driver stack works. The minimal changes to the existing drivers
imply this can support an ADI as well.
Further, this does already show an ADI! vdpa_mlx5 will run on the
VF/SF and eventually cause qemu to build a virtio-net ADI that
directly passes HW DMA rings into the guest.
Isn't this exactly the kind of generic SRIOV replacement option you
have been asking for? Doesn't this completely supersede stuff built on
macvlan?
> expected to work. The switchdev API puts some restrictions in place
> but there still ends up being parts without any definition.
I'm curious what you see as needing definition here?
In the SRIOV model, the HW register programming API is device
specific.
The switchdev model is: no matter what HW register programming is done
on the VF/SF all the packets tx/rx'd will flow through the switchdev.
The purpose of switchdev/SRIOV/SIOV has never been to define a single
"one register set to rule them all".
That is the area that VDPA virtio-net and others are covering.
Jason