Message-ID: <20201216030351.GH552508@nvidia.com>
Date: Tue, 15 Dec 2020 23:03:51 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: Saeed Mahameed <saeed@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Leon Romanovsky <leonro@...dia.com>,
Netdev <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
David Ahern <dsahern@...nel.org>,
Jacob Keller <jacob.e.keller@...el.com>,
Sridhar Samudrala <sridhar.samudrala@...el.com>,
"Ertman, David M" <david.m.ertman@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Kiran Patil <kiran.patil@...el.com>,
Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
On Tue, Dec 15, 2020 at 06:19:18PM -0800, Alexander Duyck wrote:
> > > What I would really like to see is a solid standardization of what this is.
> > > Otherwise the comparison is going to be made. Especially since a year
> > > ago Mellanox was pushing this as an mdev type interface.
> >
> > mdev was NAK'd too.
> >
> > mdev is only for creating /dev/vfio/*.
>
> Agreed. However my worry is that as we start looking to make this
> support virtualization it will still end up swinging more toward
> mdev.
Of course. mdev is also the only way to create a /dev/vfio/* :)
So all paths that want to use vfio must end up creating an mdev.
Here we would choose to create the mdev on top of the SF aux device.
There isn't really anything mlx5-specific about that decision.
The SF models the vendor-specific ADI in the driver model.
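As a purely illustrative sketch (not what the mlx5 series does; the
sf_mdev_* names and the "mlx5_core.sf" match string are made up, and it
assumes the 5.10-era auxiliary bus and mdev parent APIs, which have
since changed), a driver could bind to the SF aux device and register
an mdev parent on it, which is what eventually produces a /dev/vfio/*
group:

#include <linux/auxiliary_bus.h>
#include <linux/mdev.h>
#include <linux/module.h>
#include <linux/vfio.h>

/* One advertised mdev type; mdev requires device_api and
 * available_instances attributes per type. */
static ssize_t device_api_show(struct kobject *kobj, struct device *dev,
			       char *buf)
{
	return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING);
}
MDEV_TYPE_ATTR_RO(device_api);

static ssize_t available_instances_show(struct kobject *kobj,
					struct device *dev, char *buf)
{
	return sprintf(buf, "1\n");	/* one ADI per SF in this sketch */
}
MDEV_TYPE_ATTR_RO(available_instances);

static struct attribute *sf_mdev_type_attrs[] = {
	&mdev_type_attr_device_api.attr,
	&mdev_type_attr_available_instances.attr,
	NULL,
};
static struct attribute_group sf_mdev_type_group = {
	.name = "sf",
	.attrs = sf_mdev_type_attrs,
};
static struct attribute_group *sf_mdev_type_groups[] = {
	&sf_mdev_type_group,
	NULL,
};

static int sf_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
{
	/* Wire the SF's queues/doorbells/MSI-X to this mdev instance. */
	return 0;
}

static int sf_mdev_destroy(struct mdev_device *mdev)
{
	return 0;
}

static const struct mdev_parent_ops sf_mdev_parent_ops = {
	.owner			= THIS_MODULE,
	.supported_type_groups	= sf_mdev_type_groups,
	.create			= sf_mdev_create,
	.remove			= sf_mdev_destroy,
	/* .open/.release/.read/.write/.mmap/.ioctl implement the vfio
	 * side of the ADI in a real driver. */
};

static int sf_mdev_probe(struct auxiliary_device *adev,
			 const struct auxiliary_device_id *id)
{
	/* The SF aux device is the vendor ADI; exposing it as an mdev
	 * parent is what gives userspace /dev/vfio/<group>. */
	return mdev_register_device(&adev->dev, &sf_mdev_parent_ops);
}

static void sf_mdev_remove(struct auxiliary_device *adev)
{
	mdev_unregister_device(&adev->dev);
}

static const struct auxiliary_device_id sf_mdev_id_table[] = {
	{ .name = "mlx5_core.sf" },	/* illustrative aux device match */
	{}
};
MODULE_DEVICE_TABLE(auxiliary, sf_mdev_id_table);

static struct auxiliary_driver sf_mdev_driver = {
	.name		= "sf_mdev",
	.probe		= sf_mdev_probe,
	.remove		= sf_mdev_remove,
	.id_table	= sf_mdev_id_table,
};
module_auxiliary_driver(sf_mdev_driver);
MODULE_LICENSE("GPL");

All of the interesting work sits in the parent/vfio ops, which is
exactly the vendor-specific ADI glue.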
> It isn't so much about right or wrong but the use cases. My experience
> has been that SR-IOV ends up being used for very niche use cases where
> you are direct assigning it into either DPDK or some NFV VM and you
> are essentially building the application around the NIC. It is all
> well and good, but for general virtualization it never really caught
> on.
Sure
> > So encourage other vendors to support the switchdev model for managing
> > VFs and ADIs!
>
> Ugh, don't get me started on switchdev. The biggest issue as I see it
> with switchev is that you have to have a true switch in order to
> really be able to use it.
That cuts both ways; suggesting that HW with a true switch should
model itself with VMDq is equally problematic.
> As such dumbed down hardware like the ixgbe for instance cannot use
> it since it defaults to outputting anything that doesn't have an
> existing rule to the external port. If we could tweak the design to
> allow for more dumbed down hardware it would probably be much easier
> to get wider adoption.
I'd agree with this
> interface, but keep the SF interface simple. Then you can back it with
> whatever you want, but without having to have a vendor specific
> version of the interface being plugged into the guest or container.
The entire point *is* to create the vendor version because that serves
the niche cases where SRIOV assignment is already being used.
Having a general solution that can't do vendor SRIOV is useful for
other applications, but doesn't eliminate the need for the SRIOV case.
> One of the reasons why virtio-net is being pushed as a common
> interface for vendors is for this reason. It is an interface that can
> be emulated by software or hardware and it allows the guest to run on
> any arbitrary hardware.
Yes, and there is mlx5_vdpa to support this use case, and it binds to
the SF. Of course all of that is vendor specific too; the driver that
converts HW-specific register programming into a virtio-net ADI has to
live *somewhere*.
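Roughly, and with the same caveats as before (hypothetical sf_vdpa_*
names, the 5.10-era vdpa core API which has kept changing, and only
two of the required ops shown), that driver is just another consumer
of the SF aux device: it registers a vdpa_device whose config ops turn
virtio operations into the vendor's register programming:

#include <linux/auxiliary_bus.h>
#include <linux/io.h>
#include <linux/module.h>
#include <linux/vdpa.h>
#include <linux/virtio_ids.h>

struct sf_vdpa {
	struct vdpa_device vdev;	/* must stay the first member */
	void __iomem *doorbell;		/* hypothetical SF doorbell page */
};

static struct sf_vdpa *to_sf_vdpa(struct vdpa_device *vdev)
{
	return container_of(vdev, struct sf_vdpa, vdev);
}

static u32 sf_vdpa_get_device_id(struct vdpa_device *vdev)
{
	return VIRTIO_ID_NET;		/* present the SF as virtio-net */
}

static void sf_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
{
	/* The vendor-specific bit: a virtio kick becomes a write to the
	 * SF's own doorbell register. */
	writel(idx, to_sf_vdpa(vdev)->doorbell);
}

static const struct vdpa_config_ops sf_vdpa_ops = {
	.get_device_id	= sf_vdpa_get_device_id,
	.kick_vq	= sf_vdpa_kick_vq,
	/* A real driver fills in the whole set: vq address/state,
	 * features, status, config space, dma mapping, ... */
};

static int sf_vdpa_probe(struct auxiliary_device *adev,
			 const struct auxiliary_device_id *id)
{
	struct sf_vdpa *priv;

	priv = vdpa_alloc_device(struct sf_vdpa, vdev, &adev->dev,
				 &sf_vdpa_ops, 2 /* one rx + one tx vq */);
	if (IS_ERR(priv))
		return PTR_ERR(priv);

	/* A real probe would ioremap the SF's doorbell/BAR pages here
	 * before registering. */
	dev_set_drvdata(&adev->dev, priv);
	return vdpa_register_device(&priv->vdev);
}

static void sf_vdpa_remove(struct auxiliary_device *adev)
{
	struct sf_vdpa *priv = dev_get_drvdata(&adev->dev);

	vdpa_unregister_device(&priv->vdev);
}

static const struct auxiliary_device_id sf_vdpa_id_table[] = {
	{ .name = "mlx5_core.sf" },	/* illustrative match */
	{}
};
MODULE_DEVICE_TABLE(auxiliary, sf_vdpa_id_table);

static struct auxiliary_driver sf_vdpa_driver = {
	.name		= "sf_vdpa",
	.probe		= sf_vdpa_probe,
	.remove		= sf_vdpa_remove,
	.id_table	= sf_vdpa_id_table,
};
module_auxiliary_driver(sf_vdpa_driver);
MODULE_LICENSE("GPL");

However standard the uAPI on top ends up being, that translation layer
is the vendor code that has to exist somewhere.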
> It has plenty to do with this series. This topic has been under
> discussion since something like 2017 when Mellanox first brought it up
> at Netdev 2.1. At the time I told them they should implement this as a
> veth offload.
veth doesn't give an ADI, so it is useless for these niche cases.
veth offload might be interesting for some container cases, but feels
like writing an enormous amount of code to accomplish nothing new...
> Then it becomes obvious what the fallback becomes as you can place
> packets into one end of a veth and they come out the other, just like
> a switchdev representor and the SF in this case. It would make much
> more sense to do it this way rather than setting up yet another
> vendor proprietary interface pair.
I agree it makes sense to have an all-SW veth-like option, but I
wouldn't try to make that the entry point for all the HW
acceleration, or to serve the niche SRIOV use cases, or to represent an
ADI.
It just can't do that, and it would make a huge mess if you tried to
force it. Didn't Intel already try this once by using the
macvlan netdev and its queue offload to build an ADI?
> > Anyhow, if such a thing exists someday it could make sense to
> > automatically substitute the HW version using a SF, if available.
>
> The main problem as I see it is the fact that the SF interface is
> bound too tightly to the hardware.
That is the goal here. This is not about creating just a netdev, this
is about the whole kit: rdma, netdev, vdpa virtio-net, virtio-mdev.
The SF has to support all of that completely. Focusing only on the
one use case of netdevs in containers misses the bigger picture.
Yes, lots of this stuff is niche, but niche stuff needs to be
supported too.
> Yes, it is a standard feature set for the control plane. However for
> the data-path it is somewhat limited as I feel it only describes what
> goes through the switch.
Sure, I think that is its main point.
> Not the interfaces that are exposed as the endpoints.
It came from modeling physical HW, so the endpoints are 'physical'
things like actual HW switch ports, SRIOV VFs, ADIs, etc.
> It is the problem of that last bit and how it is handled that can
> make things ugly. For example the multicast/broadcast replication
> problem that just occurred to me while writing this up. The fact is
> for east-west traffic there has always been a problem with the
> switchdev model as it limits everything to PCIe/DMA so there are
> cases where software switches can outperform the hardware ones.
Yes, but mixing CPU and DMA in the same packet delivery scheme is
very complicated :)
Jason