Message-ID: <20201218000802.GV552508@nvidia.com>
Date: Thu, 17 Dec 2020 20:08:02 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Alexander Duyck <alexander.duyck@...il.com>
CC: Saeed Mahameed <saeed@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Leon Romanovsky <leonro@...dia.com>,
Netdev <netdev@...r.kernel.org>, <linux-rdma@...r.kernel.org>,
David Ahern <dsahern@...nel.org>,
Jacob Keller <jacob.e.keller@...el.com>,
Sridhar Samudrala <sridhar.samudrala@...el.com>,
"Ertman, David M" <david.m.ertman@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Kiran Patil <kiran.patil@...el.com>,
Greg KH <gregkh@...uxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
On Thu, Dec 17, 2020 at 01:05:03PM -0800, Alexander Duyck wrote:
> > I view the SW bypass path you are talking about similarly to
> > GSO/etc. It should be accessed by the HW driver as an optional service
> > provided by the core netdev, not implemented as some wrapper netdev
> > around a HW implementation.
>
> I view it as being something that would be a part of the switchdev API
> itself. Basically the switchdev and endpoint would need to be able to
> control something like this because if XDP were enabled on one end or
> the other you would need to be able to switch it off so that all of
> the packets followed the same flow and could be scanned by the XDP
> program.
To me that still all comes down to being something like an optional
offload that the HW driver can trigger if the conditions are met.
> > It is simple enough, the HW driver's tx path would somehow detect
> > east/west and queue it differently, and the rx path would somehow be
> > able to mux in skbs from a SW queue. Not seeing any blockers here.
>
> In my mind the simple proof of concept for this would be to check for
> the multicast bit being set in the destination MAC address for packets
> coming from the subfunction. If it is then shunt to this bypass route,
> and if not then you transmit to the hardware queues.
Sure. I'm not convinced a multicast optimization like this isn't
incredibly niche too, but it would be an interesting path to explore.
But again, there is nothing fundamental about the model here that
precludes this optional optimization.
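For illustration only, the proof of concept described in the quoted
text could look roughly like the sketch below. It is plain userspace C,
not mlx5 or netdev code: `select_tx_path`, `dst_is_multicast` and the
`tx_path` enum are hypothetical names invented for this example. The
only fact it relies on is that the multicast (I/G) bit is the least
significant bit of the first octet of the destination MAC address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAC_ADDR_LEN 6

/* Hypothetical result of the TX classification step. */
enum tx_path {
	TX_PATH_HW_QUEUE,	/* normal path: hand the frame to the hardware queues */
	TX_PATH_SW_BYPASS,	/* optional path: shunt the frame to a software queue */
};

/* The destination MAC is the first 6 bytes of an Ethernet frame.
 * Bit 0 of its first octet is set for multicast and broadcast. */
static bool dst_is_multicast(const uint8_t *frame)
{
	return (frame[0] & 0x01) != 0;
}

/* Assumed policy from the quoted proof of concept: multicast goes to
 * the software bypass, everything else goes to the hardware queues. */
static enum tx_path select_tx_path(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);

	/* Too short to hold a destination MAC: leave it to the HW path. */
	if (len < MAC_ADDR_LEN)
		return TX_PATH_HW_QUEUE;

	return dst_is_multicast(frame) ? TX_PATH_SW_BYPASS : TX_PATH_HW_QUEUE;
}
```

A real driver would make this check in its transmit routine and would
also need the condition discussed earlier in the thread (for example,
switching the bypass off when XDP is attached), which this sketch does
not model.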
> > Even if that is true, I don't believe for a second that adding a
> > different HW abstraction layer is going to somehow undo the mistakes
> > of the last 20 years.
>
> It depends on how it is done. The general idea is to address the
> biggest limitation that has occurred, which is the fact that in many
> cases we don't have software offloads to take care of things when the
> hardware offloads provided by a certain piece of hardware are not
> present.
This is really disappointing to hear. Admittedly I don't follow all
the twists and turns on the mailing list, but I thought having a SW
version of everything was one of the fundamental tenets of netdev
that truly distinguished it from something like RDMA.
> It would basically allow us to reset the feature set. If something
> cannot be offloaded in software in a reasonable way, it is not
> allowed to be present in the interface provided to a container.
> That way instead of having to do all the custom configuration in the
> container recipe it can be centralized to one container handling all
> of the switching and hardware configuration.
Well, you could start by blocking stuff without a SW fallback..
> There I disagree. Now I can agree that most of the series is about
> presenting the aux device and that part I am fine with. However when
> the aux device is a netdev and that netdev is being loaded into the
> same kernel as the switchdev port is where the red flags start flying,
> especially when we start talking about how it is the same as a VF.
Well, it happens for the same reason a VF can create a netdev;
stopping it would actually be more patches. As I said before, people
are already doing this model with VFs.
I can agree with some of your points, but this is not the series to
argue them. What you want is to start some new thread on optimizing
switchdev for the container use case.
> In my mind we are talking about how the switchdev will behave and it
> makes sense to see about defining if an east-west bypass makes sense
> and how it could be implemented, rather than saying we won't bother
> for now and potentially locking in the subfunction to virtual function
> equality.
At least for mlx5 SF == VF, that is a consequence of the HW. Any SW
bypass would need to be specially built in the mlx5 netdev running on
a VF/SF attached to a switchdev port.
I don't see anything about this part of the model that precludes ever
doing that, and I also don't see this optimization as being valuable
enough to block things "just to be sure".
> In my mind we need more than just the increased count to justify
> going to subfunctions, and I think being able to solve the east-west
> problem at least in terms of containers would be such a thing.
Increased count is pretty important for users with SRIOV.
Jason