[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <AM6PR05MB51424FC005A5454E3638E77CC5710@AM6PR05MB5142.eurprd05.prod.outlook.com>
Date: Thu, 14 Nov 2019 16:47:43 +0000
From: Yuval Avnery <yuvalav@...lanox.com>
To: Parav Pandit <parav@...lanox.com>,
Andy Gospodarek <andrew.gospodarek@...adcom.com>
CC: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jiri Pirko <jiri@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Saeed Mahameed <saeedm@...lanox.com>,
"leon@...nel.org" <leon@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"shuah@...nel.org" <shuah@...nel.org>,
Daniel Jurgens <danielj@...lanox.com>,
"michael.chan@...adcom.com" <michael.chan@...adcom.com>
Subject: RE: [PATCH net-next v2 00/10] devlink subdev
Andy,
Jakub,
Thank you for your comments
I will send another version with the following enhancements:
- Support multiple hw_addr.
- Describe more future use cases.
- Describe mdevs creation with devlink subdev. (this can be used for VFs, once PCIe spec supports)
> -----Original Message-----
> From: Parav Pandit
> Sent: Tuesday, November 12, 2019 10:37 PM
> To: Andy Gospodarek <andrew.gospodarek@...adcom.com>; Yuval Avnery
> <yuvalav@...lanox.com>
> Cc: Jakub Kicinski <jakub.kicinski@...ronome.com>; Jiri Pirko
> <jiri@...lanox.com>; netdev@...r.kernel.org; Saeed Mahameed
> <saeedm@...lanox.com>; leon@...nel.org; davem@...emloft.net;
> shuah@...nel.org; Daniel Jurgens <danielj@...lanox.com>;
> michael.chan@...adcom.com
> Subject: RE: [PATCH net-next v2 00/10] devlink subdev
>
> Hi Andy,
>
> > From: Andy Gospodarek <andrew.gospodarek@...adcom.com>
> > Sent: Tuesday, November 12, 2019 8:56 AM On Mon, Nov 11, 2019 at
> > 06:46:52PM +0000, Yuval Avnery wrote:
> > >
> > > >
> [..]
>
> >
> > It feels a bit 'unconstrained' to me as well. As Jakub said you added
> > some documentation, but the direction of this long-term is not clear.
>
> I am interested to know if the long-term direction of the devlink was clear
> enough in cover letter [1], that devlink device will change net namespace, it
> will have health reporters, dump regions, eswitch VF and PF ports,
> representor binding, "enable_roce" extensions.
> Because that vision is not clear in the cover letter [1] at least to me.
>
> Resource division was very clearly described in (4) in cover letter [1] which
> you were fan of from 2016 in [2].
> MSIX or in general interrupts resource splitting among VFs is no different
> than any other resource division.
>
> devlink eswitch port for anchoring interrupt attribute is certainly abuse of an
> 'esw port' object.
> Defining a subdev object, even though today it holds just 2 attributes
> (hw_address, and msix interrupt), it still sane object hierarchy.
> Setting host side mac address is not really job of eswitch ports.
> It may be on the same ASIC, but it should be put at right object.
>
> > What seems to happen too often is that we skip creating better infra
> > for existing devices and create it only for the newest shiniest object.
> This infrastructure is for existing devices (non smartnic devices too).
> May be worth clarifying explicitly in cover letter.
>
> > I'm not sure if that is part of what bothers Jakub, but it is one
> > thing that bothers me and why this feels incomplete.
> Let's work towards making it complete as much as for given functionality.
>
> >
> > The thing that has been bouncing around in my head about this (and
> > until I was in front of a good text-based MUA I didn't dare respond)
> > is that we seem to have an overlap in functionality between what you
> > are proposing and existing virtual device configuration, but you are
> > completely ignoring improving upon the existing creation methods.
> > Whether dealing with a SmartNIC (which I used to describe a NIC with
> > general purpose processors that could be running Linux and I think you
> > do, too) or a regular NIC it seems like we should use devlink to
> > create and configure a variety of devices these could be:
> >
> > 1. Number of PFs (there are cases where more than 1 PF per uplink port
> > is required for command/control of the SmartNIC or where a single
> > PCI b/d/f may have many uplink ports that need to be addressed
> > separately)
> How is this related to subdev object? Do you mean user should be able to
> create PF subdevice object for host?
> Currently decision to create subdev objects is done by the vendor driver.
> But it is not prevented in future to extend it, if it make sense.
> You need to describe the usecase and usual review process..
>
> > 2. Device-specific SR-IOV VFs visible to server 3. mdev devices that
> > _may_ have representers on the embedded cores of
> > the NIC
> There is no mdev NIC devices on embedded cores NIC driver exist in kernel
> today.
> I posted the series [3] to cover this part and we worked through kvm and
> netdev mailing list to get right, deterministic phys_port_name for it, right
> devlink port flavour etc necessary plumbing. This was done 2 months before
> posting [3].
> So mdev's subdev management is covered and cover letter also talks about it
> to manage it in uniform way as PF/VF/mdev.
>
> > 4. Hardware VirtIO-net devices supported
> How this is blocked by subdev object introduction?
> The way I envision from the recent discussion is, There will be mdev-virtio or
> virtio bus where hardware virto devices will be exposed.
> And therefore, their subdevices can be also controlled this way.
> But my understanding of unpublished mellanox virtio-net device and yours
> could be different. :-) So if you have link/ppt/rfc/patches I should read,
> please let me know, how you see hw virtio-net devices.
>
> > 5. Other non-network devices
> > (storage given as the first example) 6. ...
> >
> I think its really beyond the scope of life cycling everything using this subdev
> object.
> It is not a ultimate solution to all use cases. :-)
>
> > We cannot get rid of the methods for creating hardware VFs via sysfs,
> > but now that we are seeing lots of different flavors of devices that
> > might be created we should start by making sure at a minimum we can
> > create existing hardware devices (VFs) with this devlink interface and move
> from there.
> Well, I have been thinking about it for last 6-7 months as there is locking
> issues present between num_vfs sysfs (device lock!), ip link (no lock) and
> devlink (devlink lock).
> All unsyncronized with each other which leads to call traces; using some lock
> in multiple drivers will be a red bandage.
> It is not the scope of this series.
>
> > Is there a plan
> > to use this interface to create subdevs that are VFs or just new subdev
> flavors?
> Creating individual VFs is supported by PCI spec when I checked last time.
> Do you know if that is supported now?
> Last year autoprobe file was added to pci core to dynamically bind the VFs
> and bug was fixed recently by Alex Williamson.
>
> Let's say if that is added in future, even than subdev object creation on event
> of dynamic VF creation holds good. Isn't it?
> Whenever mdev series updated version gets accepted, it will dynamically
> create the subdev object anyway of mdev or some other flavour.
>
> > I could start to get behind an interface like this if the patches
> > showed that devlink would be the one place where device creation and
> > control could be done for all types of subdevs, but that isn't clear
> > that it is your direction based on the patches.
> >
> > So just to make sure this is clear, what I'm proposing that devlink is
> > used to create and configure all types of devices that it may be
> > possible to create, configure, or address from a PF and the starting
> > place should be standard, hardware VFs.
> I do not agree with 'creating all devices' part that it should be the starting
> point given that other method already exists for VFs, mdev (though its buggy
> for VFs).
> Extending devlink to create mdev is nice, but not a must. Same goes for other
> flavours such as virtio/sf.
> I also proposed that to create them via devlink in [4], but its not must.
> This series for mdev [3] already fits in the existing devlink, eswitch, devlink
> port, devlink subdev object model.
>
> > If that can be done right and we can address some of the issues with
> > the current implementation (more than one hw addr is a _great_
> > example) then adding new flavours for the server devices and and
> > adding new flavors for SmartNIC-based devices should be easy and feel
> natural.
> I also agree that setting more than one hw address should be supported,
> Jakub highlighted too of this need for Intel.
> Command should be similar to 'ip addr add', and not 'devlink set hw_addr'.
> Mlx5 doesn't currently supported so setting multiple hw_address should
> return right error code ENOSPEC or ENOSUPP by mlx5 driver.
>
> Yuval, Can you please address this comment from Jakub and Andy to support
> multiple hw_address?
>
> I just want to say that subdev is not so grand object.
> As use case, features, ASIC evolves right attributes should be placed in right
> object.
> (just like how devlink evolved from [1] to now.) Aren't we reviewing the
> patches? Why should it become a unconstrained object to put everything in
> it?
> This series doesn't ask to put everything in it.
>
> As Yuval pointed, there are two/three near term plan after this series.
> 1. Grouping VFs subdevs so that host side min/max rates can be applied on
> the group (group of VFs) 2. Extending subdev flavour for mdev/virtio/sf
> depending on how series [3] take shape 3. msix configuration for VFs
>
> [1] https://lwn.net/Articles/677967/
> [2] https://www.spinics.net/lists/netdev/msg365532.html
> [3] https://lore.kernel.org/linux-rdma/20191107160448.20962-1-
> parav@...lanox.com/
> [4] https://lore.kernel.org/linux-
> rdma/AM0PR05MB4866444210721BC4EE775D27D17B0@...PR05MB4866.e
> urprd05.prod.outlook.com/
>
Powered by blists - more mailing lists