netdev - RE: [PATCH net-next v2 00/10] devlink subdev

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM0PR05MB4866553530461C933DA3181ED1760@AM0PR05MB4866.eurprd05.prod.outlook.com>
Date:   Wed, 13 Nov 2019 06:37:27 +0000
From:   Parav Pandit <parav@...lanox.com>
To:     Andy Gospodarek <andrew.gospodarek@...adcom.com>,
        Yuval Avnery <yuvalav@...lanox.com>
CC:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jiri Pirko <jiri@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "leon@...nel.org" <leon@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "shuah@...nel.org" <shuah@...nel.org>,
        Daniel Jurgens <danielj@...lanox.com>,
        "michael.chan@...adcom.com" <michael.chan@...adcom.com>
Subject: RE: [PATCH net-next v2 00/10] devlink subdev

Hi Andy,

> From: Andy Gospodarek <andrew.gospodarek@...adcom.com>
> Sent: Tuesday, November 12, 2019 8:56 AM
> On Mon, Nov 11, 2019 at 06:46:52PM +0000, Yuval Avnery wrote:
> >
> > >
[..]

> 
> It feels a bit 'unconstrained' to me as well.  As Jakub said you added some
> documentation, but the direction of this long-term is not clear.

I am interested to know if the long-term direction of the devlink was clear enough in cover letter [1], that devlink device will change net namespace, it will have health reporters, dump regions, eswitch VF and PF ports, representor binding, "enable_roce" extensions.
Because that vision is not clear in the cover letter [1] at least to me.

Resource division was very clearly described in (4) in cover letter [1] which you were fan of from 2016 in [2].
MSIX or in general interrupts resource splitting among VFs is no different than any other resource division.

devlink eswitch port for anchoring interrupt attribute is certainly abuse of an 'esw port' object.
Defining a subdev object, even though today it holds just 2 attributes (hw_address, and msix interrupt), it still sane object hierarchy.
Setting host side mac address is not really job of eswitch ports.
It may be on the same ASIC, but it should be put at right object.

> What seems to happen too often is that we skip creating better infra for
> existing devices and create it only for the newest shiniest object.
This infrastructure is for existing devices (non smartnic devices too).
May be worth clarifying explicitly in cover letter.

> I'm not sure if that is part of what bothers Jakub,
> but it is one thing that bothers me and why this feels incomplete.
Let's work towards making it complete as much as for given functionality.

> 
> The thing that has been bouncing around in my head about this (and until I was
> in front of a good text-based MUA I didn't dare respond) is that we seem to
> have an overlap in functionality between what you are proposing and existing
> virtual device configuration, but you are completely ignoring improving upon
> the existing creation methods.
> Whether dealing with a SmartNIC (which I used to describe a NIC with general
> purpose processors that could be running Linux and I think you do, too) or a
> regular NIC it seems like we should use devlink to create and configure a
> variety of devices these could be:
> 
> 1.  Number of PFs (there are cases where more than 1 PF per uplink port
>     is required for command/control of the SmartNIC or where a single
>     PCI b/d/f may have many uplink ports that need to be addressed
>     separately)
How is this related to subdev object? Do you mean user should be able to create PF subdevice object for host?
Currently decision to create subdev objects is done by the vendor driver.
But it is not prevented in future to extend it, if it make sense.
You need to describe the usecase and usual review process..

> 2.  Device-specific SR-IOV VFs visible to server 
> 3.  mdev devices that _may_
> have representers on the embedded cores of
>     the NIC
There is no mdev NIC devices on embedded cores NIC driver exist in kernel today.
I posted the series [3] to cover this part and we worked through kvm and netdev mailing list to get right, deterministic phys_port_name for it, right devlink port flavour etc necessary plumbing. This was done 2 months before posting [3].
So mdev's subdev management is covered and cover letter also talks about it to manage it in uniform way as PF/VF/mdev.

> 4.  Hardware VirtIO-net devices supported 
How this is blocked by subdev object introduction?
The way I envision from the recent discussion is,
There will be mdev-virtio or virtio bus where hardware virto devices will be exposed.
And therefore, their subdevices can be also controlled this way.
But my understanding of unpublished mellanox virtio-net device and yours could be different. :-)
So if you have link/ppt/rfc/patches I should read, please let me know, how you see hw virtio-net devices.

> 5.  Other non-network devices
> (storage given as the first example) 6.  ...
> 
I think its really beyond the scope of life cycling everything using this subdev object.
It is not a ultimate solution to all use cases. :-)

> We cannot get rid of the methods for creating hardware VFs via sysfs, but now
> that we are seeing lots of different flavors of devices that might be created we
> should start by making sure at a minimum we can create existing hardware
> devices (VFs) with this devlink interface and move from there.
Well, I have been thinking about it for last 6-7 months as there is locking issues present between num_vfs sysfs (device lock!), ip link (no lock) and devlink (devlink lock).
All unsyncronized with each other which leads to call traces; using some lock in multiple drivers will be a red bandage.
It is not the scope of this series.

> Is there a plan
> to use this interface to create subdevs that are VFs or just new subdev flavors?
Creating individual VFs is supported by PCI spec when I checked last time.
Do you know if that is supported now?
Last year autoprobe file was added to pci core to dynamically bind the VFs and bug was fixed recently by Alex Williamson.

Let's say if that is added in future, even than subdev object creation on event of dynamic VF creation holds good. Isn't it?
Whenever mdev series updated version gets accepted, it will dynamically create the subdev object anyway of mdev or some other flavour.

> I could start to get behind an interface like this if the patches showed that
> devlink would be the one place where device creation and control could be
> done for all types of subdevs, but that isn't clear that it is your direction based
> on the patches.
> 
> So just to make sure this is clear, what I'm proposing that devlink is used to
> create and configure all types of devices that it may be possible to create,
> configure, or address from a PF and the starting place should be standard,
> hardware VFs.  
I do not agree with 'creating all devices' part that it should be the starting point given that other method already exists for VFs, mdev (though its buggy for VFs).
Extending devlink to create mdev is nice, but not a must. Same goes for other flavours such as virtio/sf.
I also proposed that to create them via devlink in [4], but its not must.
This series for mdev [3] already fits in the existing devlink, eswitch, devlink port, devlink subdev object model.

> If that can be done right and we can address some of the issues
> with the current implementation (more than one hw addr is a _great_
> example) then adding new flavours for the server devices and and adding new
> flavors for SmartNIC-based devices should be easy and feel natural.
I also agree that setting more than one hw address should be supported, Jakub highlighted too of this need for Intel.
Command should be similar to 'ip addr add', and not 'devlink set hw_addr'.
Mlx5 doesn't currently supported so setting multiple hw_address should return right error code ENOSPEC or ENOSUPP by mlx5 driver.

Yuval, Can you please address this comment from Jakub and Andy to support multiple hw_address?

I just want to say that subdev is not so grand object.
As use case, features, ASIC evolves right attributes should be placed in right object.
(just like how devlink evolved from [1] to now.)
Aren't we reviewing the patches? Why should it become a unconstrained object to put everything in it?
This series doesn't ask to put everything in it.

As Yuval pointed, there are two/three near term plan after this series.
1. Grouping VFs subdevs so that host side min/max rates can be applied on the group (group of VFs)
2. Extending subdev flavour for mdev/virtio/sf depending on how series [3] take shape
3. msix configuration for VFs

[1] https://lwn.net/Articles/677967/
[2] https://www.spinics.net/lists/netdev/msg365532.html
[3] https://lore.kernel.org/linux-rdma/20191107160448.20962-1-parav@mellanox.com/
[4] https://lore.kernel.org/linux-rdma/AM0PR05MB4866444210721BC4EE775D27D17B0@AM0PR05MB4866.eurprd05.prod.outlook.com/