Message-ID: <9f3974a1-95e9-a482-3dcd-0b23246d9ab7@mellanox.com>
Date: Thu, 24 Oct 2019 00:11:48 +0000
From: Yuval Avnery <yuvalav@...lanox.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jiri Pirko <jiri@...nulli.us>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Jiri Pirko <jiri@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>,
"leon@...nel.org" <leon@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"shuah@...nel.org" <shuah@...nel.org>,
Daniel Jurgens <danielj@...lanox.com>
Subject: Re: [PATCH net-next 0/9] devlink vdev
On 2019-10-23 3:14 p.m., Jakub Kicinski wrote:
> On Wed, 23 Oct 2019 21:25:12 +0200, Jiri Pirko wrote:
>> Wed, Oct 23, 2019 at 09:00:46PM CEST, jakub.kicinski@...ronome.com wrote:
>>> On Tue, 22 Oct 2019 20:43:01 +0300, Yuval Avnery wrote:
>>>> This patchset introduces devlink vdev.
>>>>
>>>> Currently, legacy tools do not provide a comprehensive solution that can
>>>> be used in both SmartNic and non-SmartNic modes.
>>>> A vdev represents a device that exists on the ASIC but is not necessarily
>>>> visible to the kernel.
>>>>
>>>> Using devlink ports is not suitable because:
>>>>
>>>> 1. Those devices aren't necessarily network devices (such as NVMe devices)
>>>> and don't have an E-switch representation. Therefore, there is a need for
>>>> a more generic representation of a PCI VF.
>>>> 2. Some attributes are not necessarily pure port attributes
>>>> (e.g. the number of MSI-X vectors).
>>>> 3. It creates a confusing devlink topology, with multiple port flavours
>>>> and indices.
>>>>
>>>> A vdev will be created along with its flavour and attributes.
>>>> Some network vdevs may be linked to a devlink port.
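The relationship described above (every vdev has a flavour and attributes, but only some network vdevs are linked to a devlink port) can be sketched as a small data model. This is a hypothetical Python illustration, not the kernel structures from the patchset: the class names and the `has_port()` helper are assumptions, and only the attribute names (flavour, pf, vf, port_index, hw_addr) come from the example output in this cover letter. The NVMe VF and its flavour are likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DevlinkPort:
    """Hypothetical stand-in for a devlink port (the E-switch side)."""
    index: int


@dataclass
class Vdev:
    """Hypothetical vdev: a function that exists on the ASIC, created with
    a flavour and attributes, and only optionally linked to a devlink port."""
    index: int
    flavour: str                         # e.g. "pcivf", as in the example output
    pf: Optional[int] = None
    vf: Optional[int] = None
    hw_addr: Optional[str] = None        # stored on the vdev, not on the port
    port: Optional[DevlinkPort] = None   # None when there is no E-switch side

    def has_port(self) -> bool:
        # Only network vdevs are expected to be linked to a devlink port.
        return self.port is not None


# The VF from the cover letter's example, linked to port index 1.
net_vf = Vdev(index=1, flavour="pcivf", pf=0, vf=0,
              hw_addr="10:22:33:44:55:66", port=DevlinkPort(index=1))

# A hypothetical NVMe VF: same kind of object, but no devlink port at all.
nvme_vf = Vdev(index=2, flavour="pcivf", pf=0, vf=1)

print(net_vf.has_port(), nvme_vf.has_port())  # True False
```

Keeping the port link optional is what lets one object type cover both the networking and non-networking cases listed in points 1 and 2.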
>>>>
>>>> This is also aimed at replacing the "ip link vf" commands, as they are
>>>> strongly tied to the PCI topology and allow access only to enabled VFs.
>>>> Even though the current patchset and example are limited to the VF's MAC
>>>> address, this interface will allow managing PFs, VFs and mdevs in
>>>> SmartNic and non-SmartNic modes, in a unified way for networking and
>>>> non-networking devices, via the devlink instance.
>>>>
>>>> Example:
>>>>
>>>> A privileged user wants to configure a VF's hw_addr before the VF is
>>>> enabled.
>>>>
>>>> $ devlink vdev set pci/0000:03:00.0/1 hw_addr 10:22:33:44:55:66
>>>>
>>>> $ devlink vdev show pci/0000:03:00.0/1
>>>> pci/0000:03:00.0/1: flavour pcivf pf 0 vf 0 port_index 1 hw_addr 10:22:33:44:55:66
>>>>
>>>> $ devlink vdev show pci/0000:03:00.0/1 -jp
>>>> {
>>>>     "vdev": {
>>>>         "pci/0000:03:00.0/1": {
>>>>             "flavour": "pcivf",
>>>>             "pf": 0,
>>>>             "vf": 0,
>>>>             "port_index": 1,
>>>>             "hw_addr": "10:22:33:44:55:66"
>>>>         }
>>>>     }
>>>> }
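The `-jp` (JSON, pretty-printed) form is the one scripts would consume. The following is a minimal Python sketch of reading it, using the output above as sample input. The layout (a top-level "vdev" object keyed by a bus/device/index handle) is inferred from this single example, and `parse_vdevs()` is a hypothetical helper. Output from a later or merged version of the interface may differ.

```python
import json

# The `devlink vdev show pci/0000:03:00.0/1 -jp` output from the example.
SAMPLE = """
{
    "vdev": {
        "pci/0000:03:00.0/1": {
            "flavour": "pcivf",
            "pf": 0,
            "vf": 0,
            "port_index": 1,
            "hw_addr": "10:22:33:44:55:66"
        }
    }
}
"""


def parse_vdevs(doc: str) -> dict:
    """Map each vdev handle to its bus name, device name, index and attributes.

    Assumes handles look like "bus/device/index", as in the example."""
    result = {}
    for handle, attrs in json.loads(doc)["vdev"].items():
        bus, dev, index = handle.split("/")
        result[handle] = {"bus": bus, "dev": dev, "index": int(index), **attrs}
    return result


for handle, vdev in parse_vdevs(SAMPLE).items():
    print(handle, vdev["flavour"], vdev["hw_addr"])
# pci/0000:03:00.0/1 pcivf 10:22:33:44:55:66
```

Splitting on "/" works here because the PCI device name (0000:03:00.0) contains colons and dots but no slashes.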
>>> I'm not convinced this is a good design.
>>>
>>> We need a proper ontology and decisions on what goes where. We have
>>> half of the port attributes duplicated here, and hw_addr, which honestly
>>> makes more sense in a port (a port is more of a networking construct;
>>> why would ep storage have a hw_addr?). Then you say you're going to
>>> dump more PCI stuff in here :(
>> Well, basically this "vdev" is the "port peer" we discussed a
>> couple of months ago. It gives the user on bare metal the possibility
>> to configure things for the VF, for example.
>>
>> Regarding hw_addr vs. port: it is not correct to make that a devlink
>> port attribute. It is not the port's hw_addr, but the port peer's hw_addr.
> Yeah, I remember us arguing with others that "the other side of the
> wire" should not be a port.
>
>>> "vdev" sounds entirely meaningless, and has a high chance of becoming
>>> a dumping ground for attributes.
>> Sure, it is a made-up name. If you have a better name, please share.
> IDK. I think I started the "peer" stuff, so it made sense to me.
> Now it sounds like you'd like to kill a lot of problems with this
> one stone. For PCIe, "vdev" is definitely wrong because some of the
> config will be for the PF (which is not virtual). Also, for PCIe the
> config has to be done with permanence in mind from day 1, since PCI
> often requires a HW reset to reconfigure.
>
The PF is "virtual" from the SmartNic embedded CPU's point of view.
Maybe "gdev" (generic) is better?
>> Basically it is something that represents a VF/mdev, the other side of
>> the devlink port. But in some cases, like NVMe, there is no associated
>> devlink port; that is why "devlink port peer" would not work here.
> What are the NVMe parameters we'd configure here? Queues etc., or some
> IDs? Presumably there will be an NVMe-specific way to configure things?
> Something has to point the NVMe VF to a backend, right?
>
> (I haven't looked much into NVMe myself in case that's not obvious ;))