[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <AM6PR05MB5142180506EA11097FE7A4F0C5770@AM6PR05MB5142.eurprd05.prod.outlook.com>
Date: Tue, 12 Nov 2019 23:32:33 +0000
From: Yuval Avnery <yuvalav@...lanox.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
CC: Andy Gospodarek <andrew.gospodarek@...adcom.com>,
Jiri Pirko <jiri@...lanox.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Saeed Mahameed <saeedm@...lanox.com>,
"leon@...nel.org" <leon@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"shuah@...nel.org" <shuah@...nel.org>,
Daniel Jurgens <danielj@...lanox.com>,
Parav Pandit <parav@...lanox.com>,
"michael.chan@...adcom.com" <michael.chan@...adcom.com>
Subject: RE: [PATCH net-next v2 00/10] devlink subdev
> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
> Sent: Tuesday, November 12, 2019 2:20 PM
> To: Yuval Avnery <yuvalav@...lanox.com>
> Cc: Andy Gospodarek <andrew.gospodarek@...adcom.com>; Jiri Pirko
> <jiri@...lanox.com>; netdev@...r.kernel.org; Saeed Mahameed
> <saeedm@...lanox.com>; leon@...nel.org; davem@...emloft.net;
> shuah@...nel.org; Daniel Jurgens <danielj@...lanox.com>; Parav Pandit
> <parav@...lanox.com>; michael.chan@...adcom.com
> Subject: Re: [PATCH net-next v2 00/10] devlink subdev
>
> On Tue, 12 Nov 2019 18:35:26 +0000, Yuval Avnery wrote:
> > > It feels a bit 'unconstrained' to me as well. As Jakub said you
> > > added some documentation, but the direction of this long-term is not
> clear.
> > > What seems to happen too often is that we skip creating better infra
> > > for existing devices and create it only for the newest shiniest object.
> > > These changes are what garners the most attention from those who
> > > grant permission to push things upstream (*cough* managers *cough*),
> > > but it's not the right choice in this case. I'm not sure if that is
> > > part of what bothers Jakub, but it is one thing that bothers me and why
> this feels incomplete.
> > >
> > > The thing that has been bouncing around in my head about this (and
> > > until I was in front of a good text-based MUA I didn't dare respond)
> > > is that we seem to have an overlap in functionality between what you
> > > are proposing and existing virtual device configuration, but you are
> > > completely ignoring improving upon the existing creation methods.
> > > Whether dealing with a SmartNIC (which I used to describe a NIC with
> > > general purpose processors that could be running Linux and I think
> > > you do,
> > > too) or a regular NIC it seems like we should use devlink to create
> > > and configure a variety of devices these could be:
> > >
> > > 1. Number of PFs (there are cases where more than 1 PF per uplink port
> > > is required for command/control of the SmartNIC or where a single
> > > PCI b/d/f may have many uplink ports that need to be addressed
> > > separately)
> >
> > Yes, uplink ports can be addressed using "devlink port".
> >
> > Multiple pfs are already supported.
> >
> > $ devlink subdev show
> > pci/0000:03:00.0/1: flavour pcipf pf 0
> > pci/0000:03:00.0/1: flavour pcipf pf 1
> > pci/0000:03:00.0/1: flavour pcivf pf 1 vf 0
>
> I don't think that covers what Andy was taking about.
>
> > > 2. Device-specific SR-IOV VFs visible to server 3. mdev devices
> > > that _may_ have representers on the embedded cores of
> > > the NIC
> >
> > Yes, Is also planned for current interface. (will need to add mdev
> > flavor in the future)
> >
> > > 4. Hardware VirtIO-net devices supported 5. Other non-network
> > > devices (storage given as the first example) 6. ...
> >
> > All is up to driver - what to expose and what flavor to grant the subdev.
> >
> > > We cannot get rid of the methods for creating hardware VFs via
> > > sysfs, but now that we are seeing lots of different flavors of
> > > devices that might be created we should start by making sure at a
> > > minimum we can create existing hardware devices (VFs) with this
> > > devlink interface and move from there. Is there a plan to use this
> > > interface to create subdevs that are VFs or just new subdev flavors?
> > > I could start to get behind an interface like this if the patches
> > > showed that devlink would be the one place where device creation and
> > > control could be done for all types of subdevs, but that isn't clear that it is
> your direction based on the patches.
> >
> > I don’t see how the SmartNic can force the host PF to create new VFs.
> > Isn’t that the host PF responsibility? Doesn't make sense to me.
> > Do we want to have an interface that works only for NICs and not from
> > the SmartNic embedded system?
> >
> > What we do here is expose all the PFS and potential VFs/mdevs that the
> host can enable.
> > Unlike ip link which exposes only enabled VFs, here (in mlx5
> > implementation) we expose all the VFs that the host PF _can_ enable.
> > This allows, for example, to set the MAC of a VF before it is enabled.
>
> So subdev will never be used to create interfaces? Just expose all _possible_?
> Ugh. Wouldn't that be something to document?
>
I think it should be the driver's implementation decision what subdevs to expose and when,
but perhaps we need more consistency here.
> How things are configured matters. This patch set throws some basics
> together to get you the hw_addr from the SmartNIC side, and then you start
> talking big plans like NVMe and PCIe control which you don't seem to have
> thought through.
>
> How does the PF in your diagram magically have different types of VFs?
> PCIe SR-IOV cap has VF device vendor:product, you won't have NVMe VFs
> hanging off the main PF like that in a normal world :/
I don’t care what the VF does, it is a VF that the ASIC exposes.
NVME was just an example. The driver can choose which VFs to expose and with what
Relevant attributes.
> Jiri pitched the subdev object as the "other side of the wire" but you are back
> to arguing that you think it's actually the same as the port, just more general.
> How does the hw_addr belongs to the ASIC side then?
No, the port is a port on the e-switch, not the NIC port.
The subdev holds the NIC's attributes (among other attributes)
Yes, the peer sub-object would cover this case but we wanted to give a more
general solution that covers all the PCIe functions that the ASIC exposes to the host
(also those that are not NICs).
> And for PCIe you'd use this new interface for MSI-X or other resource
> allocation. Say VFs. Who controls that in a SmartNIC scenario? Is the host not
> allowed to move them around? This starts to touch multi-host use case
> which Linux networking never begun to address.
I don’t think we should discuss MSI-X specifics, it is just an example of a resource.
Probably max_tx_rate is a much better example.
> You're creating a new object type with no clear semantics just to throw in
> one feature - namely the hw_addr. If this is the quality of design we can
> expect - just add the hw addr to devlink ports and let's move on.
>
But it is not the hw_addr of the switch port.
It is logically not true.
Actually the fact that the ports have flavors which tell to which device
they are connected to, is a bit awkward.
> > > So just to make sure this is clear, what I'm proposing that devlink
> > > is used to create and configure all types of devices that it may be
> > > possible to create,
> >
> > More flavors can be defiantly added in the future. About creation, see
> > my comment above. Actually flavor is the only required attribute for
> > registration. The rest is optional.
> > > configure, or address from a PF and the starting place should be
> > > standard, hardware VFs. If that can be done right and we can
> > > address some of the issues with the current implementation (more
> > > than one hw addr is a _great_ example) then adding new flavours for
> > > the server devices and and adding new flavors for SmartNIC-based
> > > devices should be easy and feel natural.
> >
> > So what you are saying that allowing multiple addresses per subdev
> > will make a strong case for this patch set?
Powered by blists - more mailing lists