[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 11 Nov 2019 18:46:52 +0000
From: Yuval Avnery <yuvalav@...lanox.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Jiri Pirko <jiri@...lanox.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
Saeed Mahameed <saeedm@...lanox.com>,
"leon@...nel.org" <leon@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"shuah@...nel.org" <shuah@...nel.org>,
Daniel Jurgens <danielj@...lanox.com>,
Parav Pandit <parav@...lanox.com>,
"andrew.gospodarek@...adcom.com" <andrew.gospodarek@...adcom.com>,
"michael.chan@...adcom.com" <michael.chan@...adcom.com>
Subject: RE: [PATCH net-next v2 00/10] devlink subdev
> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
> Sent: Monday, November 11, 2019 10:00 AM
> To: Yuval Avnery <yuvalav@...lanox.com>; Jiri Pirko <jiri@...lanox.com>
> Cc: netdev@...r.kernel.org; Saeed Mahameed <saeedm@...lanox.com>;
> leon@...nel.org; davem@...emloft.net; shuah@...nel.org; Daniel Jurgens
> <danielj@...lanox.com>; Parav Pandit <parav@...lanox.com>;
> andrew.gospodarek@...adcom.com; michael.chan@...adcom.com
> Subject: Re: [PATCH net-next v2 00/10] devlink subdev
>
> On Fri, 8 Nov 2019 18:18:36 +0200, Yuval Avnery wrote:
> > This patchset introduces devlink subdev.
> >
> > Currently, legacy tools do not provide a comprehensive solution that
> > can be used in both SmartNic and non-SmartNic mode.
> > Subdev represents a device that exists on the ASIC but is not
> > necessarily visible to the kernel.
> >
> > Using devlink ports is not suitable because:
> >
> > 1. Those devices aren't necessarily network devices (such as NVMe devices)
> > and doesn’t have E-switch representation. Therefore, there is need for
> > more generic representation of PCI VF.
> > 2. Some attributes are not necessarily pure port attributes
> > (number of MSIX vectors)
>
> PCIe attrs will require persistence. Where is that in this design?
>
> Also MSIX vectors are configuration of the devlink port (ASIC side), the only
> reason you're putting them in subdev is because some of the subdevs don't
> have a port, muddying up the meaning of things.
The way I see this, they are both on the ASIC side.
But port has nothing to do with MSIX so subdev makes more sense.
>
> > 3. It creates a confusing devlink topology, with multiple port flavours
> > and indices.
> >
> > Subdev will be created along with flavour and attributes.
> > Some network subdevs may be linked with a devlink port.
> >
> > This is also aimed to replace "ip link vf" commands as they are
> > strongly linked to the PCI topology and allow access only to enabled VFs.
> > Even though current patchset and example is limited to MAC address of
> > the VF, this interface will allow to manage PF, VF, mdev in SmartNic
> > and non SmartNic modes, in unified way for networking and
> > non-networking devices via devlink instance.
> >
> > Use case example:
> > An example system view of a networking ASIC (aka SmartNIC), can be
> > seen in below diagram, where devlink eswitch instance and PCI PF
> > and/or VFs are situated on two different CPU subsystems:
> >
> >
> > +------------------------------+
> > | |
> > | HOST |
> > | |
> > | +----+-----+-----+-----+ |
> > | | PF | VF0 | VF1 | VF2 | |
> > +---+----+-----------+-----+---+
> > PCI1|
> > +---+------------+
> > |
> > +----------------------------------------+
> > | | SmartNic |
> > | +----+-------------------------+ |
> > | | | |
> > | | NIC | |
> > | | | |
> > | +---------------------+--------+ |
> > | | PCI2 |
> > | +-----+---------+--+ |
> > | | |
> > | +-----+--+--+--------------+ |
> > | | | PF | | |
> > | | +-----+ | |
> > | | Embedded CPU | |
> > | | | |
> > | +--------------------------+ |
> > | |
> > +----------------------------------------+
> >
> > The below diagram shows an example devlink subdev topology where
> some
> > subdevs are connected to devlink ports::
> >
> >
> >
> > (PF0) (VF0) (VF1) (NVME VF2)
> > +--------------------------+ +--------+
> > | devlink| devlink| devlink| | devlink|
> > | subdev | subdev | subdev | | subdev |
> > | 0 | 1 | 2 | | 3 |
> > +--------------------------+ +--------+
> > | | |
>
> What is this NVME VF2 connected to? It's gotta get traffic from somewhere?
> Frames come in from the uplink, then what?
The whole point here is that this is not a network device
So it has nothing to do with the network.
It is simply a PCI device exposed by the NIC to the host.
devlink subdev lets us configure attributes for this device.
>
> > | | |
> > | | |
> > +----------------------------------+
> > | | devlink| devlink| devlink| |
> > | | port | port | port | |
> > | | 0 | 1 | 2 | |
> > | +--------------------------+ |
> > | |
> > | |
> > | E-switch |
> > | |
> > | |
> > | +--------+ |
> > | | uplink | |
> > | | devlink| |
> > | | port | |
> > +----------------------------------+
> >
> > Devlink command example:
> >
> > A privileged user wants to configure a VF's hw_addr, before the VF is
> > enabled.
> >
> > $ devlink subdev set pci/0000:03:00.0/1 hw_addr 10:22:33:44:55:66
> >
> > $ devlink subdev show pci/0000:03:00.0/1
> > pci/0000:03:00.0/1: flavour pcivf pf 0 vf 0 port_index 1 hw_addr
> > 10:22:33:44:55:66
> >
> > $ devlink subdev show pci/0000:03:00.0/1 -jp {
> > "subdev": {
> > "pci/0000:03:00.0/1": {
> > "flavour": "pcivf",
>
> If the flavour is pcivf what differentiates this (Ethernet) VF from a NVME
> one?
>
This describes the bus not the device purpose.
> > "pf": 0,
> > "vf": 0,
> > "port_index": 1,
> > "hw_addr": "10:22:33:44:55:66"
>
> Since you're messing with the "hw_addr", you should at least provision the
> uAPI for adding multiple addresses. Intel guys were asking for this long time
> ago.
This is the "permanent" address. The one the subdev boots with.
>
> Have you considered implementing some compat code so drivers don't have
> to implement the legacy ndos if they support subdevs?
Will consider
>
> > }
> > }
> > }
>
> Okay, so you added two diagrams. I guess I was naive in thinking that "you
> thought this all through in detail and have more documentation and design
> docs internally".
>
> I don't like how unconstrained this is, the only implemented use case is
> weak. But since you're not seeing this you probably never will, so seems like
> a waste of time to fight it.
What am I missing? How would you constrain this?
I don’t see how this is weak. There is currently no way to set/show those devices from SmartNic.
We introduced a way that will cover both SmartNic and non-SmartNic mode.
We are also planning to add rate groups which will control the min/max rate of each subdev,
and support grouping them into rate groups.
Powered by blists - more mailing lists