lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 11 Nov 2019 18:46:52 +0000
From:   Yuval Avnery <yuvalav@...lanox.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Jiri Pirko <jiri@...lanox.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Saeed Mahameed <saeedm@...lanox.com>,
        "leon@...nel.org" <leon@...nel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "shuah@...nel.org" <shuah@...nel.org>,
        Daniel Jurgens <danielj@...lanox.com>,
        Parav Pandit <parav@...lanox.com>,
        "andrew.gospodarek@...adcom.com" <andrew.gospodarek@...adcom.com>,
        "michael.chan@...adcom.com" <michael.chan@...adcom.com>
Subject: RE: [PATCH net-next v2 00/10] devlink subdev



> -----Original Message-----
> From: Jakub Kicinski <jakub.kicinski@...ronome.com>
> Sent: Monday, November 11, 2019 10:00 AM
> To: Yuval Avnery <yuvalav@...lanox.com>; Jiri Pirko <jiri@...lanox.com>
> Cc: netdev@...r.kernel.org; Saeed Mahameed <saeedm@...lanox.com>;
> leon@...nel.org; davem@...emloft.net; shuah@...nel.org; Daniel Jurgens
> <danielj@...lanox.com>; Parav Pandit <parav@...lanox.com>;
> andrew.gospodarek@...adcom.com; michael.chan@...adcom.com
> Subject: Re: [PATCH net-next v2 00/10] devlink subdev
> 
> On Fri,  8 Nov 2019 18:18:36 +0200, Yuval Avnery wrote:
> > This patchset introduces devlink subdev.
> >
> > Currently, legacy tools do not provide a comprehensive solution that
> > can be used in both SmartNic and non-SmartNic mode.
> > Subdev represents a device that exists on the ASIC but is not
> > necessarily visible to the kernel.
> >
> > Using devlink ports is not suitable because:
> >
> > 1. Those devices aren't necessarily network devices (such as NVMe devices)
> >    and doesn’t have E-switch representation. Therefore, there is need for
> >    more generic representation of PCI VF.
> > 2. Some attributes are not necessarily pure port attributes
> >    (number of MSIX vectors)
> 
> PCIe attrs will require persistence. Where is that in this design?
> 
> Also MSIX vectors are configuration of the devlink port (ASIC side), the only
> reason you're putting them in subdev is because some of the subdevs don't
> have a port, muddying up the meaning of things.

The way I see this, they are both on the ASIC side.
But port has nothing to do with MSIX so subdev makes more sense.

> 
> > 3. It creates a confusing devlink topology, with multiple port flavours
> >    and indices.
> >
> > Subdev will be created along with flavour and attributes.
> > Some network subdevs may be linked with a devlink port.
> >
> > This is also aimed to replace "ip link vf" commands as they are
> > strongly linked to the PCI topology and allow access only to enabled VFs.
> > Even though current patchset and example is limited to MAC address of
> > the VF, this interface will allow to manage PF, VF, mdev in SmartNic
> > and non SmartNic modes, in unified way for networking and
> > non-networking devices via devlink instance.
> >
> > Use case example:
> > An example system view of a networking ASIC (aka SmartNIC), can be
> > seen in below diagram, where devlink eswitch instance and PCI PF
> > and/or VFs are situated on two different CPU subsystems:
> >
> >
> >       +------------------------------+
> >       |                              |
> >       |             HOST             |
> >       |                              |
> >       |   +----+-----+-----+-----+   |
> >       |   | PF | VF0 | VF1 | VF2 |   |
> >       +---+----+-----------+-----+---+
> >                  PCI1|
> >           +---+------------+
> >               |
> >      +----------------------------------------+
> >      |        |         SmartNic              |
> >      |   +----+-------------------------+     |
> >      |   |                              |     |
> >      |   |               NIC            |     |
> >      |   |                              |     |
> >      |   +---------------------+--------+     |
> >      |                         |  PCI2        |
> >      |         +-----+---------+--+           |
> >      |               |                        |
> >      |      +-----+--+--+--------------+      |
> >      |      |     | PF  |              |      |
> >      |      |     +-----+              |      |
> >      |      |      Embedded CPU        |      |
> >      |      |                          |      |
> >      |      +--------------------------+      |
> >      |                                        |
> >      +----------------------------------------+
> >
> > The below diagram shows an example devlink subdev topology where
> some
> > subdevs are connected to devlink ports::
> >
> >
> >
> >             (PF0)    (VF0)    (VF1)           (NVME VF2)
> >          +--------------------------+         +--------+
> >          | devlink| devlink| devlink|         | devlink|
> >          | subdev | subdev | subdev |         | subdev |
> >          |    0   |    1   |    2   |         |    3   |
> >          +--------------------------+         +--------+
> >               |        |        |
> 
> What is this NVME VF2 connected to? It's gotta get traffic from somewhere?
> Frames come in from the uplink, then what?

The whole point here is that this is not a network device
So it has nothing to do with the network.
It is simply a PCI device exposed by the NIC to the host.
devlink subdev lets us configure attributes for this device.

> 
> >               |        |        |
> >               |        |        |
> >      +----------------------------------+
> >      |   | devlink| devlink| devlink|   |
> >      |   |  port  |  port  |  port  |   |
> >      |   |    0   |    1   |    2   |   |
> >      |   +--------------------------+   |
> >      |                                  |
> >      |                                  |
> >      |           E-switch               |
> >      |                                  |
> >      |                                  |
> >      |          +--------+              |
> >      |          | uplink |              |
> >      |          | devlink|              |
> >      |          |  port  |              |
> >      +----------------------------------+
> >
> > Devlink command example:
> >
> > A privileged user wants to configure a VF's hw_addr, before the VF is
> > enabled.
> >
> > $ devlink subdev set pci/0000:03:00.0/1 hw_addr 10:22:33:44:55:66
> >
> > $ devlink subdev show pci/0000:03:00.0/1
> > pci/0000:03:00.0/1: flavour pcivf pf 0 vf 0 port_index 1 hw_addr
> > 10:22:33:44:55:66
> >
> > $ devlink subdev show pci/0000:03:00.0/1 -jp {
> >     "subdev": {
> >         "pci/0000:03:00.0/1": {
> >             "flavour": "pcivf",
> 
> If the flavour is pcivf what differentiates this (Ethernet) VF from a NVME
> one?
> 

This describes the bus not the device purpose.

> >             "pf": 0,
> >             "vf": 0,
> >             "port_index": 1,
> >             "hw_addr": "10:22:33:44:55:66"
> 
> Since you're messing with the "hw_addr", you should at least provision the
> uAPI for adding multiple addresses. Intel guys were asking for this long time
> ago.

This is the "permanent" address. The one the subdev boots with.

> 
> Have you considered implementing some compat code so drivers don't have
> to implement the legacy ndos if they support subdevs?

Will consider

> 
> >         }
> >     }
> > }
> 
> Okay, so you added two diagrams. I guess I was naive in thinking that "you
> thought this all through in detail and have more documentation and design
> docs internally".
> 
> I don't like how unconstrained this is, the only implemented use case is
> weak. But since you're not seeing this you probably never will, so seems like
> a waste of time to fight it.

What am I missing? How would you constrain this?
I don’t see how this is weak. There is currently no way to set/show those devices from SmartNic.
We introduced a way that will cover both SmartNic and non-SmartNic mode.
We are also planning to add rate groups which will control the min/max rate of each subdev,
and support grouping them into rate groups.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ