[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190305110718.GD2314@nanopsycho>
Date:   Tue, 5 Mar 2019 12:07:18 +0100
From:   Jiri Pirko <jiri@...nulli.us>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     davem@...emloft.net, netdev@...r.kernel.org,
        oss-drivers@...ronome.com
Subject: Re: [PATCH net-next v2 4/7] devlink: allow subports on devlink PCI
 ports
Tue, Mar 05, 2019 at 01:40:07AM CET, jakub.kicinski@...ronome.com wrote:
>On Mon, 4 Mar 2019 12:19:02 +0100, Jiri Pirko wrote:
>> Sat, Mar 02, 2019 at 08:48:47PM CET, jakub.kicinski@...ronome.com wrote:
>> >On Sat, 2 Mar 2019 10:41:16 +0100, Jiri Pirko wrote:  
>> >> Fri, Mar 01, 2019 at 07:04:50PM CET, jakub.kicinski@...ronome.com wrote:  
>> >> >PCI endpoint corresponds to a PCI device, but such device
>> >> >can have one more more logical device ports associated with it.
>> >> >We need a way to distinguish those. Add a PCI subport in the
>> >> >dumps and print the info in phys_port_name appropriately.
>> >> >
>> >> >This is not equivalent to port splitting, there is no split
>> >> >group. It's just a way of representing multiple netdevs on
>> >> >a single PCI function.
>> >> >
>> >> >Note that the quality of being multiport pertains only to
>> >> >the PCI function itself. A PF having multiple netdevs does
>> >> >not mean that its VFs will also have multiple, or that VFs
>> >> >are associated with any particular port of a multiport VF.
>> >> >
>> >> >Example (bus 05 device has subports, bus 82 has only one port per
>> >> >function):
>> >> >
>> >> >$ devlink port
>> >> >pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical
>> >> >pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0
>> >> >pci/0000:05:00.0/4: type eth netdev enp5s0np1 flavour physical
>> >> >pci/0000:05:00.0/11000: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1    
>> >> 
>> >> So these subport devlink ports are eswitch ports for subports, right?
>> >> 
>> >> Please see the following drawing:
>> >> 
>> >>                                  +---+      +---+      +---+
>> >>                             pfsub| 5 |    vf| 6 |      | 7 |pfsub
>> >>                                  +-+-+      +-+-+      +-+-+
>> >> physical link <---------+          |          |          |
>> >>                         |          |          |          |
>> >>                         |          |          |          |
>> >>                         |          |          |          |
>> >>                       +-+-+      +-+-+      +-+-+      +-+-+
>> >>                       | 1 |      | 2 |      | 3 |      | 4 |
>> >>                    +--+---+------+---+------+---+------+---+--+
>> >>                    |  physical    pfsub      vf         pfsub |
>> >>                    |  port        port       port       port  |
>> >>                    |                                          |
>> >>                    |                  eswitch                 |
>> >>                    |                                          |
>> >>                    |                                          |
>> >>                    +------------------------------------------+
>> >> 
>> >> 1) pci/0000:05:00.0/0: type eth netdev enp5s0np0 flavour physical switch_id 00154d130d2f
>> >> 2) pci/0000:05:00.0/10000: type eth netdev enp5s0npf0s0 flavour pci_pf pf 0 subport 0 switch_id 00154d130d2f
>> >> 3) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0vf0 flavour pci_vf pf 0 vf 0 switch_id 00154d130d2f
>> >> 4) pci/0000:05:00.0/10001: type eth netdev enp5s0npf0s1 flavour pci_pf pf 0 subport 1 switch_id 00154d130d2f
>> >> 
>> >> This is basically what you have and I think we are in sync with that.
>> >> But what about 5,6,7? Should they have devlink port instances too?
>> >> 
>> >> 5) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 0
>> >> 6) pci/0000:05:10.1/0: type eth netdev enp5s10f0 flavour ???? pf 0 vf 0
>> >> 7) pci/0000:05:00.0/1: type eth netdev enp5s0f0?? flavour ???? pf 0 subport 1
>> >> 
>> >> These are the "peers".
>> >> I think that there could be flavours "pci_pf" and "pci_vf". Then the
>> >> "representors" (switch ports) could have flavours "pci_pf_port" and
>> >> "pci_vf_port" or something like that. User can see right away
>> >> that is not "PF" of "VF" but rather something "on the other end".
>> >> Note there is no "switch_id" for these devlink ports that tells the user
>> >> these devlink ports are not part of any switch.
>> >> What do you think?  
>> >
>> >Hmmm.. Hm. Hm.
>> >
>> >To me its neat if the devlink instance matches an ASIC.  I think it's
>> >kind of clear for people to understand what it stands for then.  So if
>> >we wanted to do the above we'd have to make the switch_id the first
>> >class identifier for devlink instances, rather than the bus?  But then
>> >VF instances don't have a switch ID so that doesn't work...
>> >
>> >I need to think about it.
>> >
>> >It's also kind of strange that we have to add the noun *port* to the
>> >flavour of... a port...  So I would prefer not to have those showing up
>> >as ports.  Can we invent a new command (say "partition"?) that'd take
>> >the bus info where the partition is to be spawned?  
>> 
>> Devlink does not supposed to be only there for switches. From the
>> beginning the design was to handle cases where the netdev/ib_dev is not
>> the correct handle. Not only in case you have multiple instances (ports)
>> for one ASIC, but also in case you have only one. Example use case is
>> port-type-change (eth->ib,ib->eth).
>> 
>> I chose word "port" as the parent devlink instance is "dev" and if you
>> partition the ASIC you basically got "ports", each of a different flavour.
>> 
>> And as you said, devlink instance matches one ASIC. Therefore the
>> devlink instance should contain all bits there are part of that ASIC,
>> not only switch/eswitch ports. That would be very limitting.
>
>I could read this as us being in full agreement, but I'm not sure..
>I think we agree that all objects of an ASIC should be under one
>devlink instance, the question remains whether both ends of the pipe
>for PCI devices (subdevs or not) should appear under ports or does the
>"far end" (from ASICs perspective)/"host end" get its own category.
Yep. Please see the suggestion about "flavour host" I did in other reply
in this thread.
Powered by blists - more mailing lists
 
