[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BY5PR12MB43220099C235E238D6AF89EADC530@BY5PR12MB4322.namprd12.prod.outlook.com>
Date: Sat, 29 Aug 2020 03:43:58 +0000
From: Parav Pandit <parav@...dia.com>
To: Jakub Kicinski <kuba@...nel.org>
CC: Parav Pandit <parav@...lanox.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"roid@...lanox.com" <roid@...lanox.com>,
"saeedm@...lanox.com" <saeedm@...lanox.com>,
Jiri Pirko <jiri@...dia.com>
Subject: RE: [PATCH net-next 2/3] devlink: Consider other controller while
building phys_port_name
> From: Jakub Kicinski <kuba@...nel.org>
> Sent: Friday, August 28, 2020 10:14 PM
>
> On Fri, 28 Aug 2020 04:27:19 +0000 Parav Pandit wrote:
> > > From: Jakub Kicinski <kuba@...nel.org>
> > > Sent: Friday, August 28, 2020 3:12 AM
> > >
> > > On Thu, 27 Aug 2020 20:15:01 +0000 Parav Pandit wrote:
> > > > > From: Jakub Kicinski <kuba@...nel.org>
> > > > >
> > > > > I find it strange that you have pfnum 0 everywhere but then
> > > > > different controllers.
> > > > There are multiple PFs, connected to different PCI RC. So device
> > > > has same pfnum for both the PFs.
> > > >
> > > > > For MultiHost at Netronome we've used pfnum to distinguish
> > > > > between the hosts. ASIC must have some unique identifiers for each PF.
> > > > Yes. there is. It is identified by a unique controller number;
> > > > internally it is called host_number. But internal host_number is
> > > > misleading term as multiple cables of same physical card can be
> > > > plugged into single host. So identifying based on a unique
> > > > (controller) number and matching that up on external cable is desired.
> > > >
> > > > > I'm not aware of any practical reason for creating PFs on one RC
> > > > > without reinitializing all the others.
> > > > I may be misunderstanding, but how is initialization is related
> > > > multiple PFs?
> > >
> > > If the number of PFs is static it should be possible to understand
> > > which one is on which system.
> >
> > How? How do we tell that pfnum A means external system.
> > Want to avoid such 'implicit' notion.
>
> How do you tell that controller A means external system?
Which is why I started with annotating only external controllers, mainly to avoid renaming and breaking current scheme for non_smartnic cases which possibly is the most user base.
But probably external pcipf/vf/sf port flavours are more intuitive combined with controller number.
More below.
>
> > > > > I can see how having multiple controllers may make things
> > > > > clearer, but adding another layer of IDs while the one under it
> > > > > is unused
> > > > > (pfnum=0) feels very unnecessary.
> > > > pfnum=0 is used today. not sure I understand your comment about
> > > > being unused. Can you please explain?
> > >
> > > You examples only ever have pfnum 0:
> > >
> > Because both controllers have pfnum 0.
> >
> > > From patch 2:
> > >
> > > $ devlink port show pci/0000:00:08.0/2
> > > pci/0000:00:08.0/2: type eth netdev eth7 controller 0 flavour pcivf
> > > pfnum 0 vfnum 1 splittable false
> > > function:
> > > hw_addr 00:00:00:00:00:00
> > >
> > > $ devlink port show -jp pci/0000:00:08.0/2 {
> > > "port": {
> > > "pci/0000:00:08.0/1": {
> > > "type": "eth",
> > > "netdev": "eth7",
> > > "controller": 0,
> > > "flavour": "pcivf",
> > > "pfnum": 0,
> > > "vfnum": 1,
> > > "splittable": false,
> > > "function": {
> > > "hw_addr": "00:00:00:00:00:00"
> > > }
> > > }
> > > }
> > > }
> > >
> > > From earlier email:
> > >
> > > pci/0000:00:08.0/1: type eth netdev eth6 flavour pcipf pfnum 0
> > > pci/0000:00:08.0/2: type eth netdev eth7 flavour pcipf pfnum 0
> > >
> > > If you never use pfnum, you can just put the controller ID there, like
> Netronome.
> > >
> > It likely not going to work for us. Because pfnum is not some randomly
> generated number.
> > It is linked to the underlying PCI pf number. {pf0, pf1...}
> > Orchestration sw uses this to identify representor of a PF-VF pair.
>
> For orchestration software which is unaware of controllers ports will still alias
> on pf/vf nums.
>
Yes.
Orchestration which will be aware of controller, will use it.
> Besides you have one devlink instance per port currently so I'm guessing there is
> no pf1 ever, in your case...
>
Currently there are multiple devlink instance. One for pf0, other for pf1.
Ports of both instances have the same switch id.
> > Replacing pfnum with controller number breaks this; and it still doesn't tell user
> that it's the pf on other_host.
>
> Neither does the opaque controller id.
Which is why I tossed the epcipf (external pci pf) port flavour that fits in current model.
But doesn't allow multiple external hosts under same eswitch for those devices which has same pci pf, vf numbers among those hosts. (and it is the case for mlnx).
> Maybe now you understand better why I wanted peer objects :/
>
I wasn't against peer object. But showing netdev of peer object assumed no_smartnic, it also assume other_side is also similar Linux kernel.
Anyways, I make humble request get over the past to move forward. :-)
> > So it is used, and would like to continue to use even if there are multiple PFs
> port (that has same pfnum) under the same eswitch.
> >
> > In an alternative,
> > Currently we have pcipf, pcivf (and pcisf) flavours. May be if we introduce new
> flavour say 'epcipf' to indicate external pci PF/VF/SF ports?
> > There can be better name than epcipf. I just put epcipf to differentiate it.
> > However these ports have same attributes as pcipf, pcivf, pcisf flavours.
>
> I don't think the controllers are a terrible idea. Seems like a fairly reasonable
> extension.
Ok.
> But MLX don't seem to need them. And you have a history of trying to
> make the Linux APIs look like your FW API.
>
Because there are two devlink instances for each PF?
I think for now an epcipf, epcivf flavour would just suffice due to lack of multiple devlink instances.
But in long run it is better to have the controller covering few topologies.
Otherwise we will break the rep naming later when multiple controllers are managed by single eswitch (without notion of controller).
Sometime my text is confusing. :-) so adding example of the thoughts below.
Example: Eswitch side devlink port show for multi-host setup considering the smartnic.
$ devlink port show
pci/0000:00:08.0/0: type eth netdev enp0s8f0 flavour physical
pci/0000:00:08.0/1: type eth netdev enp0s8f0_pf0 flavour pcipf pfnum 0
pci/0000:00:08.0/2: type eth netdev enp0s8f0_c0pf0 flavour epcipf pfnum 0
^^^^^ new port flavour.
pci/0000:00:08.1/0: type eth netdev enp0s8f1 flavour physical
pci/0000:00:08.1/1: type eth netdev enp0s8f1_pf1 flavour pcipf pfnum 1
pci/0000:00:08.1/2: type eth netdev enp0s8f1_c0pf1 flavour epcipf pfnum 1
Here one controller has two pci pfs (0,1}. Eswitch shows that they are external pci ports.
Whenever (not sure when), mlnx converts to single devlink instance, this will continue to work.
It will also work when multiple controller(s) (of external host) ports have same switch_id (for orchestration).
And this doesn't break any backward compatibility for non multihost, non smatnic users.
> Jiri, would you mind chiming in? What's your take?
Will wait for his inputs..
Powered by blists - more mailing lists