Message-ID: <BY5PR12MB43226AFA5002D7086597AC62DC4E9@BY5PR12MB4322.namprd12.prod.outlook.com>
Date: Wed, 14 Apr 2021 05:27:23 +0000
From: Parav Pandit <parav@...dia.com>
To: "Saleem, Shiraz" <shiraz.saleem@...el.com>,
Jason Gunthorpe <jgg@...dia.com>, Jiri Pirko <jiri@...dia.com>
CC: "dledford@...hat.com" <dledford@...hat.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
"Lacombe, John S" <john.s.lacombe@...el.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Ertman, David M" <david.m.ertman@...el.com>,
"Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
"Williams, Dan J" <dan.j.williams@...el.com>,
"Hefty, Sean" <sean.hefty@...el.com>,
"Keller, Jacob E" <jacob.e.keller@...el.com>
Subject: RE: [PATCH v4 05/23] ice: Add devlink params support
+ Jiri.
> From: Saleem, Shiraz <shiraz.saleem@...el.com>
> Sent: Wednesday, April 14, 2021 5:51 AM
>
> > Subject: RE: [PATCH v4 05/23] ice: Add devlink params support
> >
> >
> >
> > > From: Saleem, Shiraz <shiraz.saleem@...el.com>
> > > Sent: Tuesday, April 13, 2021 8:11 PM
> > [..]
> >
> > > > > > > Parav is talking about generic ways to customize the aux
> > > > > > > devices created and that would seem to serve the same function as
> > > > > > > this.
> > > > >
> > > > > Is there an RFC or something posted for us to look at?
> > > > I do not have polished RFC content ready yet.
> > > > > But copying the full config sequence snippet from the internal
> > > > > draft (changed for the ice example) here, as I would like to
> > > > > discuss it with you in this context.
> > >
> > > Thanks Parav! Some comments below.
> > >
> > > >
> > > > # (1) show auxiliary device types supported by a given devlink device.
> > > > # applies to pci pf,vf,sf. (in general at devlink instance).
> > > > > $ devlink dev auxdev show pci/0000:06:00.0
> > > > > pci/0000:06:00.0:
> > > > current:
> > > > roce eth
> > > > new:
> > > > supported:
> > > > roce eth iwarp
> > > >
> > > > # (2) enable iwarp and ethernet type of aux devices and disable roce.
> > > > $ devlink dev auxdev set pci/0000:06:00.0 roce off iwarp on
> > > >
> > > > > # (3) now see which aux devices will be enabled on next reload.
> > > > $ devlink dev auxdev show pci/0000:06:00.0
> > > > pci/0000:06:00.0:
> > > > current:
> > > > roce eth
> > > > new:
> > > > eth iwarp
> > > > supported:
> > > > roce eth iwarp
> > > >
> > > > # (4) now reload the device and see which aux devices are created.
> > > > > At this point driver undergoes reconfig for removal of roce and
> > > > > adding iwarp.
> > > > $ devlink reload pci/0000:06:00.0
> > >
> > > I see this is modeled like devlink resource.
> > >
> > > > Do we really need a PCI driver re-init to switch the type of the
> > > > auxdev hanging off the PCI dev?
> > >
> > > I don't see a need to re-init the whole PCI driver. Since only the aux
> > > device config is changed, only that piece needs to get reloaded.
>
> > But that is what mlx5 and other implementations do on reload, no? i.e. a
> > PCI driver reinit.
Currently yes, reload does a PCI re-init.
However, I do not see the value of a reload if no config (param, resource, auxdev) has changed.
> I can see an ice implementation of reload morphing to similar over time to
> support a new config that requires a true reinit of PCI driver entities.
>
Sure.
> >
> > > Why not just allow the setting to apply dynamically during a 'set'
> > > itself with an unplug/plug of the auxdev with correct type.
> > >
> > This suggestion came up in the internal discussion too.
> > However such task needs to synchronize with devlink reload command and
> > also with driver remove() sequence.
> > So locking wise and depending on amount of config change, it is close
> > to what reload will do.
>
> > Holding this mutex across the auxiliary device unplug/plug in "set" won't
> > cut it?
> > https://elixir.bootlin.com/linux/v5.12-rc7/source/drivers/net/ethernet/mellanox/mlx5/core/main.c#L1304
>
Currently, devlink reload for mlx5 is a source of lockdep asserts, use-after-free accesses and a deadlock in net ns. :-(
Several of us (Leon, Saeed, Moshe) are working to resolve it.
So I want to stay away from intf_mutex for now.
> > > For example, other resource config or other param settings also need to
> > > take effect.
> > > So, to avoid defining multiple config sequences, doing this as part of the
> > > already existing devlink reload gives the user one simple sequence.
> >
> > For example,
> > 1. enable/disable desired aux devices
> > 2. configure device resources
> > 3. set some device params
> > 4. do devlink reload and apply settings done in #1 to #3
>
> Sure. But a user might also just want to operate on just an auxiliary device
> configuration change. As in #1.
> And he ends up having everything hanging off the PF to get blown out,
> including potentially the VFs. That feels like too big a hammer.
This is certainly not desired.
If we want aux device enable/disable to take effect as soon as it is set, without a reload, then the above flow should be redefined as:
1. configure device resources (optional)
2. set some device params (optional)
3. enable/disable desired aux devices
Step 3 needs to apply the settings of (1) and (2) without the user doing a devlink reload.
At step 3, devlink core does not know that reload_down() and reload_up() need to be done.
So the driver internally needs to implement reload_down() and reload_up() in the callback for step 3.
This builds a framework parallel to devlink reload.
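A rough illustration of the difference, as a toy Python model and not kernel code. The class ToyDevlink and its methods stage(), reload() and set_and_apply() are hypothetical names made up for this sketch. Only reload_down() and reload_up() mirror the callbacks named above, and here they merely log that they ran.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Aux device types from the example output; purely illustrative.
SUPPORTED = frozenset({"roce", "eth", "iwarp"})


@dataclass
class ToyDevlink:
    """Toy stand-in for one devlink instance (hypothetical, not a kernel API)."""
    current: Set[str] = field(default_factory=lambda: {"roce", "eth"})
    staged: Optional[Set[str]] = None   # the "new:" row; None means nothing staged
    log: List[str] = field(default_factory=list)

    def stage(self, **switches: bool) -> None:
        """Model of 'auxdev set': record the change, leave devices untouched."""
        unknown = set(switches) - SUPPORTED
        if unknown:
            raise ValueError(f"unsupported aux device type(s): {sorted(unknown)}")
        new = set(self.staged if self.staged is not None else self.current)
        for name, on in switches.items():
            if on:
                new.add(name)
            else:
                new.discard(name)
        self.staged = new

    def reload_down(self) -> None:
        self.log.append("reload_down")

    def reload_up(self) -> None:
        # Staged config becomes current when the device comes back up.
        if self.staged is not None:
            self.current, self.staged = self.staged, None
        self.log.append("reload_up")

    def reload(self) -> None:
        """Flow A: an explicit reload applies everything staged so far."""
        self.reload_down()
        self.reload_up()

    def set_and_apply(self, **switches: bool) -> None:
        """Flow B: 'set' takes effect at once, so the driver runs down/up itself."""
        self.stage(**switches)
        self.reload_down()
        self.reload_up()
```

In flow A the down/up pair is triggered by the reload command. In flow B the same pair has to be triggered from inside the set callback, which is the parallel path I am concerned about.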
Jiri,
What do you think of it?