Message-ID: <Y4XNSBO+2/YOL9+C@unreal>
Date: Tue, 29 Nov 2022 11:13:44 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Ajit Khaparde <ajit.khaparde@...adcom.com>
Cc: andrew.gospodarek@...adcom.com, davem@...emloft.net,
edumazet@...gle.com, jgg@...pe.ca, kuba@...nel.org,
linux-kernel@...r.kernel.org, linux-rdma@...r.kernel.org,
michael.chan@...adcom.com, netdev@...r.kernel.org,
pabeni@...hat.com, selvin.xavier@...adcom.com
Subject: Re: [PATCH v4 0/6] Add Auxiliary driver support
On Mon, Nov 28, 2022 at 06:01:13PM -0800, Ajit Khaparde wrote:
> On Tue, Nov 22, 2022 at 10:59 PM Leon Romanovsky <leon@...nel.org> wrote:
> >
> > On Tue, Nov 22, 2022 at 07:02:45AM -0800, Ajit Khaparde wrote:
> > > On Wed, Nov 16, 2022 at 5:22 AM Leon Romanovsky <leon@...nel.org> wrote:
> > > >
> > > ::snip::
> > > > > > > All PCI management logic and interfaces need to be inside the eth part
> > > > > > > of your driver, and only that part should implement SR-IOV config. Once
> > > > > > > the user enables SR-IOV, the PCI driver should create auxiliary devices for
> > > > > > > each VF. These devices will have RDMA capabilities, which will trigger the
> > > > > > > RDMA driver to bind to them.
> > > > > > I agree, and once the PF creates the auxiliary devices for the VFs, the RoCE
> > > > > > VFs do indeed get probed and created. But the twist in the bnxt_en/bnxt_re
> > > > > > design is that the RoCE driver is responsible for making adjustments to the
> > > > > > RoCE resources.
> > > >
> > > > > You can still do these adjustments by checking the type of function that
> > > > > called the RDMA .probe. The PCI core exposes some helpers to distinguish
> > > > > between PFs and VFs.
> > > >
> > > > >
> > > > > > So once the VFs are created and the bnxt_en driver enables SR-IOV, adjusts
> > > > > > the NIC resources for the VFs, and so on, it tries to call into the bnxt_re
> > > > > > driver for the same purpose.
> > > >
> > > > If I read code correctly, all these resources are for one PCI function.
> > > >
> > > > Something like this:
> > > >
> > > > > bnxt_re_probe()
> > > > > {
> > > > >         ...
> > > > >         if (is_virtfn(p))
> > > > >                 bnxt_re_sriov_config(p);
> > > > >         ...
> > > > > }
> > > I understand what you are suggesting.
> > > But what I want is a way to do this in the context of the PF
> > > preferably before the VFs are probed.
> >
> > I don't understand the last sentence. You call this sriov_config in the
> > bnxt_re driver without any protection from VFs being probed.
>
> Let me elaborate -
> When a user sets num_vfs to a non-zero number, the PCI driver hook
> sriov_configure calls bnxt_sriov_configure(). Once pci_enable_sriov()
> succeeds, bnxt_ulp_sriov_cfg() is issued under bnxt_sriov_configure().
> All this happens under bnxt_en.
> bnxt_ulp_sriov_cfg() ultimately calls into the bnxt_re driver.
> Since bnxt_sriov_configure() is called only for PFs, bnxt_ulp_sriov_cfg()
> is called for PFs only.
>
> Once bnxt_ulp_sriov_cfg() calls into the bnxt_re via the ulp_ops,
> it adjusts the QPs, SRQs, CQs, MRs, GIDs and such.
Once you call pci_enable_sriov(), the PCI core creates the sysfs entries,
which triggers udev rules and the VF probes. Because you call it from
bnxt_sriov_configure(), you inherit protection for the PF from the PCI
lock, but not for the VFs.
>
> >
> > > So we are trying to call
> > > bnxt_re_sriov_config in the context of handling the PF's
> > > sriov_configure implementation. Having the ulp_ops lets us
> > > avoid resource wastage and assumptions in the bnxt_re driver.
> >
> > To which resource wastage are you referring?
> Essentially the PF driver reserves a set of above resources for the PF,
> and divides the remaining resources among the VFs.
> If the calculation is based on sriov_totalvfs instead of sriov_numvfs,
> there can be a difference in the resources provisioned for a VF.
> And that is because a user may create a subset of VFs instead of the
> total VFs allowed in the PCI SR-IOV capability register.
> I was referring to the resource wastage in that deployment scenario.
That is fine: set all the needed limits in bnxt_en. You don't need to call
into bnxt_re for that.
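
To make the arithmetic concrete, here is a rough user-space C sketch, not
driver code. The struct res_pool type and the per_vf_share() helper are
hypothetical names that only model the split described above: reserve a
share for the PF, then divide the remainder by the number of VFs.

```c
/*
 * Hypothetical model of one resource type (QPs, SRQs, CQs, ...).
 * Names and layout are illustrative and are not taken from bnxt_en/bnxt_re.
 */
struct res_pool {
	unsigned int total;       /* what the device supports overall */
	unsigned int pf_reserved; /* share kept back for the PF */
};

/*
 * Split whatever is left after the PF reservation evenly across `vfs`
 * functions. Returns 0 if there are no VFs or nothing is left to split.
 */
static unsigned int per_vf_share(const struct res_pool *pool, unsigned int vfs)
{
	if (vfs == 0 || pool->pf_reserved >= pool->total)
		return 0;

	return (pool->total - pool->pf_reserved) / vfs;
}
```

With made-up numbers, say 65536 QPs in total and 1024 reserved for the PF,
dividing by sriov_totalvfs = 64 gives 1008 per VF, while dividing by
sriov_numvfs = 4 gives 16128. The only inputs are the device limits and the
VF count, which is why I think this can live in bnxt_en.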
>
> Thanks
> Ajit
>
> >
> > It makes no difference whether the same limits are set in the bnxt_en
> > driver when the RDMA bnxt device is created, or in bnxt_re, which is
> > called once the RDMA device is created.
> >
> > Thanks
> >
> > >
> > > ::snip::
> >
> >