Message-ID: <aae50d11-73ed-b5cb-a968-2d8d03465ea9@redhat.com>
Date: Tue, 29 May 2018 10:29:49 -0400
From: Don Dutile <ddutile@...hat.com>
To: Bjorn Helgaas <helgaas@...nel.org>
Cc: Jakub Kicinski <jakub.kicinski@...ronome.com>,
Bjorn Helgaas <bhelgaas@...gle.com>, linux-pci@...r.kernel.org,
netdev@...r.kernel.org, Sathya Perla <sathya.perla@...adcom.com>,
Felix Manlunas <felix.manlunas@...iumnetworks.com>,
alexander.duyck@...il.com, john.fastabend@...il.com,
Jacob Keller <jacob.e.keller@...el.com>,
oss-drivers@...ronome.com, Christoph Hellwig <hch@...radead.org>
Subject: Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
On 05/25/2018 04:46 PM, Bjorn Helgaas wrote:
> On Fri, May 25, 2018 at 03:27:52PM -0400, Don Dutile wrote:
>> On 05/25/2018 10:02 AM, Bjorn Helgaas wrote:
>>> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
>>>> Hi Bjorn!
>>>>
>>>> On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
>>>>> On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
>>>>>> Some user space depends on being able to enable sriov_totalvfs VFs
>>>>>> without failure, e.g.:
>>>>>>
>>>>>> $ cat .../sriov_totalvfs > .../sriov_numvfs
>>>>>>
>>>>>> For devices whose VF support depends on the loaded FW, we have the
>>>>>> pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as
>>>>>> a special "unset" value, meaning drivers can't limit sriov_totalvfs
>>>>>> to 0. Remove the special values completely and simply initialize
>>>>>> driver_max_VFs to total_VFs. Then always use driver_max_VFs.
>>>>>> Add a helper for drivers to reset the VF limit back to total.
>>>>>
>>>>> I still can't really make sense out of the changelog.
>>>>>
>>>>> I think part of the reason it's confusing is because there are two
>>>>> things going on:
>>>>>
>>>>> 1) You want this:
>>>>> pci_sriov_set_totalvfs(dev, 0);
>>>>> x = pci_sriov_get_totalvfs(dev)
>>>>>
>>>>> to return 0 instead of total_VFs. That seems to connect with
>>>>> your subject line. It means "sriov_totalvfs" in sysfs could be
>>>>> 0, but I don't know how that is useful (I'm sure it is; just
>>>>> educate me :))
>>>>
>>>> Let me just quote the bug report that got filed on our internal bug
>>>> tracker :)
>>>>
>>>> When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
>>>> errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
>>>> then tries to set that as the sriov_numvfs parameter.
>>>>
>>>> For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0,
>>>> but it's set to max. When FW is switched to flower*, the correct
>>>> sriov_totalvfs value is presented.
>>>>
>>>> * flower is a project name
>>>
>>> From the point of view of the PCI core (which knows nothing about
>>> device firmware and relies on the architected config space described
>>> by the PCIe spec), this sounds like an erratum: with some firmware
>>> installed, the device is not capable of SR-IOV, but still advertises
>>> an SR-IOV capability with "TotalVFs > 0".
>>>
>>> Regardless of whether that's an erratum, we do allow PF drivers to use
>>> pci_sriov_set_totalvfs() to limit the number of VFs that may be
>>> enabled by writing to the PF's "sriov_numvfs" sysfs file.
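(Aside: a rough sketch of how that limit is enforced on the sysfs path --
simplified from memory, not a verbatim copy of sriov_numvfs_store():)

    /* The VF count requested through sysfs is checked against the limit
     * the PF driver set via pci_sriov_set_totalvfs(); only then is the
     * driver's .sriov_configure() callback invoked.
     */
    if (num_vfs > pci_sriov_get_totalvfs(pdev))
            return -ERANGE;

    ret = pdev->driver->sriov_configure(pdev, num_vfs);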
>>>
>> +1.
>>
>>> But the current implementation does not allow a PF driver to limit VFs
>>> to 0, and that does seem nonsensical.
>>>
>> Well, not really -- claiming to support VFs and then wanting the count to be 0
>> is something I could certainly argue is nonsensical.
>> From a sw perspective, sure, it makes sense to see if we can set VFs to 0 (and reset to another value later).
>>
>> /me wishes that implementers would follow the architecture vs torquing it into strange shapes.
>>
>>>> My understanding is that OpenStack uses sriov_totalvfs to determine how many
>>>> VFs can be enabled; it looks like this is the code:
>>>>
>>>> http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
>>>>
>>>>> 2) You're adding the pci_sriov_reset_totalvfs() interface. I'm not
>>>>> sure what you intend for this. Is *every* driver supposed to
>>>>> call it in .remove()? Could/should this be done in the core
>>>>> somehow instead of depending on every driver?
>>>>
>>>> Good question, I was just thinking yesterday we may want to call it
>>>> from the core, but I don't think it's strictly necessary nor always
>>>> sufficient (we may reload FW without re-probing).
>>>>
>>>> We have a device which supports different number of VFs based on the FW
>>>> loaded. Some legacy FWs do not inform the driver how many VFs they can
>>>> support, because they support the max. So the flow in our driver is this:
>>>>
>>>>   load_fw(dev);
>>>>   ...
>>>>   max_vfs = ask_fw_for_max_vfs(dev);
>>>>   if (max_vfs >= 0)
>>>>           return pci_sriov_set_totalvfs(dev, max_vfs);
>>>>   else /* FW didn't tell us, assume max */
>>>>           return pci_sriov_reset_totalvfs(dev);
>>>>
>>>> We also reset the max on device remove, but that's not strictly
>>>> necessary.
>>>>
>>>> Other users of pci_sriov_set_totalvfs() always know the value to set
>>>> the total to (either always get it from FW or it's a constant).
>>>>
>>>> If you prefer we can work out the correct max for those legacy cases in
>>>> the driver as well, although it seemed cleaner to just ask the core,
>>>> since it already has total_VFs value handy :)
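(For what it's worth, a minimal sketch of what such a reset helper could look
like, based only on the fields visible in the patch below -- the actual
proposed helper isn't quoted in this thread:)

    /* Hypothetical helper: drop any driver-imposed VF limit and fall back
     * to the TotalVFs value from the SR-IOV capability.
     */
    void pci_sriov_reset_totalvfs(struct pci_dev *dev)
    {
            if (!dev->is_physfn)
                    return;

            dev->sriov->driver_max_VFs = dev->sriov->total_VFs;
    }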
>>>>
>>>>> I'm also having a hard time connecting your user-space command example
>>>>> with the rest of this. Maybe it will make more sense to me tomorrow
>>>>> after some coffee.
>>>>
>>>> OpenStack assumes it will always be able to set sriov_numvfs to
>>>> sriov_totalvfs, see this 'if':
>>>>
>>>> http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
>>>
>>> Thanks for educating me. I think there are two issues here that we
>>> can separate. I extracted the patch below for the first.
>>>
>>> The second is the question of resetting driver_max_VFs. I think we
>>> currently have a general issue in the core:
>>>
>>> - load PF driver 1
>>> - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
>>> - unload PF driver 1
>>> - load PF driver 2
>>>
>>> Now driver_max_VFs is still stuck at the lower value set by driver 1.
>>> I don't think that's the way this should work.
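(Purely as an illustration of that reload problem -- one conceivable, entirely
hypothetical core-side fix would be to restore the default whenever the PF
driver goes away; this is not something proposed in this thread:)

    /* Hypothetical: called by the core on PF driver unbind so the next
     * driver starts from the hardware TotalVFs limit again.
     */
    static void pci_sriov_restore_default_totalvfs(struct pci_dev *dev)
    {
            if (dev->is_physfn)
                    dev->sriov->driver_max_VFs = dev->sriov->total_VFs;
    }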
>>>
>>> I guess this is partly a consequence of setting driver_max_VFs in
>>> sriov_init(), which is called before driver attach and should only
>> um, if it's at sriov_init() how is max changed by a PF driver?
>> or am I missing something subtle (a new sysfs param) as to what is being changed?
>
> sriov_init() basically just sets the default driver_max_VFs to Total_VFs.
>
> If the PF driver later calls pci_sriov_set_totalvfs(), it can reduce
> driver_max_VFs.
>
> My concern is that there's nothing that resets driver_max_VFs back to
> Total_VFs if we unload and reload the PF driver.
>
ok, gotcha.
Yet another complication of this non-arch quirk. :-/
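(For reference, a simplified sketch of the set side as I read it -- not a
verbatim copy of drivers/pci/iov.c; error handling trimmed:)

    int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
    {
            if (!dev->is_physfn)
                    return -ENOSYS;
            if (numvfs > dev->sriov->total_VFs)
                    return -EINVAL;

            /* the limit must not change while VFs are enabled */
            if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
                    return -EBUSY;

            dev->sriov->driver_max_VFs = numvfs;
            return 0;
    }

Nothing on that path puts driver_max_VFs back to total_VFs, which is the
reload problem described above.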
>>> depend on hardware characteristics, so it is related to the patch
>>> below. But I think we should fix it in general, not just for
>>> netronome.
>>>
>>>
>>> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
>>> Author: Jakub Kicinski <jakub.kicinski@...ronome.com>
>>> Date: Fri May 25 08:18:34 2018 -0500
>>>
>>>     PCI/IOV: Allow PF drivers to limit total_VFs to 0
>>>
>>>     Some SR-IOV PF drivers implement .sriov_configure(), which allows
>>>     user-space to enable VFs by writing the desired number of VFs to the
>>>     sysfs "sriov_numvfs" file (see sriov_numvfs_store()).
>>>
>>>     The PCI core limits the number of VFs to the TotalVFs advertised by the
>>>     device in its SR-IOV capability.  The PF driver can limit the number of
>>>     VFs to even fewer (it may have pre-allocated data structures or
>>>     knowledge of device limitations) by calling pci_sriov_set_totalvfs(),
>>>     but previously it could not limit the VFs to 0.
>>>
>>>     Change pci_sriov_get_totalvfs() so it always respects the VF limit
>>>     imposed by the PF driver, even if the limit is 0.
>>>
>>>     This sequence:
>>>
>>>       pci_sriov_set_totalvfs(dev, 0);
>>>       x = pci_sriov_get_totalvfs(dev);
>>>
>>>     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will
>>>     set "x" to 0.
>>>
>>>     Signed-off-by: Jakub Kicinski <jakub.kicinski@...ronome.com>
>>>     Signed-off-by: Bjorn Helgaas <bhelgaas@...gle.com>
>>>
>>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>>> index 192b82898a38..d0d73dbbd5ca 100644
>>> --- a/drivers/pci/iov.c
>>> +++ b/drivers/pci/iov.c
>>> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>>>          iov->nres = nres;
>>>          iov->ctrl = ctrl;
>>>          iov->total_VFs = total;
>>> +        iov->driver_max_VFs = total;
>>>          pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
>>>          iov->pgsz = pgsz;
>>>          iov->self = dev;
>>> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>>>  {
>>>          if (!dev->is_physfn)
>>>                  return 0;
>>>  
>>> -        if (dev->sriov->driver_max_VFs)
>>> -                return dev->sriov->driver_max_VFs;
>>> -
>>> -        return dev->sriov->total_VFs;
>>> +        return dev->sriov->driver_max_VFs;
>>>  }
>>>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>>>
>>