lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 28 Oct 2015 11:32:14 -0700
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	Bjorn Helgaas <helgaas@...nel.org>,
	Alexander Duyck <aduyck@...antis.com>
Cc:	bhelgaas@...gle.com, linux-pci@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Ben Shelton <benjamin.h.shelton@...el.com>
Subject: Re: [PATCH 2/5] iov: Reset resources to 0 if totalVFs increases after
 enabling ARI

On 10/28/2015 09:37 AM, Bjorn Helgaas wrote:
> Hi Alex,
>
> On Tue, Oct 27, 2015 at 01:52:21PM -0700, Alexander Duyck wrote:
>> This patch forces us to reallocate VF BARs if the totalVFs value has
>> increased after enabling ARI.  This normally shouldn't occur, however I
>> have seen some non-spec devices that shift between 7 and some value greater
>> than 7 based on the ARI value and we want to avoid triggering any issues
>> with such devices.
>
> Can you include specifics about the devices?  The value "7" is pretty
> specific, so if we're going to include that level of detail, we should
> have the actual device info to go with it.

I referenced 7 as that is the largest number of VFs a single function 
can support assuming a single function without ARI and without the 
ability to handle Type 1 configuration requests.  The Intel fm10k driver 
has logic in it that does a check for ARI and if it is supported it 
reports via sysfs a totalVFs of 64, otherwise it limits the totalVFs 
reported to 7.  However, I don't believe it exposes the limitation via 
the configuration space.

> I guess the problem is:
>
>    - Device supports 7 TotalVFs with ARI disabled, >7 with ARI enabled
>    - Firmware leaves ARI disabled in SRIOV_CTRL
>    - Firmware computes size based on 7 VFs
>    - Firmware allocates space and programs BARs for 7 VFs
>    - Linux enables ARI, reads >7 TotalVFs
>    - Linux computes size based on >7 VFs
>    - Increased size may overlap other resources
>
> Right?

Right.  More than likely what will happen is that you will see overlap 
of the device on itself if it has multiple base address registers 
assigned to the VFs.

>> Fixes: 3aa71da412fe ("PCI: Enable SR-IOV ARI Capable Hierarchy before reading TotalVFs")
>> Signed-off-by: Alexander Duyck <aduyck@...antis.com>
>> ---
>>   drivers/pci/iov.c |   11 ++++++++++-
>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> index 099050d78a39..238950412de0 100644
>> --- a/drivers/pci/iov.c
>> +++ b/drivers/pci/iov.c
>> @@ -393,7 +393,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>>   	int rc;
>>   	int nres;
>>   	u32 pgsz;
>> -	u16 ctrl, total;
>> +	u16 ctrl, total, orig_total;
>>   	struct pci_sriov *iov;
>>   	struct resource *res;
>>   	struct pci_dev *pdev;
>> @@ -402,6 +402,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>>   	    pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT)
>>   		return -ENODEV;
>>
>> +	pci_read_config_word(dev, pos + PCI_SRIOV_TOTAL_VF, &orig_total);
>>   	pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl);
>>   	if (ctrl & PCI_SRIOV_CTRL_VFE) {
>>   		pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0);
>> @@ -450,6 +451,14 @@ found:
>>   		}
>>   		iov->barsz[i] = resource_size(res);
>>   		res->end = res->start + resource_size(res) * total - 1;
>> +
>> +		/* force reallocation of BARs if total VFs increased */
>> +		if (orig_total < total) {
>> +			res->flags |= IORESOURCE_UNSET;
>> +			res->end -= res->start;
>> +			res->start = 0;
>> +		}
>
> Two thoughts here:
>
> 1) Even if the required space increased, it's possible that firmware
> placed the BAR somewhere where the extra space is available.  In that
> case, this forces reallocation unnecessarily.

I'd say it is possible, but not likely.  From past experience I have 
seen BIOSes do some very dumb things when it comes to SR-IOV, assuming 
they even support it.

In addition many of the VF devices out there support more than one base 
address register per function.  The Intel NICs for example have one for 
device registers and one for MSI-X registers.  And most BIOSes usually 
pack one right after the other from what I have seen.  So while there 
may be more space there what usually happens is that the MSI-X region 
will have to be relocated in order to make room for expanding the other 
base address register.

My last bit on all this is that VFs are meant to be assigned into 
guests.  I would argue that for the sake of security we are much better 
off invalidating the VF base address registers and forcing a 
reallocation if there is even a risk of the VF base address register 
space overlapping with some other piece of host memory.  We don't want 
to risk possibly exposing any bits of the host that we didn't intend on.

> 2) This *feels* like something the PCI core should be doing anyway,
> even without any help here.  Shouldn't we fail in pci_claim_resource()
> and set IORESOURCE_UNSET there?
>
> OK, and a third: re-reading TotalVF is (I think) completely
> unnecessary per spec, so if we're going to do it we should probably
> have a one-line comment about why the code is doing something that
> appears unnecessary, and really should have a concrete example device
> in the changelog.

I'm perfectly fine with us adding a comment about this being needed for 
out-of-spec devices on BIOSes that may not fully support SR-IOV. 
However, I wasn't the one that had the device that needed the workaround 
for totalVFs and so I cannot provide a concrete example myself.

I've added Ben to the Cc as he was the author for the original patch 
that introduced this possible issue.  Perhaps he can provide us with 
more details on the hardware that needs this.

- Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ