lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 28 Feb 2018 16:32:20 +0100
From:   Auger Eric <eric.auger@...hat.com>
To:     Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>,
        Alex Williamson <alex.williamson@...hat.com>
Cc:     "pmorel@...ux.vnet.ibm.com" <pmorel@...ux.vnet.ibm.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linuxarm <linuxarm@...wei.com>,
        John Garry <john.garry@...wei.com>,
        "xuwei (O)" <xuwei5@...wei.com>,
        Robin Murphy <robin.murphy@....com>
Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a
 valid iova range

Hi Shameer,

On 28/02/18 14:39, Shameerali Kolothum Thodi wrote:
> Hi Eric,
> 
>> -----Original Message-----
>> From: Auger Eric [mailto:eric.auger@...hat.com]
>> Sent: Wednesday, February 28, 2018 11:53 AM
>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>;
>> Alex Williamson <alex.williamson@...hat.com>
>> Cc: pmorel@...ux.vnet.ibm.com; kvm@...r.kernel.org; linux-
>> kernel@...r.kernel.org; Linuxarm <linuxarm@...wei.com>; John Garry
>> <john.garry@...wei.com>; xuwei (O) <xuwei5@...wei.com>; Robin Murphy
>> <robin.murphy@....com>
>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a valid
>> iova range
>>
>> Hi Shameer,
>>
>> On 28/02/18 10:25, Shameerali Kolothum Thodi wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Auger Eric [mailto:eric.auger@...hat.com]
>>>> Sent: Wednesday, February 28, 2018 9:02 AM
>>>> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@...wei.com>;
>>>> Alex Williamson <alex.williamson@...hat.com>
>>>> Cc: pmorel@...ux.vnet.ibm.com; kvm@...r.kernel.org; linux-
>>>> kernel@...r.kernel.org; Linuxarm <linuxarm@...wei.com>; John Garry
>>>> <john.garry@...wei.com>; xuwei (O) <xuwei5@...wei.com>; Robin
>> Murphy
>>>> <robin.murphy@....com>
>>>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a
>> valid
>>>> iova range
>>>>
>>>> Hi Shameer,
>>>>
>>>> On 27/02/18 10:57, Shameerali Kolothum Thodi wrote:
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Auger Eric [mailto:eric.auger@...hat.com]
>>>>>> Sent: Tuesday, February 27, 2018 8:27 AM
>>>>>> To: Alex Williamson <alex.williamson@...hat.com>
>>>>>> Cc: Shameerali Kolothum Thodi
>> <shameerali.kolothum.thodi@...wei.com>;
>>>>>> pmorel@...ux.vnet.ibm.com; kvm@...r.kernel.org; linux-
>>>>>> kernel@...r.kernel.org; Linuxarm <linuxarm@...wei.com>; John Garry
>>>>>> <john.garry@...wei.com>; xuwei (O) <xuwei5@...wei.com>; Robin
>>>> Murphy
>>>>>> <robin.murphy@....com>
>>>>>> Subject: Re: [PATCH v4 4/6] vfio/type1: check dma map request is within a
>>>> valid
>>>>>> iova range
>>>>>>
>>>>>> Hi,
>>>>>> On 27/02/18 00:13, Alex Williamson wrote:
>>>>>>> On Mon, 26 Feb 2018 23:05:43 +0100
>>>>>>> Auger Eric <eric.auger@...hat.com> wrote:
>>>>>>>
>>>>>>>> Hi Shameer,
>>>>>>>>
>>>>>>>> [Adding Robin in CC]
>>>>>>>> On 21/02/18 13:22, Shameer Kolothum wrote:
>>>>>>>>> This checks and rejects any dma map request outside valid iova
>>>>>>>>> range.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shameer Kolothum
>>>>>> <shameerali.kolothum.thodi@...wei.com>
>>>>>>>>> ---
>>>>>>>>>  drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++++++
>>>>>>>>>  1 file changed, 22 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c
>>>>>> b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>>> index a80884e..3049393 100644
>>>>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>>>>> @@ -970,6 +970,23 @@ static int vfio_pin_map_dma(struct
>> vfio_iommu
>>>>>> *iommu, struct vfio_dma *dma,
>>>>>>>>>  	return ret;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + * Check dma map request is within a valid iova range
>>>>>>>>> + */
>>>>>>>>> +static bool vfio_iommu_iova_dma_valid(struct vfio_iommu *iommu,
>>>>>>>>> +				dma_addr_t start, dma_addr_t end)
>>>>>>>>> +{
>>>>>>>>> +	struct list_head *iova = &iommu->iova_list;
>>>>>>>>> +	struct vfio_iova *node;
>>>>>>>>> +
>>>>>>>>> +	list_for_each_entry(node, iova, list) {
>>>>>>>>> +		if ((start >= node->start) && (end <= node->end))
>>>>>>>>> +			return true;
>>>>>>>> I am now confused by the fact this change will prevent existing QEMU
>>>>>>>> from working with this series on some platforms. For instance QEMU
>> virt
>>>>>>>> machine GPA space collides with Seattle PCI host bridge windows. On
>>>> ARM
>>>>>>>> the smmu and smmuv3 drivers report the PCI host bridge windows as
>>>>>>>> reserved regions which does not seem to be the case on other
>> platforms.
>>>>>>>> The change happened in commit
>>>>>> 273df9635385b2156851c7ee49f40658d7bcb29d
>>>>>>>> (iommu/dma: Make PCI window reservation generic).
>>>>>>>>
>>>>>>>> For background, we already discussed the topic after LPC 2016. See
>>>>>>>> https://www.spinics.net/lists/kernel/msg2379607.html.
>>>>>>>>
>>>>>>>> So is it the right choice to expose PCI host bridge windows as reserved
>>>>>>>> regions? If yes shouldn't we make a difference between those and MSI
>>>>>>>> windows in this series and do not reject any user space DMA_MAP
>>>> attempt
>>>>>>>> within PCI host bridge windows.
>>>>>>>
>>>>>>> If the QEMU machine GPA collides with a reserved region today, then
>>>>>>> either:
>>>>>>>
>>>>>>> a) The mapping through the IOMMU works and the reserved region is
>>>> wrong
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>> b) The mapping doesn't actually work, QEMU is at risk of data loss by
>>>>>>> being told that it worked, and we're justified in changing that
>>>>>>> behavior.
>>>>>>>
>>>>>>> Without knowing the specifics of SMMU, it doesn't particularly make
>>>>>>> sense to me to mark the entire PCI hierarchy MMIO range as reserved,
>>>>>>> unless perhaps the IOMMU is incapable of translating those IOVAs.
>>>>>> to me the limitation does not come from the smmu itself, which is a
>>>>>> separate HW block sitting between the root complex and the
>> interconnect.
>>>>>> If ACS is not enforced by the PCIe subsystem, the transaction will never
>>>>>> reach the IOMMU.
>>>>>
>>>>> True. And we do have one such platform where ACS is not enforced but
>>>>> reserving the regions and possibly creating holes while launching VM will
>>>>> make it secure. But I do wonder how we will solve the device grouping
>>>>> in such cases.
>>>>>
>>>>> The Seattle PCI host bridge windows case you mentioned has any pci quirk
>>>>> to claim that they support ACS?
>>>> No there is none to my knowledge. I am applying Alex' not upstream ACS
>>>> overwrite patch.
>>>
>>> Ok. But isn't that patch actually applicable to cases where ACS is really
>> supported
>>> by hardware but the capability is not available?
>>
>> My understanding is normally yes. If you apply the patch whereas the HW
>> practically does not support ACS, then you fool the kernel pretending
>> there is isolation whereas there is not. I don't know the exact
>> capability of the HW on AMD Seattle and effectively I should have cared
>> about it much earlier and if the HW capability were supported and not
>> properly exposed we should have implemented a clean quirk for this platform.
> 
> Ok. Thanks for the details.
>  
>>
>>   I am just trying to see whether
>>> the argument that we should allow DMA MAP requests for this(non-ACS case)
>>> even if the Guest GPA conflict with reserved region holds good. The fact that
>> may
>>> be it was working before is that the Guest never actually allocated any GPA
>> from
>>> the reserved region or maybe I am missing something here.
>>
>> If my understanding is correct, in ideal world we would report the PCI
>> host bridge window as reserved only in case ACS is not supported. If you
>> apply the patch and overrides the ACS, then the DMA_MAP  would be
>> allowed. In case the HW does not support ACS, then you could face
>> situations where one EP tries to access GPA that never reaches the IOMMU
>> (because it corresponds to the BAR of another downstream EP). Same can
>> happen at the moment.
> 
> Yes, this is my understanding too.
> 
> Just wondering the below changes to the iommu_dma_get_resv_regions() is
> good enough to take care this issue or not.

This looks sensible to me. The only question I have is can this ACS
computation result be altered later on for some reason.

Otherwise I confirm this fixes the reported reserved regions (no more
PCI host bridge windows) and assignment failure on Cavium CN88xx. Cavium
CN88xx assignment would otherwise be affected too.

Thanks

Eric
> 
> Thanks,
> Shameer
> 
> -- >8 --
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f05f3cf..b6e89d5 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -34,6 +34,8 @@
>  
>  #define IOMMU_MAPPING_ERROR	0
>  
> +#define REQ_ACS_FLAGS   (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
> +
>  struct iommu_dma_msi_page {
>  	struct list_head	list;
>  	dma_addr_t		iova;
> @@ -183,6 +185,9 @@ void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list)
>  	if (!dev_is_pci(dev))
>  		return;
>  
> +	if (pci_acs_path_enabled(to_pci_dev(dev), NULL, REQ_ACS_FLAGS))
> +		return;
> +
>  	bridge = pci_find_host_bridge(to_pci_dev(dev)->bus);
>  	resource_list_for_each_entry(window, &bridge->windows) {
>  		struct iommu_resv_region *region;
> ---8--
> 
> 
>>
>> Eric
>>>
>>> Thanks,
>>> Shameer
>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>>
>>>>>> In the case of such overlap, shouldn't we just warn the end-user that
>>>>>> this situation is dangerous instead of forbidding the use case which
>>>>>> worked "in most cases" until now.
>>>>>
>>>>> Yes, may be something similar to the allow_unsafe_interrupts case, if
>>>>> that is acceptable.
>>>>>
>>>>> Thanks,
>>>>> Shameer
>>>>>
>>>>>>> Are we trying to prevent untranslated p2p with this reserved range?
>>>>>>> That's not necessarily a terrible idea, but it seems that doing it for
>>>>>>> that purpose would need to be a lot smarter, taking into account ACS
>>>>>>> and precisely selecting ranges within the peer address space that would
>>>>>>> be untranslated.  Perhaps only populated MMIO within non-ACS
>>>>>>> hierarchies.  Thanks,
>>>>>>
>>>>>> Indeed taking into account the ACS capability would refine the
>>>>>> situations where a risk exists.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eric
>>>>>>>
>>>>>>> Alex
>>>>>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ