linux-kernel - Re: [RFC] vfio/type1: handle case where IOMMU does not support PAGE

Message-ID: <56310341.7010307@linaro.org>
Date:	Wed, 28 Oct 2015 18:17:53 +0100
From:	Eric Auger <eric.auger@...aro.org>
To:	Will Deacon <will.deacon@....com>,
	Alex Williamson <alex.williamson@...hat.com>
Cc:	eric.auger@...com, linux-arm-kernel@...ts.infradead.org,
	kvmarm@...ts.cs.columbia.edu, kvm@...r.kernel.org,
	suravee.suthikulpanit@....com, christoffer.dall@...aro.org,
	linux-kernel@...r.kernel.org, patches@...aro.org
Subject: Re: [RFC] vfio/type1: handle case where IOMMU does not support
 PAGE_SIZE size

Hi Will,
On 10/28/2015 06:14 PM, Will Deacon wrote:
> On Wed, Oct 28, 2015 at 10:27:28AM -0600, Alex Williamson wrote:
>> On Wed, 2015-10-28 at 13:12 +0000, Eric Auger wrote:
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>> index 57d8c37..13fb974 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -403,7 +403,7 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
>>>  static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  {
>>>  	struct vfio_domain *domain;
>>> -	unsigned long bitmap = PAGE_MASK;
>>> +	unsigned long bitmap = ULONG_MAX;
>>
>> Isn't this and removing the WARN_ON()s the only real change in this
>> patch?  The rest looks like conversion to use IS_ALIGNED and the
>> following test, that I don't really understand...
>>
>>>  
>>>  	mutex_lock(&iommu->lock);
>>>  	list_for_each_entry(domain, &iommu->domain_list, next)
>>> @@ -416,20 +416,18 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>>>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>>>  			     struct vfio_iommu_type1_dma_unmap *unmap)
>>>  {
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	size_t unmapped = 0;
>>>  	int ret = 0;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>
>> This one.  If we're going to support sub-PAGE_SIZE mappings, why do we
>> care to cap alignment at PAGE_SIZE?
> 
> Eric can clarify, but I think the intention here is to have VFIO continue
> doing things in PAGE_SIZE chunks precisely so that we don't have to rework
> all of the pinning code etc. The IOMMU API can then deal with the smaller
> page size.

That's my intention indeed ;-)

Thanks

Eric
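
For reference, a minimal userspace sketch of the alignment rule being
discussed: take the smallest page size advertised in the IOMMU page-size
bitmap and never require anything finer than the CPU page size. All helper
names here (lowest_set_bit, min_iommu_pagesz, requested_alignment,
is_aligned) are hypothetical and not part of the patch. One assumption to
flag: as far as I know, __ffs() returns a bit index rather than a size, so
the sketch shifts 1 by that index before comparing against the page size.
The quoted hunk compares the index directly, which may or may not be
intended.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __ffs(): index of the lowest set bit. */
static unsigned int lowest_set_bit(uint64_t word)
{
	unsigned int idx = 0;

	assert(word != 0);	/* undefined for 0, as with __ffs() */
	while (!(word & 1)) {
		word >>= 1;
		idx++;
	}
	return idx;
}

/* Smallest page size advertised in an IOMMU page-size bitmap, in bytes. */
static uint64_t min_iommu_pagesz(uint64_t pgsize_bitmap)
{
	return (uint64_t)1 << lowest_set_bit(pgsize_bitmap);
}

/* Alignment to require from userspace: never finer than the CPU page size. */
static uint64_t requested_alignment(uint64_t pgsize_bitmap, uint64_t cpu_page_size)
{
	uint64_t min_pagesz = min_iommu_pagesz(pgsize_bitmap);

	return min_pagesz < cpu_page_size ? cpu_page_size : min_pagesz;
}

/* Userspace equivalent of IS_ALIGNED() for power-of-two alignments. */
static int is_aligned(uint64_t value, uint64_t alignment)
{
	return (value & (alignment - 1)) == 0;
}
```

With a 4K-capable IOMMU on a 64K-page host, requested_alignment() yields
64K, so a 4K-aligned iova would be rejected while a 64K-aligned one would
pass.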

> 
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	if (unmap->iova & mask)
>>> +	if (!IS_ALIGNED(unmap->iova, requested_alignment))
>>>  		return -EINVAL;
>>> -	if (!unmap->size || unmap->size & mask)
>>> +	if (!unmap->size || !IS_ALIGNED(unmap->size, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	mutex_lock(&iommu->lock);
>>>  
>>>  	/*
>>> @@ -553,25 +551,24 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
>>>  	size_t size = map->size;
>>>  	long npage;
>>>  	int ret = 0, prot = 0;
>>> -	uint64_t mask;
>>>  	struct vfio_dma *dma;
>>>  	unsigned long pfn;
>>> +	unsigned int min_pagesz = __ffs(vfio_pgsize_bitmap(iommu));
>>> +	unsigned int requested_alignment = (min_pagesz < PAGE_SIZE) ?
>>> +						PAGE_SIZE : min_pagesz;
>>>  
>>>  	/* Verify that none of our __u64 fields overflow */
>>>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
>>>  		return -EINVAL;
>>>  
>>> -	mask = ((uint64_t)1 << __ffs(vfio_pgsize_bitmap(iommu))) - 1;
>>> -
>>> -	WARN_ON(mask & PAGE_MASK);
>>> -
>>>  	/* READ/WRITE from device perspective */
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_WRITE)
>>>  		prot |= IOMMU_WRITE;
>>>  	if (map->flags & VFIO_DMA_MAP_FLAG_READ)
>>>  		prot |= IOMMU_READ;
>>>  
>>> -	if (!prot || !size || (size | iova | vaddr) & mask)
>>> +	if (!prot || !size ||
>>> +		!IS_ALIGNED(size | iova | vaddr, requested_alignment))
>>>  		return -EINVAL;
>>>  
>>>  	/* Don't allow IOVA or virtual address wrap */
>>
>> This is mostly ignoring the problems with sub-PAGE_SIZE mappings.  For
>> instance, we can only pin on PAGE_SIZE and therefore we only do
>> accounting on PAGE_SIZE, so if the user does 4K mappings across your 64K
>> page, that page gets pinned and accounted 16 times.  Are we going to
>> tell users that their locked memory limit needs to be 16x now?  The rest
>> of the code would need an audit as well to see what other sub-page bugs
>> might be hiding.  Thanks,
> 
> I don't see that. The pinning all happens the same in VFIO, which can
> then happily pass a 64k region to iommu_map. iommu_map will then call
> ->map in 4k chunks on the IOMMU driver ops.
> 
> Will
> 
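
To make the two positions above concrete, here is a simplified model of the
arithmetic, not the actual VFIO accounting code. pins_charged() assumes
every mapping request pins each CPU page it touches without noticing that
the page is already pinned, which is the scenario behind the 16x concern.
iommu_chunks() counts how many IOMMU-page-sized pieces a single mapping
would be split into on the IOMMU side. Both function names and the model
itself are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pins charged when a region is mapped in map_size pieces and each piece
 * pins every CPU page it touches, with no deduplication of pages that
 * were already pinned by an earlier piece.
 */
static uint64_t pins_charged(uint64_t region, uint64_t map_size,
			     uint64_t cpu_page_size)
{
	uint64_t pins = 0;
	uint64_t off;

	assert(map_size != 0 && cpu_page_size != 0);
	for (off = 0; off < region; off += map_size) {
		uint64_t first = off / cpu_page_size;
		uint64_t last = (off + map_size - 1) / cpu_page_size;

		pins += last - first + 1;
	}
	return pins;
}

/* Number of IOMMU-page-sized chunks that one mapping of `size` splits into. */
static uint64_t iommu_chunks(uint64_t size, uint64_t iommu_page_size)
{
	assert(iommu_page_size != 0 && size % iommu_page_size == 0);
	return size / iommu_page_size;
}
```

Under this model, sixteen 4K mappings across one 64K page charge 16 pins,
whereas a single 64K mapping charges 1 pin and is split into 16 chunks of
4K below VFIO.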
