linux-kernel - Re: [PATCH] vfio/iommu_type1: report the IOMMU aperture info

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <f0661777-581d-0ebb-c041-92b8eae26f94@linux.vnet.ibm.com>
Date:   Mon, 4 Dec 2017 13:36:48 +0100
From:   Pierre Morel <pmorel@...ux.vnet.ibm.com>
To:     Alex Williamson <alex.williamson@...hat.com>
Cc:     cohuck@...hat.com, borntraeger@...ibm.com,
        zyimin@...ux.vnet.ibm.com, pasic@...ux.vnet.ibm.com,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] vfio/iommu_type1: report the IOMMU aperture info

On 01/12/2017 17:22, Alex Williamson wrote:
> On Fri, 1 Dec 2017 10:38:07 +0100
> Pierre Morel <pmorel@...ux.vnet.ibm.com> wrote:
> 
>> On 30/11/2017 19:30, Alex Williamson wrote:
>>> On Thu, 30 Nov 2017 16:11:35 +0100
>>> Pierre Morel <pmorel@...ux.vnet.ibm.com> wrote:
>>>    
>>>> On 30/11/2017 15:08, Alex Williamson wrote:
>>>>> On Thu, 30 Nov 2017 12:34:38 +0100
>>>>> Pierre Morel <pmorel@...ux.vnet.ibm.com> wrote:
>>>>>       
>>>>>> When userland VFIO defines a new IOMMU for a guest it may
>>>>>> want to specify to the guest the physical limits of
>>>>>> the underlying host IOMMU to avoid access to forbidden
>>>>>> memory ranges.
>>>>>>
>>>>>> Currently, the vfio_iommu_type1 driver does not report this
>>>>>> information to userland.
>>>>>>
>>>>>> Let's extend the vfio_iommu_type1_info structure reported
>>>>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the
>>>>>> IOMMU limits as new uint64_t entries aperture_start and
>>>>>> aperture_end.
>>>>>>
>>>>>> Let's also extend the flags bit map to add a flag specifying
>>>>>> if this extension of the info structure is reported or not.
>>>>>>
>>>>>> Signed-off-by: Pierre Morel <pmorel@...ux.vnet.ibm.com>
>>>>>> ---
>>>>>>     drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++
>>>>>>     include/uapi/linux/vfio.h       |  3 +++
>>>>>>     2 files changed, 45 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 8549cb1..7da5fe0 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>>>>     	return ret;
>>>>>>     }
>>>>>>     
>>>>>> +/**
>>>>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu
>>>>>> + * @iommu: the current vfio_iommu
>>>>>> + * @start: a pointer to the aperture start
>>>>>> + * @end  : a pointer to the aperture end
>>>>>> + *
>>>>>> + * This function iterate on the domains using the given vfio_iommu
>>>>>> + * and restrict the aperture to the minimal aperture common
>>>>>> + * to all domains sharing this vfio_iommu.
>>>>>> + */
>>>>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start,
>>>>>> +				uint64_t *end)
>>>>>> +{
>>>>>> +	struct iommu_domain_geometry geometry;
>>>>>> +	struct vfio_domain *domain;
>>>>>> +
>>>>>> +	*start = 0;
>>>>>> +	*end = U64_MAX;
>>>>>> +
>>>>>> +	mutex_lock(&iommu->lock);
>>>>>> +	/* loop on all domains using this vfio_iommu */
>>>>>> +	list_for_each_entry(domain, &iommu->domain_list, next) {
>>>>>> +		iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
>>>>>> +					&geometry);
>>>>>> +		if (geometry.force_aperture) {
>>>>>> +			if (geometry.aperture_start > *start)
>>>>>> +				*start = geometry.aperture_start;
>>>>>> +			if (geometry.aperture_end < *end)
>>>>>> +				*end = geometry.aperture_end;
>>>>>> +		}
>>>>>> +	}
>>>>>> +	mutex_unlock(&iommu->lock);
>>>>>> +}
>>>>>> +
>>>>>>     static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>>     				   unsigned int cmd, unsigned long arg)
>>>>>>     {
>>>>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>>     
>>>>>>     		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>>>>     
>>>>>> +		minsz = min_t(size_t, info.argsz, sizeof(info));
>>>>>> +		if (minsz >= offsetofend(struct vfio_iommu_type1_info,
>>>>>> +					 aperture_end)) {
>>>>>> +			info.flags |= VFIO_IOMMU_INFO_APERTURE;
>>>>>> +			vfio_get_aperture(iommu, &info.aperture_start,
>>>>>> +					  &info.aperture_end);
>>>>>> +		}
>>>>>> +
>>>>>>     		return copy_to_user((void __user *)arg, &info, minsz) ?
>>>>>>     			-EFAULT : 0;
>>>>>>     
>>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>>> index 0fb25fb..780d909 100644
>>>>>> --- a/include/uapi/linux/vfio.h
>>>>>> +++ b/include/uapi/linux/vfio.h
>>>>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info {
>>>>>>     	__u32	flags;
>>>>>>     #define VFIO_IOMMU_INFO_PGSIZES (1 << 0)	/* supported page sizes info */
>>>>>>     	__u64	iova_pgsizes;		/* Bitmap of supported page sizes */
>>>>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1)	/* supported aperture info */
>>>>>> +	__u64   aperture_start;		/* start of DMA aperture */
>>>>>> +	__u64   aperture_end;		/* end of DMA aperture */
>>>>>>     };
>>>>>>     
>>>>>>     #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>>>>>
>>>>> This only supports the most simple topology, even x86 cannot claim to
>>>>> have a single contiguous aperture, it's typically bisected by an MSI
>>>>> window.  I think we need an API that supports one or more apertures
>>>>> out of the box.  Also as Eric indicates, a capability is probably the
>>>>> better option for creating a flexible structure.  Thanks,
>>>>>
>>>>> Alex
>>>>>       
>>>>
>>>>
>>>> Yes, I understand that a capability here is a must, I will follow this way.
>>>>
>>>> For having multiple aperture and MSI protection, I understood it was
>>>> done using windows and reserved regions.
>>>> Can you point me to my error?
>>>
>>> See the thread from Huawei, I don't think that's a solved problem:
>>>
>>> https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html
>>>
>>> If you want sysfs to be consumed separately by the user and fed into
>>> new QEMU command line options for creating a VM layout, perhaps that's
>>> sufficient, but I think the vfio api for the iommu should encompass
>>> describing available ranges of mappable iova space without cobbling
>>> together arbitrary info from sysfs.  Thanks,
>>>
>>> Alex
>>>    
>>
>> Hi Alex,
>>
>> I resume to see if I understood you well:
>>
>> We may have physical IOMMUs with a more complex access that can not be
>> specified by only defining the start and end of a read/write region.
>>
>> Windows can be used to reserve regions for the VM but it is not what we
>> want. What we want is to know what the host can offer which is a mix of
>> aperture and windows.
>>
>> To report this we can use capabilities in a positive way, describing
>> what the host offers not what it can not provide.
>>
>> To achieve this we have to use two interfaces:
>> - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities
>> - Physical IOMMU interface with both geometry and window iommu_ops
>> callbacks.
>>
>> If it is sufficiently near from what you thought I will provide a new
>> version in this direction.
> 
> I believe so.  VFIO would construct a set of mappable iova
> regions/windows using information provided via the IOMMU API via
> iommu_ops and expose this via a new capability supporting multiple such
> regions via the VFIO_IOMMU_GET_INFO ioctl.  This ioctl would be
> extended to support capabilities in the same way we've done so for
> other vfio ioctls.  Thanks,
> 
> Alex
> 

Hi Alex,

Thanks, I go this way.

Pierre


-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany