Message-Id: <f0661777-581d-0ebb-c041-92b8eae26f94@linux.vnet.ibm.com>
Date: Mon, 4 Dec 2017 13:36:48 +0100
From: Pierre Morel <pmorel@...ux.vnet.ibm.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: cohuck@...hat.com, borntraeger@...ibm.com,
zyimin@...ux.vnet.ibm.com, pasic@...ux.vnet.ibm.com,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] vfio/iommu_type1: report the IOMMU aperture info
On 01/12/2017 17:22, Alex Williamson wrote:
> On Fri, 1 Dec 2017 10:38:07 +0100
> Pierre Morel <pmorel@...ux.vnet.ibm.com> wrote:
>
>> On 30/11/2017 19:30, Alex Williamson wrote:
>>> On Thu, 30 Nov 2017 16:11:35 +0100
>>> Pierre Morel <pmorel@...ux.vnet.ibm.com> wrote:
>>>
>>>> On 30/11/2017 15:08, Alex Williamson wrote:
>>>>> On Thu, 30 Nov 2017 12:34:38 +0100
>>>>> Pierre Morel <pmorel@...ux.vnet.ibm.com> wrote:
>>>>>
>>>>>> When userland VFIO defines a new IOMMU for a guest it may
>>>>>> want to specify to the guest the physical limits of
>>>>>> the underlying host IOMMU to avoid access to forbidden
>>>>>> memory ranges.
>>>>>>
>>>>>> Currently, the vfio_iommu_type1 driver does not report this
>>>>>> information to userland.
>>>>>>
>>>>>> Let's extend the vfio_iommu_type1_info structure reported
>>>>>> by the ioctl VFIO_IOMMU_GET_INFO command to report the
>>>>>> IOMMU limits as new uint64_t entries aperture_start and
>>>>>> aperture_end.
>>>>>>
>>>>>> Let's also extend the flags bit map to add a flag specifying
>>>>>> if this extension of the info structure is reported or not.
>>>>>>
>>>>>> Signed-off-by: Pierre Morel <pmorel@...ux.vnet.ibm.com>
>>>>>> ---
>>>>>> drivers/vfio/vfio_iommu_type1.c | 42 +++++++++++++++++++++++++++++++++++++++++
>>>>>> include/uapi/linux/vfio.h | 3 +++
>>>>>> 2 files changed, 45 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>>>> index 8549cb1..7da5fe0 100644
>>>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>>>> @@ -1526,6 +1526,40 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * vfio_get_aperture - report minimal aperture of a vfio_iommu
>>>>>> + * @iommu: the current vfio_iommu
>>>>>> + * @start: a pointer to the aperture start
>>>>>> + * @end : a pointer to the aperture end
>>>>>> + *
>>>>>> + * This function iterates over the domains using the given vfio_iommu
>>>>>> + * and restricts the aperture to the minimal aperture common
>>>>>> + * to all domains sharing this vfio_iommu.
>>>>>> + */
>>>>>> +static void vfio_get_aperture(struct vfio_iommu *iommu, uint64_t *start,
>>>>>> + uint64_t *end)
>>>>>> +{
>>>>>> + struct iommu_domain_geometry geometry;
>>>>>> + struct vfio_domain *domain;
>>>>>> +
>>>>>> + *start = 0;
>>>>>> + *end = U64_MAX;
>>>>>> +
>>>>>> + mutex_lock(&iommu->lock);
>>>>>> + /* loop on all domains using this vfio_iommu */
>>>>>> + list_for_each_entry(domain, &iommu->domain_list, next) {
>>>>>> + iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
>>>>>> + &geometry);
>>>>>> + if (geometry.force_aperture) {
>>>>>> + if (geometry.aperture_start > *start)
>>>>>> + *start = geometry.aperture_start;
>>>>>> + if (geometry.aperture_end < *end)
>>>>>> + *end = geometry.aperture_end;
>>>>>> + }
>>>>>> + }
>>>>>> + mutex_unlock(&iommu->lock);
>>>>>> +}
>>>>>> +
>>>>>> static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>> unsigned int cmd, unsigned long arg)
>>>>>> {
>>>>>> @@ -1560,6 +1594,14 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>>>>>
>>>>>> info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
>>>>>>
>>>>>> + minsz = min_t(size_t, info.argsz, sizeof(info));
>>>>>> + if (minsz >= offsetofend(struct vfio_iommu_type1_info,
>>>>>> + aperture_end)) {
>>>>>> + info.flags |= VFIO_IOMMU_INFO_APERTURE;
>>>>>> + vfio_get_aperture(iommu, &info.aperture_start,
>>>>>> + &info.aperture_end);
>>>>>> + }
>>>>>> +
>>>>>> return copy_to_user((void __user *)arg, &info, minsz) ?
>>>>>> -EFAULT : 0;
>>>>>>
>>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>>> index 0fb25fb..780d909 100644
>>>>>> --- a/include/uapi/linux/vfio.h
>>>>>> +++ b/include/uapi/linux/vfio.h
>>>>>> @@ -519,6 +519,9 @@ struct vfio_iommu_type1_info {
>>>>>> __u32 flags;
>>>>>> #define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
>>>>>> __u64 iova_pgsizes; /* Bitmap of supported page sizes */
>>>>>> +#define VFIO_IOMMU_INFO_APERTURE (1 << 1) /* supported aperture info */
>>>>>> + __u64 aperture_start; /* start of DMA aperture */
>>>>>> + __u64 aperture_end; /* end of DMA aperture */
>>>>>> };
>>>>>>
>>>>>> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
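The argsz/minsz handshake in the patch lets old and new userspace share one ioctl. The kernel only sets the aperture flag and fills the new fields when the caller's buffer is large enough to hold them. The following is a minimal userspace model of that logic, not kernel code. The struct and flag names are hypothetical stand-ins for the proposed uapi additions, and the page-size and aperture values are made up. One deliberate difference: the model zeroes `info` first. As far as I can tell, with an argsz between the old and new structure sizes, the patch as posted would copy back bytes it never initialized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the *proposed* layout (illustrative names). */
struct sketch_iommu_info {
	uint32_t argsz;
	uint32_t flags;
#define SKETCH_INFO_PGSIZES  (1u << 0)
	uint64_t iova_pgsizes;
#define SKETCH_INFO_APERTURE (1u << 1)
	uint64_t aperture_start;
	uint64_t aperture_end;
};

#define sketch_offsetofend(T, m) (offsetof(T, m) + sizeof(((T *)0)->m))

/*
 * Models the kernel side of GET_INFO: read argsz from the caller's buffer,
 * reject buffers too small for the original fields, and only fill (and flag)
 * the aperture fields when the caller's buffer can hold them.
 * Returns 0 on success, -1 where the kernel would return -EINVAL.
 */
static int sketch_get_info(struct sketch_iommu_info *ubuf)
{
	struct sketch_iommu_info info;
	size_t minsz = sketch_offsetofend(struct sketch_iommu_info, iova_pgsizes);
	size_t copysz;

	memset(&info, 0, sizeof(info));   /* never hand back stale bytes */
	memcpy(&info, ubuf, minsz);       /* stands in for copy_from_user() */
	if (info.argsz < minsz)
		return -1;

	info.flags = SKETCH_INFO_PGSIZES;
	info.iova_pgsizes = (1ULL << 12) | (1ULL << 21) | (1ULL << 30); /* made up */

	copysz = info.argsz < sizeof(info) ? info.argsz : sizeof(info);
	if (copysz >= sketch_offsetofend(struct sketch_iommu_info, aperture_end)) {
		info.flags |= SKETCH_INFO_APERTURE;
		info.aperture_start = 0;                  /* made-up values */
		info.aperture_end = (1ULL << 48) - 1;
	}

	memcpy(ubuf, &info, copysz);      /* stands in for copy_to_user() */
	return 0;
}
```

A caller built against the old header passes argsz = 16 and sees only the page-size flag. A caller built against the new header passes the full size and also gets the aperture.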
>>>>>
>>>>> This only supports the simplest topology; even x86 cannot claim to
>>>>> have a single contiguous aperture, it's typically bisected by an MSI
>>>>> window. I think we need an API that supports one or more apertures
>>>>> out of the box. Also as Eric indicates, a capability is probably the
>>>>> better option for creating a flexible structure. Thanks,
>>>>>
>>>>> Alex
>>>>>
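A small model makes Alex's objection concrete. Intersecting the forced apertures of every domain, which is what vfio_get_aperture() in the patch does, still yields a single range. On x86, however, the MSI window at 0xfee00000 to 0xfeefffff sits inside that range, so the usable IOVA space is really two ranges. The sketch is a hypothetical userspace illustration. `struct geom` only mimics the relevant fields of `struct iommu_domain_geometry`, and the reserved list is assumed sorted and non-overlapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* inclusive bounds */

/* Hypothetical stand-in for the fields of struct iommu_domain_geometry. */
struct geom { uint64_t aperture_start, aperture_end; bool force_aperture; };

/* Intersect all forced apertures, as the patch's vfio_get_aperture() does.
 * Returns false if the intersection is empty. */
static bool intersect_apertures(const struct geom *g, size_t n, struct range *out)
{
	out->start = 0;
	out->end = UINT64_MAX;
	for (size_t i = 0; i < n; i++) {
		if (!g[i].force_aperture)
			continue;   /* unforced domains do not restrict the range */
		if (g[i].aperture_start > out->start)
			out->start = g[i].aperture_start;
		if (g[i].aperture_end < out->end)
			out->end = g[i].aperture_end;
	}
	return out->start <= out->end;
}

/* Carve reserved windows (sorted, non-overlapping) out of 'ap'.
 * Writes at most 'max' mappable ranges to 'out'; returns how many. */
static size_t carve_reserved(struct range ap, const struct range *resv,
			     size_t nresv, struct range *out, size_t max)
{
	size_t n = 0;
	uint64_t cur = ap.start;   /* first address not yet accounted for */
	bool done = false;

	assert(ap.start <= ap.end);
	for (size_t i = 0; i < nresv && !done; i++) {
		if (resv[i].end < cur || resv[i].start > ap.end)
			continue;   /* window lies outside what is left */
		if (resv[i].start > cur && n < max)
			out[n++] = (struct range){ cur, resv[i].start - 1 };
		if (resv[i].end >= ap.end)
			done = true;   /* window covers the tail of the aperture */
		else
			cur = resv[i].end + 1;
	}
	if (!done && n < max)
		out[n++] = (struct range){ cur, ap.end };
	return n;
}
```

For instance, two forced domains limited to 48 and 40 address bits intersect to [0, 0xffffffffff]. Removing the MSI window then leaves [0, 0xfedfffff] and [0xfef00000, 0xffffffffff]. A single aperture_start/aperture_end pair cannot express that result.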
>>>>
>>>>
>>>> Yes, I understand that a capability is a must here; I will take that approach.
>>>>
>>>> Regarding multiple apertures and MSI protection, my understanding was
>>>> that this is handled using windows and reserved regions.
>>>> Can you point out where I am wrong?
>>>
>>> See the thread from Huawei; I don't think that's a solved problem:
>>>
>>> https://lists.gnu.org/archive/html/qemu-arm/2017-11/msg00237.html
>>>
>>> If you want sysfs to be consumed separately by the user and fed into
>>> new QEMU command line options for creating a VM layout, perhaps that's
>>> sufficient, but I think the vfio api for the iommu should encompass
>>> describing available ranges of mappable iova space without cobbling
>>> together arbitrary info from sysfs. Thanks,
>>>
>>> Alex
>>>
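For comparison, here is roughly what the sysfs route Alex argues against looks like. Since Linux 4.11, each IOMMU group exposes `/sys/kernel/iommu_groups/<group>/reserved_regions`. The file holds one region per line: a hex start address, a hex end address, and a type string such as `msi` or `direct`. The parser is a minimal sketch that works on the file contents already read into a string, and its function and struct names are hypothetical. It also illustrates the limitation under discussion: userspace receives the holes rather than the mappable ranges, and still has to combine them with the aperture on its own.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

struct resv_region {
	uint64_t start, end;   /* inclusive bounds */
	char type[24];         /* e.g. "msi", "direct", "reserved" */
};

/* Parse "0x<start> 0x<end> <type>" lines. Returns the number of regions
 * parsed (at most 'max'), or -1 on a malformed or overlong line. */
static int parse_reserved_regions(const char *text, struct resv_region *out, int max)
{
	int n = 0;
	const char *line = text;

	while (*line && n < max) {
		const char *nl = strchr(line, '\n');
		size_t len = nl ? (size_t)(nl - line) : strlen(line);
		char buf[128];

		if (len > 0) {   /* skip blank lines */
			if (len >= sizeof(buf))
				return -1;
			memcpy(buf, line, len);
			buf[len] = '\0';
			/* %x accepts the optional 0x prefix, as strtoul() does */
			if (sscanf(buf, "%" SCNx64 " %" SCNx64 " %23s",
				   &out[n].start, &out[n].end, out[n].type) != 3)
				return -1;
			n++;
		}
		if (!nl)
			break;
		line = nl + 1;
	}
	return n;
}
```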
>>
>> Hi Alex,
>>
>> Let me summarize to check that I understood you correctly:
>>
>> We may have physical IOMMUs whose accessible space is more complex and
>> cannot be described only by the start and end of a read/write region.
>>
>> Windows can be used to reserve regions for the VM, but that is not what
>> we want. What we want is to know what the host can offer, which is a mix
>> of aperture and windows.
>>
>> To report this, we can use capabilities in a positive way, describing
>> what the host offers rather than what it cannot provide.
>>
>> To achieve this we have to use two interfaces:
>> - VFIO user interface with VFIO_IOMMU_GET_INFO and capabilities
>> - Physical IOMMU interface with both geometry and window iommu_ops
>> callbacks.
>>
>> If this is close enough to what you had in mind, I will provide a new
>> version along these lines.
>
> I believe so. VFIO would construct a set of mappable iova
> regions/windows using information provided via the IOMMU API via
> iommu_ops and expose this via a new capability supporting multiple such
> regions via the VFIO_IOMMU_GET_INFO ioctl. This ioctl would be
> extended to support capabilities in the same way we've done so for
> other vfio ioctls. Thanks,
>
> Alex
>
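The design Alex describes follows the capability-chain convention that other VFIO info ioctls already use. The info structure carries an offset to the first capability. Each capability starts with a header whose `next` field is an offset from the start of the info buffer, and 0 ends the chain. The sketch walks such a chain in userspace to find a capability listing several mappable IOVA ranges. `struct cap_header` mirrors the layout of `struct vfio_info_cap_header` from `<linux/vfio.h>`. The IOVA-range capability, its ID, and its field names are hypothetical placeholders, since this thread has not defined them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as vfio_info_cap_header: 'next' is an offset from the start
 * of the info buffer; 0 terminates the chain. */
struct cap_header { uint16_t id; uint16_t version; uint32_t next; };

/* Hypothetical capability carrying several mappable IOVA windows. */
#define SKETCH_CAP_IOVA_RANGES 1
struct iova_range { uint64_t start, end; };   /* inclusive bounds */
struct cap_iova_ranges {
	struct cap_header header;
	uint32_t nr_ranges;
	uint32_t reserved;
	struct iova_range ranges[];   /* nr_ranges entries follow */
};

/* Walk the chain from cap_offset and return the first capability with 'id',
 * or NULL. Each header is bounds-checked, this sketch assumes 8-byte aligned
 * capabilities, and the hop limit guards against a looping chain. */
static const struct cap_header *find_cap(const uint8_t *buf, size_t bufsz,
					 uint32_t cap_offset, uint16_t id)
{
	uint32_t off = cap_offset;
	unsigned hops = 0;

	while (off != 0 && (off & 7u) == 0 &&
	       off + sizeof(struct cap_header) <= bufsz && hops++ < 64) {
		const struct cap_header *hdr = (const void *)(buf + off);

		if (hdr->id == id)
			return hdr;
		off = hdr->next;
	}
	return NULL;
}
```

Before walking the chain, a consumer would typically call the ioctl once, read back the larger argsz the kernel reports as necessary, and then call it again with a buffer of that size. Existing capability-bearing VFIO info ioctls work this way.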
Hi Alex,
Thanks, I will go this way.
Pierre
--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany