[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bc5b3724-6a82-15db-1a72-a42e74403c6f@nvidia.com>
Date: Tue, 15 Nov 2016 20:39:25 +0530
From: Kirti Wankhede <kwankhede@...dia.com>
To: Alex Williamson <alex.williamson@...hat.com>
CC: <pbonzini@...hat.com>, <kraxel@...hat.com>, <cjia@...dia.com>,
<qemu-devel@...gnu.org>, <kvm@...r.kernel.org>,
<kevin.tian@...el.com>, <jike.song@...el.com>,
<bjsdjshi@...ux.vnet.ibm.com>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v12 10/22] vfio iommu type1: Add support for mediated
devices
On 11/15/2016 4:55 AM, Alex Williamson wrote:
> On Mon, 14 Nov 2016 21:12:24 +0530
> Kirti Wankhede <kwankhede@...dia.com> wrote:
>
>> VFIO IOMMU drivers are designed for the devices which are IOMMU capable.
>> Mediated device only uses IOMMU APIs, the underlying hardware can be
>> managed by an IOMMU domain.
>>
>> Aim of this change is:
>> - To use most of the code of TYPE1 IOMMU driver for mediated devices
>> - To support direct assigned device and mediated device in single module
>>
>> This change adds pin and unpin support for mediated device to TYPE1 IOMMU
>> backend module. More details:
>> - Domain for external user is tracked separately in vfio_iommu structure.
>> It is allocated when group for first mdev device is attached.
>> - Pages pinned for external domain are tracked in each vfio_dma structure
>> for that iova range.
>> - Page tracking rb-tree in vfio_dma keeps <iova, pfn, ref_count>. Key of
>> rb-tree is iova, but it actually aims to track pfns.
>> - On external pin request for an iova, page is pinned only once, if iova
>> is already pinned and tracked, ref_count is incremented.
>
> This is referring only to the external (ie. pfn_list) tracking only,
> correct? In general, a page can be pinned up to twice per iova
> referencing it, once for an iommu mapped domain and again in the
> pfn_list, right?
>
That's right.
...
}
>>
>> - if (!rsvd && !lock_cap && mm->locked_vm + i + 1 > limit) {
>> - put_pfn(pfn, prot);
>> + iova = vaddr - dma->vaddr + dma->iova;
>
>
> nit, this could be incremented in the for loop just like vaddr, we
> don't need to create it from scratch (iova += PAGE_SIZE).
>
Ok.
>
>> + vpfn = vfio_find_vpfn(dma, iova);
>> + if (!vpfn)
>> + lock_acct++;
>> +
>> + if (!rsvd && !lock_cap &&
>> + mm->locked_vm + lock_acct + 1 > limit) {
>
>
> lock_acct was incremented just above, why is this lock_acct + 1? I
> think we're off by one here (ie. remove the +1)?
>
Moved this vfio_find_vpfn() check after this check.
...
>> -static long vfio_unpin_pages_remote(struct vfio_dma *dma, unsigned long pfn,
>> - long npage, int prot, bool do_accounting)
>> +static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
>> + unsigned long pfn, long npage,
>> + bool do_accounting)
>> {
>> - unsigned long unlocked = 0;
>> + long unlocked = 0, locked = 0;
>> long i;
>>
>> - for (i = 0; i < npage; i++)
>> - unlocked += put_pfn(pfn++, prot);
>> + for (i = 0; i < npage; i++) {
>> + struct vfio_pfn *vpfn;
>> +
>> + unlocked += put_pfn(pfn++, dma->prot);
>> +
>> + vpfn = vfio_find_vpfn(dma, iova + (i << PAGE_SHIFT));
>> + if (vpfn)
>> + locked++;
>
> This isn't taking into account reserved pages (ex. device mmaps). In
> the pinning path above we skip accounting of reserved pages, put_pfn
> also only increments for non-reserved pages, but vfio_find_vpfn()
> doesn't care, so it's possible that (locked - unlocked) could result in
> a positive value. Maybe something like:
>
> if (put_pfn(...)) {
> unlocked++;
> if (vfio_find_vpfn(...))
> locked++;
> }
>
Updating.
...
>>
>> -static void vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma)
>> +static int vfio_iommu_type1_pin_pages(void *iommu_data,
>> + unsigned long *user_pfn,
>> + int npage, int prot,
>> + unsigned long *phys_pfn)
>> +{
>> + struct vfio_iommu *iommu = iommu_data;
>> + int i, j, ret;
>> + unsigned long remote_vaddr;
>> + unsigned long *pfn = phys_pfn;
>
> nit, why do we have this variable rather than using phys_pfn directly?
> Maybe before we had unwind support we incremented this rather than
> indexing it?
>
Updating.
...
>> + iova = user_pfn[i] << PAGE_SHIFT;
>> + dma = vfio_find_dma(iommu, iova, 0);
>> + if (!dma)
>> + goto unpin_exit;
>> + unlocked += vfio_unpin_page_external(dma, iova, do_accounting);
>> + }
>> +
>> +unpin_exit:
>> + mutex_unlock(&iommu->lock);
>> + return unlocked;
>
> This is not returning what it's supposed to. unlocked is going to be
> less than or equal to the number of pages unpinned. We don't need to
> track unlocked, I think we're just tracking where we are in the unpin
> array, whether it was partial or complete. I think we want:
>
> return i > npage ? npage : i;
>
> Or maybe we can make it more obvious if there's an error on the first
> unpin entry:
>
> return i > npage ? npage : (i > 0 ? i : -EINVAL);
>
Thanks for pointing that out. Updating.
...
>> +static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
>> +{
>> + struct rb_node *n, *p;
>> +
>> + n = rb_first(&iommu->dma_list);
>> + for (; n; n = rb_next(n)) {
>> + struct vfio_dma *dma;
>> + long locked = 0, unlocked = 0;
>> +
>> + dma = rb_entry(n, struct vfio_dma, node);
>> + unlocked += vfio_unmap_unpin(iommu, dma, false);
>> + p = rb_first(&dma->pfn_list);
>> + for (; p; p = rb_next(p))
>> + locked++;
>
> We don't know that these weren't reserved pages. If the vendor driver
> was pinning peer-to-peer ranges vfio_unmap_unpin() might have returned
> 0 yet we're assuming each is locked RAM, so our accounting can go
> upside down.
>
Adding if rsvd check here.
>> + vfio_lock_acct(dma->task, locked - unlocked);
>> + }
>> +}
>> +
>> +static void vfio_external_unpin_all(struct vfio_iommu *iommu,
>> + bool do_accounting)
>> +{
>> + struct rb_node *n, *p;
>> +
>> + n = rb_first(&iommu->dma_list);
>> + for (; n; n = rb_next(n)) {
>> + struct vfio_dma *dma;
>> +
>> + dma = rb_entry(n, struct vfio_dma, node);
>> + while ((p = rb_first(&dma->pfn_list))) {
>> + int unlocked;
>> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn,
>> + node);
>> +
>> + unlocked = vfio_iova_put_vfio_pfn(dma, vpfn);
>> + if (do_accounting)
>> + vfio_lock_acct(dma->task, -unlocked);
>
> nit, we could batch these further, only updating accounting once per
> vfio_dma, or once entirely.
>
Ok.
Thanks,
Kirti
Powered by blists - more mailing lists