Message-ID: <77d99da0-10eb-4a4d-8ad9-c6ec83cb4540@redhat.com>
Date: Fri, 4 Jul 2025 10:47:00 +0200
From: David Hildenbrand <david@...hat.com>
To: lizhe.67@...edance.com, alex.williamson@...hat.com,
akpm@...ux-foundation.org, peterx@...hat.com, jgg@...pe.ca
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v2 5/5] vfio/type1: optimize vfio_unpin_pages_remote()
On 04.07.25 08:26, lizhe.67@...edance.com wrote:
> From: Li Zhe <lizhe.67@...edance.com>
>
> When vfio_unpin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs an individual
> put_pfn() operation for each page. This can lead to significant
> performance overhead, especially when dealing with large ranges of pages.
>
> It would be very rare for reserved and non-reserved PFNs to be mixed
> within the same range. So this patch utilizes the has_rsvd variable
> introduced in the previous patch to determine whether batched put_pfn()
> operations can be performed. Moreover, compared to per-page put_pfn(),
> unpin_user_page_range_dirty_lock() is capable of handling large folio
> scenarios more efficiently.
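
For readers skimming the thread, the two unpin strategies boil down to
the following minimal sketch (illustrative only, not buildable
stand-alone: put_pfn() is a static helper local to vfio_iommu_type1.c,
and unpin_user_page_range_dirty_lock() is the batched-unpin API
declared in include/linux/mm.h):

	/* Per-page path: one reference drop per pfn, folio-blind. */
	for (i = 0; i < npage; i++)
		if (put_pfn(pfn++, dma->prot))
			unlocked++;

	/*
	 * Batched path, valid only for ranges with no reserved pfns:
	 * a single call dirties (for writable mappings) and unpins the
	 * whole contiguous page range, processing each large folio
	 * once instead of once per constituent page.
	 */
	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
					 dma->prot & IOMMU_WRITE);
	unlocked = npage;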
>
> The performance test results for completing a 16G VFIO IOMMU DMA
> unmap are as follows.
>
> Base(v6.16-rc4):
> ./vfio-pci-mem-dma-map 0000:03:00.0 16
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
>
> With this patchset:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO UNMAP DMA in 0.045 s (353.9 GB/s)
>
> For large folios (MADV_HUGEPAGE and HUGETLBFS), the VFIO UNMAP DMA
> time drops from 0.135 s to 0.045 s, an improvement of over 66%. For
> small folios (MAP_POPULATE), the results show a slight improvement
> (0.312 s to 0.288 s).
>
> Suggested-by: Jason Gunthorpe <jgg@...pe.ca>
> Signed-off-by: Li Zhe <lizhe.67@...edance.com>
> ---
> drivers/vfio/vfio_iommu_type1.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 13c5667d431c..3971539b0d67 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -792,17 +792,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> return pinned;
> }
>
> +static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
> + unsigned long npage, int prot)
> +{
> + unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
> + prot & IOMMU_WRITE);
> +}
> +
> static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> unsigned long pfn, unsigned long npage,
> bool do_accounting)
> {
> long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> - long i;
>
> - for (i = 0; i < npage; i++)
> - if (put_pfn(pfn++, dma->prot))
> - unlocked++;
> + if (dma->has_rsvd) {
> + long i;
No need to move "long i" here, but also doesn't really matter.
>
> + for (i = 0; i < npage; i++)
> + if (put_pfn(pfn++, dma->prot))
> + unlocked++;
> + } else {
> + put_valid_unreserved_pfns(pfn, npage, dma->prot);
> + unlocked = npage;
> + }
> if (do_accounting)
> vfio_lock_acct(dma, locked - unlocked, true);
>
Reviewed-by: David Hildenbrand <david@...hat.com>
--
Cheers,
David / dhildenb