[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250515151946.1e6edf8b.alex.williamson@redhat.com>
Date: Thu, 15 May 2025 15:19:46 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: lizhe.67@...edance.com
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
muchun.song@...ux.dev, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH] vfio/type1: optimize vfio_pin_pages_remote() for
hugetlbfs folio
On Tue, 13 May 2025 11:57:30 +0800
lizhe.67@...edance.com wrote:
> From: Li Zhe <lizhe.67@...edance.com>
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes hugetlbfs folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
>
> This patch optimize this process by batching the statistics counting
> operations.
>
> The performance test results for completing the 8G VFIO IOMMU DMA mapping,
> obtained through trace-cmd, are as follows. In this case, the 8G virtual
> address space has been mapped to physical memory using hugetlbfs with
> pagesize=2M.
>
> Before this patch:
> funcgraph_entry: # 33813.703 us | vfio_pin_map_dma();
>
> After this patch:
> funcgraph_entry: # 15635.055 us | vfio_pin_map_dma();
>
> Signed-off-by: Li Zhe <lizhe.67@...edance.com>
> ---
> drivers/vfio/vfio_iommu_type1.c | 49 +++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
Hi,
Thanks for looking at improvements in this area...
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 0ac56072af9f..bafa7f8c4cc6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -337,6 +337,30 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> return NULL;
> }
>
> +/*
> + * Find a random vfio_pfn that belongs to the range
> + * [iova, iova + PAGE_SIZE * npage)
> + */
> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
> + dma_addr_t iova, unsigned long npage)
> +{
> + struct vfio_pfn *vpfn;
> + struct rb_node *node = dma->pfn_list.rb_node;
> + dma_addr_t end_iova = iova + PAGE_SIZE * npage;
> +
> + while (node) {
> + vpfn = rb_entry(node, struct vfio_pfn, node);
> +
> + if (end_iova <= vpfn->iova)
> + node = node->rb_left;
> + else if (iova > vpfn->iova)
> + node = node->rb_right;
> + else
> + return vpfn;
> + }
> + return NULL;
> +}
This essentially duplicates vfio_find_vpfn(), where the existing
function only finds a single page. The existing function should be
extended for this new use case and callers updated. Also the vfio_pfn
is not "random", it's the first vfio_pfn overlapping the range.
> +
> static void vfio_link_pfn(struct vfio_dma *dma,
> struct vfio_pfn *new)
> {
> @@ -670,6 +694,31 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> iova += (PAGE_SIZE * ret);
> continue;
> }
> +
Spurious new blank line.
> + }
A new blank line here would be appreciated.
> + /* Handle hugetlbfs page */
> + if (likely(!disable_hugepages) &&
Isn't this already accounted for with npage = 1?
> + folio_test_hugetlb(page_folio(batch->pages[batch->offset]))) {
I don't follow how this guarantees the entire batch->size is
contiguous. Isn't it possible that a batch could contain multiple
hugetlb folios? Is the assumption here only true if folio_nr_pages()
(or specifically the pages remaining) is >= batch->capacity? What
happens if we try to map the last half of one 2MB hugetlb page and
first half of the non-physically-contiguous next page? Or what if the
hugetlb size is 64KB and the batch contains multiple folios that are
not physically contiguous?
> + if (pfn != *pfn_base + pinned)
> + goto out;
> +
> + if (!rsvd && !vfio_find_vpfn_range(dma, iova, batch->size)) {
> + if (!dma->lock_cap &&
> + mm->locked_vm + lock_acct + batch->size > limit) {
> + pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
> + __func__, limit << PAGE_SHIFT);
> + ret = -ENOMEM;
> + goto unpin_out;
> + }
> + pinned += batch->size;
> + npage -= batch->size;
> + vaddr += PAGE_SIZE * batch->size;
> + iova += PAGE_SIZE * batch->size;
> + lock_acct += batch->size;
> + batch->offset += batch->size;
> + batch->size = 0;
> + continue;
> + }
There's a lot of duplication with the existing page-iterative loop. I
think they could be consolidated if we extract the number of known
contiguous pages based on the folio into a variable, default 1.
Also, while this approach is an improvement, it leaves a lot on the
table in scenarios where folio_nr_pages() exceeds batch->capacity. For
example we're at best incrementing 1GB hugetlb pages in 2MB increments.
We're also wasting a lot of cycles to fill pages points we mostly don't
use. Thanks,
Alex
Powered by blists - more mailing lists