Message-ID: <20250516081616.73039-1-lizhe.67@bytedance.com>
Date: Fri, 16 May 2025 16:16:16 +0800
From: lizhe.67@...edance.com
To: alex.williamson@...hat.com
Cc: kvm@...r.kernel.org,
linux-kernel@...r.kernel.org,
lizhe.67@...edance.com,
muchun.song@...ux.dev,
peterx@...hat.com
Subject: Re: [PATCH] vfio/type1: optimize vfio_pin_pages_remote() for hugetlbfs folio
On Thu, 15 May 2025 15:19:46 -0600, alex.williamson@...hat.com wrote:
>> +/*
>> + * Find a random vfio_pfn that belongs to the range
>> + * [iova, iova + PAGE_SIZE * npage)
>> + */
>> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
>> + dma_addr_t iova, unsigned long npage)
>> +{
>> + struct vfio_pfn *vpfn;
>> + struct rb_node *node = dma->pfn_list.rb_node;
>> + dma_addr_t end_iova = iova + PAGE_SIZE * npage;
>> +
>> + while (node) {
>> + vpfn = rb_entry(node, struct vfio_pfn, node);
>> +
>> + if (end_iova <= vpfn->iova)
>> + node = node->rb_left;
>> + else if (iova > vpfn->iova)
>> + node = node->rb_right;
>> + else
>> + return vpfn;
>> + }
>> + return NULL;
>> +}
>
>This essentially duplicates vfio_find_vpfn(), where the existing
>function only finds a single page. The existing function should be
>extended for this new use case and callers updated. Also the vfio_pfn
How about implementing it this way?

static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma,
					      dma_addr_t iova)
{
	return vfio_find_vpfn_range(dma, iova, 1);
}

With this implementation, existing callers of vfio_find_vpfn() won't need
to be modified.
>is not "random", it's the first vfio_pfn overlapping the range.
Yes, you are right. I will change the comment to "Find the first vfio_pfn
overlapping the range [iova, iova + PAGE_SIZE * npage) in the rb tree" in v2.
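To make the intended semantics concrete, here is a standalone userspace sketch
(not kernel code) of the "first overlapping node" search, using a plain binary
search tree in place of the kernel rb_tree. It assumes each node covers exactly
one page starting at node->iova, and, unlike the quoted patch, it keeps
descending left after a hit so that the earliest overlapping node is returned.
The names (struct node, find_first_overlap) are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL

struct node {
	uint64_t iova;			/* start of one pinned page */
	struct node *left, *right;	/* BST ordered by iova */
};

/* Return the first node whose page overlaps [iova, iova + PAGE_SIZE * npage). */
static struct node *find_first_overlap(struct node *root, uint64_t iova,
				       unsigned long npage)
{
	uint64_t end = iova + PAGE_SIZE * npage;
	struct node *found = NULL;

	while (root) {
		if (end <= root->iova) {
			/* Node entirely above the range: go left. */
			root = root->left;
		} else if (iova >= root->iova + PAGE_SIZE) {
			/* Node entirely below the range: go right. */
			root = root->right;
		} else {
			/* Overlap: remember it, keep looking for an earlier one. */
			found = root;
			root = root->left;
		}
	}
	return found;
}
```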
>> +
>> static void vfio_link_pfn(struct vfio_dma *dma,
>> struct vfio_pfn *new)
>> {
>> @@ -670,6 +694,31 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
>> iova += (PAGE_SIZE * ret);
>> continue;
>> }
>> +
>
>Spurious new blank line.
>
>> + }
>
>A new blank line here would be appreciated.
Thank you. I will address this in v2.
>> + /* Handle hugetlbfs page */
>> + if (likely(!disable_hugepages) &&
>
>Isn't this already accounted for with npage = 1?
Yes. This check is not necessary. I will remove it in v2.
>> + folio_test_hugetlb(page_folio(batch->pages[batch->offset]))) {
>
>I don't follow how this guarantees the entire batch->size is
>contiguous. Isn't it possible that a batch could contain multiple
>hugetlb folios? Is the assumption here only true if folio_nr_pages()
>(or specifically the pages remaining) is >= batch->capacity? What
Yes.
>happens if we try to map the last half of one 2MB hugetlb page and
>first half of the non-physically-contiguous next page? Or what if the
>hugetlb size is 64KB and the batch contains multiple folios that are
>not physically contiguous?
Sorry for my problematic implementation. I will fix it in v2.
>> + if (pfn != *pfn_base + pinned)
>> + goto out;
>> +
>> + if (!rsvd && !vfio_find_vpfn_range(dma, iova, batch->size)) {
>> + if (!dma->lock_cap &&
>> + mm->locked_vm + lock_acct + batch->size > limit) {
>> + pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
>> + __func__, limit << PAGE_SHIFT);
>> + ret = -ENOMEM;
>> + goto unpin_out;
>> + }
>> + pinned += batch->size;
>> + npage -= batch->size;
>> + vaddr += PAGE_SIZE * batch->size;
>> + iova += PAGE_SIZE * batch->size;
>> + lock_acct += batch->size;
>> + batch->offset += batch->size;
>> + batch->size = 0;
>> + continue;
>> + }
>
>There's a lot of duplication with the existing page-iterative loop. I
>think they could be consolidated if we extract the number of known
>contiguous pages based on the folio into a variable, default 1.
Good idea! I will try to implement this consolidation in v2.
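As a rough illustration of the consolidation idea, here is a standalone
userspace simulation (not kernel code): the loop computes the number of
known-contiguous pages (default 1) at the current position and then advances
and accounts by that step, so the hugetlb case and the single-page case share
one path. The pfn array and the contig_pages()/count_steps() helpers stand in
for batch->pages and the folio-based lookup, and are assumptions of this
sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Count physically contiguous pages starting at index i (always >= 1). */
static size_t contig_pages(const unsigned long *pfns, size_t i, size_t n)
{
	size_t k = 1;

	while (i + k < n && pfns[i + k] == pfns[i] + k)
		k++;
	return k;
}

/*
 * Walk a batch of n pages, doing one accounting pass per contiguous run.
 * Returns how many passes were needed; a fully contiguous folio needs one.
 */
static size_t count_steps(const unsigned long *pfns, size_t n)
{
	size_t i = 0, steps = 0;

	while (i < n) {
		size_t contig = contig_pages(pfns, i, n);

		/* Here the real loop would pin/account 'contig' pages at once. */
		i += contig;
		steps++;
	}
	return steps;
}
```

With this shape, a batch of 512 contiguous pages costs one pass instead of
512, while a batch of scattered pages degrades gracefully to the current
page-at-a-time behaviour.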
>Also, while this approach is an improvement, it leaves a lot on the
>table in scenarios where folio_nr_pages() exceeds batch->capacity. For
>example we're at best incrementing 1GB hugetlb pages in 2MB increments.
>We're also wasting a lot of cycles to fill pages points we mostly don't
>use. Thanks,
Yes, that would be the ideal case. However, we cannot know in advance whether
the pages to be pinned belong to a hugetlbfs folio, and unconditionally
increasing batch->capacity would significantly increase memory usage.
Additionally, pin_user_pages_remote() currently lacks a variant that can avoid
the overhead of filling the pages array for a hugetlbfs folio (if I understand
correctly). I suggest proceeding with these additional optimizations after
further infrastructure improvements.