Message-ID: <20250516081732.06ef2230.alex.williamson@redhat.com>
Date: Fri, 16 May 2025 08:17:32 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: lizhe.67@...edance.com
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
 muchun.song@...ux.dev, peterx@...hat.com
Subject: Re: [PATCH] vfio/type1: optimize vfio_pin_pages_remote() for
 hugetlbfs folio

On Fri, 16 May 2025 16:16:16 +0800
lizhe.67@...edance.com wrote:

> On Thu, 15 May 2025 15:19:46 -0600, alex.williamson@...hat.com wrote:
> 
> >> +/*
> >> + * Find a random vfio_pfn that belongs to the range
> >> + * [iova, iova + PAGE_SIZE * npage)
> >> + */
> >> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
> >> +		dma_addr_t iova, unsigned long npage)
> >> +{
> >> +	struct vfio_pfn *vpfn;
> >> +	struct rb_node *node = dma->pfn_list.rb_node;
> >> +	dma_addr_t end_iova = iova + PAGE_SIZE * npage;
> >> +
> >> +	while (node) {
> >> +		vpfn = rb_entry(node, struct vfio_pfn, node);
> >> +
> >> +		if (end_iova <= vpfn->iova)
> >> +			node = node->rb_left;
> >> +		else if (iova > vpfn->iova)
> >> +			node = node->rb_right;
> >> +		else
> >> +			return vpfn;
> >> +	}
> >> +	return NULL;
> >> +}  
> >
> >This essentially duplicates vfio_find_vpfn(), where the existing
> >function only finds a single page.  The existing function should be
> >extended for this new use case and callers updated.  Also the vfio_pfn  
> 
> How about implementing it this way?
> 
> static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> {
> 	return vfio_find_vpfn_range(dma, iova, 1);
> }
> 
> With this implementation, existing callers of vfio_find_vpfn() won't
> need to be modified.

Yes, that would minimize churn elsewhere.

> >is not "random", it's the first vfio_pfn overlapping the range.  
> 
> Yes, you are right. I will change the comment to "Find the first vfio_pfn
> overlapping the range [iova, iova + PAGE_SIZE * npage) in the rb tree" in v2.
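> 
> For reference, the v2 comment would then read:
> 
> /*
>  * Find the first vfio_pfn overlapping the range
>  * [iova, iova + PAGE_SIZE * npage) in the rb tree.
>  */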
> 
> >> +
> >>  static void vfio_link_pfn(struct vfio_dma *dma,
> >>  			  struct vfio_pfn *new)
> >>  {
> >> @@ -670,6 +694,31 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> >>  				iova += (PAGE_SIZE * ret);
> >>  				continue;
> >>  			}
> >> +  
> >
> >Spurious new blank line.
> >  
> >> +		}  
> >
> >A new blank line here would be appreciated.  
> 
> Thank you. I will address this in v2.
> 
> >> +		/* Handle hugetlbfs page */
> >> +		if (likely(!disable_hugepages) &&  
> >
> >Isn't this already accounted for with npage = 1?  
> 
> Yes. This check is not necessary. I will remove it in v2.
> 
> >> +				folio_test_hugetlb(page_folio(batch->pages[batch->offset]))) {  
> >
> >I don't follow how this guarantees the entire batch->size is
> >contiguous.  Isn't it possible that a batch could contain multiple
> >hugetlb folios?  Is the assumption here only true if folio_nr_pages()
> >(or specifically the pages remaining) is >= batch->capacity?  What  
> 
> Yes.
> 
> >happens if we try to map the last half of one 2MB hugetlb page and
> >first half of the non-physically-contiguous next page?  Or what if the
> >hugetlb size is 64KB and the batch contains multiple folios that are
> >not physically contiguous?  
> 
> Sorry for my problematic implementation. I will fix it in v2.
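> 
> The fix needs to bound the step by the number of pages remaining in the
> current folio instead of trusting the whole batch to be contiguous. A
> rough sketch (contig_pages() is a made-up helper name for illustration):
> 
> static unsigned long contig_pages(struct vfio_batch *batch)
> {
> 	struct page *page = batch->pages[batch->offset];
> 	struct folio *folio = page_folio(page);
> 	unsigned long nr = 1;
> 
> 	/* Pages within a single hugetlb folio are physically contiguous. */
> 	if (folio_test_hugetlb(folio))
> 		nr = folio_nr_pages(folio) - folio_page_idx(folio, page);
> 
> 	/* Never step past what the batch actually holds. */
> 	return min_t(unsigned long, nr, batch->size);
> }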
> 
> >> +			if (pfn != *pfn_base + pinned)
> >> +				goto out;
> >> +
> >> +			if (!rsvd && !vfio_find_vpfn_range(dma, iova, batch->size)) {
> >> +				if (!dma->lock_cap &&
> >> +				    mm->locked_vm + lock_acct + batch->size > limit) {
> >> +					pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
> >> +						__func__, limit << PAGE_SHIFT);
> >> +					ret = -ENOMEM;
> >> +					goto unpin_out;
> >> +				}
> >> +				pinned += batch->size;
> >> +				npage -= batch->size;
> >> +				vaddr += PAGE_SIZE * batch->size;
> >> +				iova += PAGE_SIZE * batch->size;
> >> +				lock_acct += batch->size;
> >> +				batch->offset += batch->size;
> >> +				batch->size = 0;
> >> +				continue;
> >> +			}  
> >
> >There's a lot of duplication with the existing page-iterative loop.  I
> >think they could be consolidated if we extract the number of known
> >contiguous pages based on the folio into a variable, default 1.  
> 
> Good idea! I will try to implement this optimization based on this idea
> in v2.
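> 
> Roughly along these lines: a single accounting path that advances by a
> contiguous-page count, defaulting to 1 (just a sketch, reusing the
> hypothetical contig_pages() helper from above):
> 
> 	unsigned long step = contig_pages(batch);	/* 1 unless hugetlb */
> 
> 	if (!rsvd && !vfio_find_vpfn_range(dma, iova, step)) {
> 		if (!dma->lock_cap &&
> 		    mm->locked_vm + lock_acct + step > limit) {
> 			pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
> 				__func__, limit << PAGE_SHIFT);
> 			ret = -ENOMEM;
> 			goto unpin_out;
> 		}
> 		lock_acct += step;
> 	}
> 	pinned += step;
> 	npage -= step;
> 	vaddr += PAGE_SIZE * step;
> 	iova += PAGE_SIZE * step;
> 	batch->offset += step;
> 	batch->size -= step;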
> 
> >Also, while this approach is an improvement, it leaves a lot on the
> >table in scenarios where folio_nr_pages() exceeds batch->capacity.  For
> >example, we're at best incrementing 1GB hugetlb pages in 2MB increments.
> >We're also wasting a lot of cycles filling page pointers we mostly
> >don't use.  Thanks,  
> 
> Yes, that would be the ideal case. However, we cannot know in advance
> whether the pages to be pinned belong to a hugetlbfs folio, and
> increasing batch->capacity would lead to a significant increase in
> memory usage. Additionally, pin_user_pages_remote() currently lacks a
> variant that can avoid the overhead of filling page pointers for a
> hugetlbfs folio (if I understand correctly). I suggest we proceed with
> additional optimizations after further infrastructure improvements.

That's fair; it's a nice improvement within the existing interfaces,
even if we choose to pursue something more aggressive in the future.
Please consider documenting improvements relative to 1GB hugetlb in the
next version also.  Thanks,

Alex

