Message-ID: <20250618132350.GN1376515@ziepe.ca>
Date: Wed, 18 Jun 2025 10:23:50 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: lizhe.67@...edance.com
Cc: david@...hat.com, akpm@...ux-foundation.org, alex.williamson@...hat.com,
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, peterx@...hat.com
Subject: Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()

On Wed, Jun 18, 2025 at 08:19:28PM +0800, lizhe.67@...edance.com wrote:
> On Wed, 18 Jun 2025 08:56:22 -0300, jgg@...pe.ca wrote:
>  
> > On Wed, Jun 18, 2025 at 01:52:37PM +0200, David Hildenbrand wrote:
> > 
> > > I thought we also wanted to optimize out the
> > > is_invalid_reserved_pfn() check for each subpage of a folio.
> 
> Yes, that is an important aspect of our optimization.
> 
> > VFIO keeps a tracking structure for the ranges, you can record there
> > if a reserved PFN was ever placed into this range and skip the check
> > entirely.
> > 
> > It would be very rare for reserved PFNs and non-reserved ones to
> > be mixed within the same range; userspace could cause this, but
> > nothing should.
> 
> Yes, but it seems we don't have a very straightforward interface to
> obtain the reserved attribute of this large range of pfns.

vfio_unmap_unpin() has the struct vfio_dma; you'd store the
indication there and pass it down.
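
For the indication itself, something roughly like this (an untested
sketch; has_reserved is the same bit the diff below relies on, set
once on the pin side, wherever is_invalid_reserved_pfn() already
classifies the pfn):

	struct vfio_dma {
		[..]
		bool	has_reserved;	/* a reserved pfn was ever pinned */
	};

	/* On the pin path, when a pfn turns out to be reserved: */
	if (is_invalid_reserved_pfn(pfn))
		dma->has_reserved = true;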

It already builds the longest run of physical contiguity here:

		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
			next = iommu_iova_to_phys(domain->domain, iova + len);
			if (next != phys + len)
				break;
		}

And we pass down a physically contiguous range to
unmap_unpin_fast()/unmap_unpin_slow().

The only thing you need to do is detect reserved pfns in
vfio_unmap_unpin(), keyed off that flag in the dma, and break up the
above loop when it crosses a reserved boundary.

If you have a reserved range then just directly call iommu_unmap and
forget about any page pinning.

Then on the page pinning side you use the range version.
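
That is unpin_user_page_range_dirty_lock() from mm/gup.c, which takes
a starting page and a count instead of an array of pages:

	void unpin_user_page_range_dirty_lock(struct page *page,
					      unsigned long npages,
					      bool make_dirty);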

Something very approximately like the below. But again, I would
implore you to just use iommufd, which is already much better here.

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b597e..097b97c67e3f0d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -738,12 +738,13 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 	long unlocked = 0, locked = 0;
 	long i;
 
+	/* The caller has already ensured the pfn range is not reserved */
+	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
+					 dma->prot & IOMMU_WRITE);
 	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
-		if (put_pfn(pfn++, dma->prot)) {
-			unlocked++;
-			if (vfio_find_vpfn(dma, iova))
-				locked++;
-		}
+		unlocked++;
+		if (vfio_find_vpfn(dma, iova))
+			locked++;
 	}
 
 	if (do_accounting)
@@ -1082,6 +1083,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	while (iova < end) {
 		size_t unmapped, len;
 		phys_addr_t phys, next;
+		bool reserved = false;
 
 		phys = iommu_iova_to_phys(domain->domain, iova);
 		if (WARN_ON(!phys)) {
@@ -1089,6 +1091,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			continue;
 		}
 
+		if (dma->has_reserved)
+			reserved = is_invalid_reserved_pfn(phys >> PAGE_SHIFT);
+
 		/*
 		 * To optimize for fewer iommu_unmap() calls, each of which
 		 * may require hardware cache flushing, try to find the
@@ -1098,21 +1103,31 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			next = iommu_iova_to_phys(domain->domain, iova + len);
 			if (next != phys + len)
 				break;
+			if (dma->has_reserved &&
+			    reserved != is_invalid_reserved_pfn(next >> PAGE_SHIFT))
+				break;
 		}
 
 		/*
 		 * First, try to use fast unmap/unpin. In case of failure,
 		 * switch to slow unmap/unpin path.
 		 */
-		unmapped = unmap_unpin_fast(domain, dma, &iova, len, phys,
-					    &unlocked, &unmapped_region_list,
-					    &unmapped_region_cnt,
-					    &iotlb_gather);
-		if (!unmapped) {
-			unmapped = unmap_unpin_slow(domain, dma, &iova, len,
-						    phys, &unlocked);
-			if (WARN_ON(!unmapped))
-				break;
+		if (reserved) {
+			unmapped = iommu_unmap(domain->domain, iova, len);
+			iova += unmapped;
+		} else {
+			unmapped = unmap_unpin_fast(domain, dma, &iova, len,
+						    phys, &unlocked,
+						    &unmapped_region_list,
+						    &unmapped_region_cnt,
+						    &iotlb_gather);
+			if (!unmapped) {
+				unmapped = unmap_unpin_slow(domain, dma, &iova,
+							    len, phys,
+							    &unlocked);
+				if (WARN_ON(!unmapped))
+					break;
+			}
 		}
 	}
 
