linux-kernel - Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty

Message-ID: <cb926401-6bfd-44cc-b126-28204225b820@redhat.com>
Date: Wed, 18 Jun 2025 13:52:37 +0200
From: David Hildenbrand <david@...hat.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: lizhe.67@...edance.com, akpm@...ux-foundation.org,
 alex.williamson@...hat.com, kvm@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org, peterx@...hat.com
Subject: Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()

On 18.06.25 13:46, Jason Gunthorpe wrote:
> On Wed, Jun 18, 2025 at 01:42:09PM +0200, David Hildenbrand wrote:
>> On 18.06.25 13:40, David Hildenbrand wrote:
>>> On 18.06.25 13:36, Jason Gunthorpe wrote:
>>>> On Wed, Jun 18, 2025 at 02:28:20PM +0800, lizhe.67@...edance.com wrote:
>>>>> On Tue, 17 Jun 2025 12:22:10 -0300, jgg@...pe.ca wrote:
>>>>>> +	while (npage) {
>>>>>> +		long nr_pages = 1;
>>>>>> +
>>>>>> +		if (!is_invalid_reserved_pfn(pfn)) {
>>>>>> +			struct page *page = pfn_to_page(pfn);
>>>>>> +			struct folio *folio = page_folio(page);
>>>>>> +			long folio_pages_num = folio_nr_pages(folio);
>>>>>> +
>>>>>> +			/*
>>>>>> +			 * For a folio, it represents a physically
>>>>>> +			 * contiguous set of bytes, and all of its pages
>>>>>> +			 * share the same invalid/reserved state.
>>>>>> +			 *
>>>>>> +			 * Here, our PFNs are contiguous. Therefore, if we
>>>>>> +			 * detect that the current PFN belongs to a large
>>>>>> +			 * folio, we can batch the operations for the next
>>>>>> +			 * nr_pages PFNs.
>>>>>> +			 */
>>>>>> +			if (folio_pages_num > 1)
>>>>>> +				nr_pages = min_t(long, npage,
>>>>>> +					folio_pages_num -
>>>>>> +					folio_page_idx(folio, page));
>>>>>> +
>>>>>> +			unpin_user_folio_dirty_locked(folio, nr_pages,
>>>>>> +					dma->prot & IOMMU_WRITE);
>>>>>
>>>>> Are you suggesting that we should directly call
>>>>> unpin_user_page_range_dirty_lock() here (patch 3/3) instead?
>>>>
>>>> I'm saying you should not have the word 'folio' inside the VFIO. You
>>>> accumulate a contiguous range of pfns, by only checking the pfn, and
>>>> then call
>>>>
>>>> unpin_user_page_range_dirty_lock(pfn_to_page(first_pfn)...);
>>>>
>>>> No need for any of this. vfio should never look at the struct page
>>>> except as the last moment to pass the range.
>>>
>>> Hah, agreed, that's actually simpler and there is no need to factor
>>> anything out.
>>
>> Ah, no, wait, the problem is that we don't know how many pages we can
>> supply, because there might be is_invalid_reserved_pfn() in the range ...
> 
> You stop batching when you hit any invalid_reserved_pfn and flush it.
> 
> It still has to read back and check every PFN to make sure it is
> contiguous, so checking reserved too is not a problem.

I thought we also wanted to optimize out the is_invalid_reserved_pfn() 
check for each subpage of a folio.

pfn_valid() + pfn_to_page() are not super cheap in some relevant configs 
IIRC.
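
For reference, a minimal userspace sketch of the PFN-only batching described in the quoted text: accumulate a contiguous run of PFNs, flush it when a reserved PFN or a discontinuity shows up, and never touch a folio. Everything here is an assumption made for illustration: the mock_*() helpers merely stand in for is_invalid_reserved_pfn() and unpin_user_page_range_dirty_lock(), the "PFN >= 1000 is reserved" rule is made up, and the PFNs arrive as a plain array rather than being read back from the IOMMU. Note that the reserved check still runs once per PFN, which is exactly the cost in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping so the flushes can be inspected afterwards. */
#define MAX_FLUSHES 16

struct flush_record {
	unsigned long first_pfn;
	long nr;
};

static struct flush_record flushes[MAX_FLUSHES];
static size_t nr_flushes;

/* Stand-in for is_invalid_reserved_pfn(); the threshold is made up. */
static bool mock_is_invalid_reserved_pfn(unsigned long pfn)
{
	return pfn >= 1000;
}

/*
 * Stand-in for
 * unpin_user_page_range_dirty_lock(pfn_to_page(first_pfn), nr, dirty).
 * It only records the range it was asked to unpin.
 */
static void mock_unpin_range(unsigned long first_pfn, long nr, bool dirty)
{
	(void)dirty;
	if (nr_flushes < MAX_FLUSHES) {
		flushes[nr_flushes].first_pfn = first_pfn;
		flushes[nr_flushes].nr = nr;
		nr_flushes++;
	}
}

/* Returns the number of PFNs handed to the unpin helper. */
static long unpin_pfn_array(const unsigned long *pfns, long npage, bool dirty)
{
	unsigned long first_pfn = 0;
	long batch = 0, unpinned = 0;

	assert(npage >= 0);

	for (long i = 0; i < npage; i++) {
		unsigned long pfn = pfns[i];
		bool reserved = mock_is_invalid_reserved_pfn(pfn);

		/* Flush the pending run on a reserved PFN or a gap. */
		if (batch && (reserved || pfn != first_pfn + batch)) {
			mock_unpin_range(first_pfn, batch, dirty);
			unpinned += batch;
			batch = 0;
		}

		if (reserved)
			continue;	/* reserved PFNs are never unpinned */

		if (!batch)
			first_pfn = pfn;
		batch++;
	}

	/* Flush whatever is left at the end of the array. */
	if (batch) {
		mock_unpin_range(first_pfn, batch, dirty);
		unpinned += batch;
	}

	return unpinned;
}
```

With this shape the caller only looks at PFNs, and the struct page lookup happens once per flushed range instead of once per page.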


-- 
Cheers,

David / dhildenb