linux-kernel - Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c1144447-6b67-48d3-b37c-5f1ca6a9b4a7@redhat.com>
Date: Wed, 2 Jul 2025 11:57:08 +0200
From: David Hildenbrand <david@...hat.com>
To: lizhe.67@...edance.com
Cc: alex.williamson@...hat.com, jgg@...dia.com, jgg@...pe.ca,
 kvm@...r.kernel.org, linux-kernel@...r.kernel.org, peterx@...hat.com
Subject: Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and
 vfio_unpin_pages_remote() for large folio

On 02.07.25 11:38, lizhe.67@...edance.com wrote:
> On Wed, 2 Jul 2025 10:15:29 +0200, david@...hat.com wrote:
> 
>> Jason mentioned in reply to the other series that, ideally, vfio
>> shouldn't be messing with folios at all.
>>
>> While we now do that on the unpin side, we still do it at the pin side.
> 
> Yes.
> 
>> Which makes me wonder if we can avoid folios in patch #1 in
>> contig_pages(), and simply collect pages that correspond to consecutive
>> PFNs.
> 
> In my opinion, comparing whether the pfns of two pages are contiguous
> is relatively inefficient. Using folios might be a more efficient
> solution.

	buffer[i + 1] == nth_page(buffer[i], 1)

Is extremely efficient, except on

	#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)

Because it's essentially

	buffer[i + 1] == buffer[i] + 1

But with that config it's less efficient

	buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)

That could be optimized (if we care about the config), assuming that we don't cross
memory sections (e.g., 128 MiB on x86).

See page_ext_iter_next_fast_possible(), that optimized for something similar.

So based on the first page, one could easily determine how far to batch
using the simple

	buffer[i + 1] == buffer[i] + 1

comparison.

That would mean that one could exceed a folio, in theory.

> 
> Given that 'page' is already in use within vfio, it seems that adopting
> 'folio' wouldn't be particularly troublesome? If you have any better
> suggestions, I sincerely hope you would share them with me.

One challenge in the future will likely be that not all pages that we can
GUP will belong to folios. We would possibly be able to handle that by
checking if the page actually belongs to a folio.

Not dealing with folios where avoidable would be easier.

> 
>> What was the reason again, that contig_pages() would not exceed a single
>> folio?
> 
> Regarding this issue, I think Alex and I are on the same page[1]. For a
> folio, all of its pages have the same invalid or reserved state. In
> the function vfio_pin_pages_remote(), we need to ensure that the state
> is the same as the previous pfn (through variable 'rsvd' and function
> is_invalid_reserved_pfn()). Therefore, we do not want the return value
> of contig_pages() to exceed a single folio.

If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
for the shared zeropage. but that one can no longer be returned from FOLL_LONGTERM.

So if you know the pages came from GUP, I would assume they are never invalid_reserved?

Again, just a thought on how to apply something similar as done for the unpin case, avoiding
messing with folios.

-- 
Cheers,

David / dhildenb