[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <664e5604-fe7c-449f-bb2a-48c9543fecf4@redhat.com>
Date: Thu, 3 Jul 2025 13:06:26 +0200
From: David Hildenbrand <david@...hat.com>
To: lizhe.67@...edance.com
Cc: alex.williamson@...hat.com, jgg@...dia.com, jgg@...pe.ca,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org, peterx@...hat.com
Subject: Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and
vfio_unpin_pages_remote() for large folio
On 03.07.25 05:54, lizhe.67@...edance.com wrote:
> On Wed, 2 Jul 2025 11:57:08 +0200, david@...hat.com wrote:
>
>> On 02.07.25 11:38, lizhe.67@...edance.com wrote:
>>> On Wed, 2 Jul 2025 10:15:29 +0200, david@...hat.com wrote:
>>>
>>>> Jason mentioned in reply to the other series that, ideally, vfio
>>>> shouldn't be messing with folios at all.
>>>>
>>>> While we now do that on the unpin side, we still do it at the pin side.
>>>
>>> Yes.
>>>
>>>> Which makes me wonder if we can avoid folios in patch #1 in
>>>> contig_pages(), and simply collect pages that correspond to consecutive
>>>> PFNs.
>>>
>>> In my opinion, comparing whether the pfns of two pages are contiguous
>>> is relatively inefficient. Using folios might be a more efficient
>>> solution.
>>
>> buffer[i + 1] == nth_page(buffer[i], 1)
>>
>> Is extremely efficient, except on
>>
>> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>>
>> Because it's essentially
>>
>> buffer[i + 1] == buffer[i] + 1
>>
>> But with that config it's less efficient
>>
>> buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
>>
>> That could be optimized (if we care about the config), assuming that we don't cross
>> memory sections (e.g., 128 MiB on x86).
>>
>> See page_ext_iter_next_fast_possible(), that optimized for something similar.
>>
>> So based on the first page, one could easily determine how far to batch
>> using the simple
>>
>> buffer[i + 1] == buffer[i] + 1
>>
>> comparison.
>>
>> That would mean that one could exceed a folio, in theory.
>
> Thank you very much for your suggestion. I think we can focus on
> optimizing the case where
>
> !(defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP))
>
> I believe that in most scenarios where vfio is used,
> CONFIG_SPARSEMEM_VMEMMAP is enabled. Excessive CONFIG
> may make the patch appear overly complicated.
>
>>> Given that 'page' is already in use within vfio, it seems that adopting
>>> 'folio' wouldn't be particularly troublesome? If you have any better
>>> suggestions, I sincerely hope you would share them with me.
>>
>> One challenge in the future will likely be that not all pages that we can
>> GUP will belong to folios. We would possibly be able to handle that by
>> checking if the page actually belongs to a folio.
>>
>> Not dealing with folios where avoidable would be easier.
>>
>>>
>>>> What was the reason again, that contig_pages() would not exceed a single
>>>> folio?
>>>
>>> Regarding this issue, I think Alex and I are on the same page[1]. For a
>>> folio, all of its pages have the same invalid or reserved state. In
>>> the function vfio_pin_pages_remote(), we need to ensure that the state
>>> is the same as the previous pfn (through variable 'rsvd' and function
>>> is_invalid_reserved_pfn()). Therefore, we do not want the return value
>>> of contig_pages() to exceed a single folio.
>>
>> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
>> for the shared zeropage. but that one can no longer be returned from FOLL_LONGTERM.
>>
>> So if you know the pages came from GUP, I would assume they are never invalid_reserved?
>
> Yes, we use function vaddr_get_pfns(), which ultimately invokes GUP
> with the FOLL_LONGTERM flag.
>
>> Again, just a thought on how to apply something similar as done for the unpin case, avoiding
>> messing with folios.
>
> Taking into account the previous discussion, it seems that we might
> simply replace the contig_pages() in patch #1 with the following one.
> Also, function contig_pages() could also be extracted into mm.h as a
> helper function. It seems that Jason would like to utilize it in other
> contexts. Moreover, the subject of this patchset should be changed to
> "Optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()". Do
> you think this would work?
>
> +static inline unsigned long contig_pages(struct page **pages,
> + unsigned long size)
size -> nr_pages
> +{
> + struct page *first_page = pages[0];
> + unsigned long i;
> +
> + for (i = 1; i < size; i++)
> + if (pages[i] != nth_page(first_page, i))
> + break;
> + return i;
> +}
LGTM.
I wonder if we can find a better function name, especially when moving
this to some header where it can be reused.
Something that expresses that we will return the next batch that starts
at the first page.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists