Message-ID: <5c62bdbb-7a4e-4178-8c03-e84491d8d150@redhat.com>
Date: Mon, 13 Jan 2025 13:20:23 +0100
From: David Hildenbrand <david@...hat.com>
To: kalyazin@...zon.com, willy@...radead.org, pbonzini@...hat.com,
linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc: michael.day@....com, jthoughton@...gle.com, michael.roth@....com,
ackerleytng@...gle.com, graf@...zon.de, jgowans@...zon.com,
roypat@...zon.co.uk, derekmn@...zon.com, nsaenz@...zon.es,
xmarcalx@...zon.com
Subject: Re: [RFC PATCH 0/2] mm: filemap: add filemap_grab_folios
On 10.01.25 19:54, Nikita Kalyazin wrote:
> On 10/01/2025 17:01, David Hildenbrand wrote:
>> On 10.01.25 16:46, Nikita Kalyazin wrote:
>>> Based on David's suggestion for speeding up guest_memfd memory
>>> population [1], made at the guest_memfd upstream call on 5 Dec 2024 [2],
>>> this adds `filemap_grab_folios`, which grabs multiple folios at a time.
>>>
>>
>> Hi,
>
> Hi :)
>
>>
>>> Motivation
>>>
>>> When profiling guest_memfd population and comparing the results with
>>> population of anonymous memory via UFFDIO_COPY, I observed that the
>>> former was up to 20% slower, mainly due to adding newly allocated pages
>>> to the pagecache. As far as I can see, the two main contributors to it
>>> are pagecache locking and tree traversals needed for every folio. The
>>> RFC attempts to partially mitigate those by adding multiple folios at a
>>> time to the pagecache.
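
To make the batching idea concrete for anyone following along, here is a rough
caller-side sketch. Note that the prototype and the populate_range() helper
below are only my guess for illustration (including the arbitrary batch size
of 16), not the actual interface from the RFC:

/*
 * Hypothetical prototype -- the real RFC may differ.  The point is to
 * find-or-create up to @nr folios starting at @index in one call, so
 * the pagecache xarray is locked and walked once per batch instead of
 * once per folio.
 */
int filemap_grab_folios(struct address_space *mapping, pgoff_t index,
			struct folio **folios, unsigned int nr);

/* Illustrative population loop (e.g. for guest_memfd): */
static int populate_range(struct address_space *mapping, pgoff_t start,
			  unsigned long nr_pages)
{
	struct folio *folios[16];
	pgoff_t index = start;
	int i, got;

	while (nr_pages) {
		unsigned int want = min_t(unsigned long, nr_pages,
					  ARRAY_SIZE(folios));

		got = filemap_grab_folios(mapping, index, folios, want);
		if (got <= 0)
			return got ? got : -ENOMEM;

		for (i = 0; i < got; i++) {
			/* copy/initialise contents, then release */
			folio_mark_uptodate(folios[i]);
			folio_unlock(folios[i]);
			folio_put(folios[i]);
		}
		index += got;
		nr_pages -= got;
	}
	return 0;
}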
>>>
>>> Testing
>>>
>>> With the change applied, I was able to observe a 10.3% (708 to 635 ms)
>>> speedup in a selftest that populated a 3GiB guest_memfd and a 9.5% (990 to
>>> 904 ms) speedup when restoring a 3GiB guest_memfd VM snapshot using a
>>> custom Firecracker version, both on Intel Ice Lake.
>>
>> Does that mean that it's still 10% slower (based on the 20% above), or
>> was the 20% from a different micro-benchmark?
>
> Yes, it is still slower:
> - isolated/selftest: 2.3%
> - Firecracker setup: 8.9%
>
> Not sure why the values are so different though. I'll try to find an
> explanation.
The 2.3% looks very promising.
>
>>>
>>> Limitations
>>>
>>> While `filemap_grab_folio` handles THP/large folios internally and
>>> deals with reclaim artifacts in the pagecache (shadows), the RFC does
>>> not support those, for simplicity reasons, since it demonstrates the
>>> optimisation applied to guest_memfd, which currently only uses small
>>> folios and does not support reclaim.
>>
>> It might be worth pointing out that, while support for larger folios is
>> in the works, there will be scenarios where small folios are unavoidable
>> in the future (mixture of shared and private memory).
>>
>> How hard would it be to just naturally support large folios as well?
>
> I don't think it's impossible. It's just one more dimension that needs
> to be handled. The `__filemap_add_folio` logic is already rather
> complex, and correctly processing multiple folios while also splitting
> them when necessary looks substantially convoluted to me. So my idea
> was to discuss/validate the multi-folio approach first before rolling
> up the sleeves.
We should likely try to make this as generic as possible, meaning it should
support roughly what filemap_grab_folio() already supports (e.g., also large
folios).

I now see there is filemap_get_folios_contig() [which is already used in the
memfd code], and I wonder whether it could be reused/extended fairly easily.
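
For context, a minimal sketch of how filemap_get_folios_contig() is driven
today: it is lookup-only (it only returns folios that are already present in
the pagecache), so the extension would be allocating and adding the missing
folios in the same walk. The mapping/first_index/last_index names below are
just placeholders:

	struct folio_batch fbatch;
	pgoff_t start = first_index;	/* placeholder indices */
	unsigned int i, nr;

	folio_batch_init(&fbatch);
	/* Gather contiguous, already-present folios in one batch. */
	nr = filemap_get_folios_contig(mapping, &start, last_index, &fbatch);
	for (i = 0; i < nr; i++) {
		struct folio *folio = fbatch.folios[i];

		/* use the folio: pin it, copy into it, ... */
	}
	folio_batch_release(&fbatch);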
--
Cheers,
David / dhildenb