Message-ID: <9a188baa-034b-4dd5-b90e-7182f1fbaec6@amazon.com>
Date: Tue, 14 Jan 2025 16:07:45 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: David Hildenbrand <david@...hat.com>, <willy@...radead.org>,
<pbonzini@...hat.com>, <linux-fsdevel@...r.kernel.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>
CC: <michael.day@....com>, <jthoughton@...gle.com>, <michael.roth@....com>,
<ackerleytng@...gle.com>, <graf@...zon.de>, <jgowans@...zon.com>,
<roypat@...zon.co.uk>, <derekmn@...zon.com>, <nsaenz@...zon.es>,
<xmarcalx@...zon.com>
Subject: Re: [RFC PATCH 0/2] mm: filemap: add filemap_grab_folios
On 13/01/2025 12:20, David Hildenbrand wrote:
> On 10.01.25 19:54, Nikita Kalyazin wrote:
>> On 10/01/2025 17:01, David Hildenbrand wrote:
>>> On 10.01.25 16:46, Nikita Kalyazin wrote:
>>>> Based on David's suggestion for speeding up guest_memfd memory
>>>> population [1] made at the guest_memfd upstream call on 5 Dec 2024 [2],
>>>> this adds `filemap_grab_folios` that grabs multiple folios at a time.
>>>>
>>>
>>> Hi,
>>
>> Hi :)
>>
>>>
>>>> Motivation
>>>>
>>>> When profiling guest_memfd population and comparing the results with
>>>> population of anonymous memory via UFFDIO_COPY, I observed that the
>>>> former was up to 20% slower, mainly due to adding newly allocated pages
>>>> to the pagecache. As far as I can see, the two main contributors to it
>>>> are pagecache locking and tree traversals needed for every folio. The
>>>> RFC attempts to partially mitigate those by adding multiple folios at a
>>>> time to the pagecache.
>>>>
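For illustration, populating a range today boils down to something like the
(simplified) loop below, where each folio pays for the pagecache xarray walk
and, on a miss, the tree lock taken to insert it; the point of
`filemap_grab_folios` is to amortise that cost over a batch. The batched
signature in the trailing comment is only illustrative, not necessarily what
the patches end up with.

#include <linux/err.h>
#include <linux/pagemap.h>

/*
 * Simplified sketch, not the actual patch: one pagecache walk per folio,
 * plus one locked insertion per newly allocated folio.
 */
static int populate_range(struct address_space *mapping, pgoff_t start,
                          unsigned int nr)
{
        unsigned int i;

        for (i = 0; i < nr; i++) {
                struct folio *folio = filemap_grab_folio(mapping, start + i);

                if (IS_ERR(folio))
                        return PTR_ERR(folio);

                /* ... copy the payload into the folio here ... */

                folio_mark_uptodate(folio);
                folio_unlock(folio);
                folio_put(folio);
        }
        return 0;
}

/*
 * A batched helper (signature illustrative only) would let the pagecache
 * lock and tree traversal be paid once per batch instead:
 *
 *      long filemap_grab_folios(struct address_space *mapping, pgoff_t start,
 *                               unsigned int nr, struct folio **folios);
 */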
>>>> Testing
>>>>
>>>> With the change applied, I was able to observe a 10.3% (708 to 635 ms)
>>>> speedup in a selftest that populated 3GiB guest_memfd and a 9.5%
>>>> (990 to
>>>> 904 ms) speedup when restoring a 3GiB guest_memfd VM snapshot using a
>>>> custom Firecracker version, both on Intel Ice Lake.
>>>
>>> Does that mean that it's still 10% slower (based on the 20% above), or
>>> were the 20% from a different micro-benchmark?
>>
>> Yes, it is still slower:
>> - isolated/selftest: 2.3%
>> - Firecracker setup: 8.9%
>>
>> Not sure why the values are so different though. I'll try to find an
>> explanation.
>
> The 2.3% looks very promising.
It does. I sorted out my Firecracker setup and saw a similar figure
there, which made me more confident.
>>
>>>>
>>>> Limitations
>>>>
>>>> While a complete `filemap_grab_folios` would need to handle THP/large
>>>> folios internally and deal with reclaim artifacts in the pagecache
>>>> (shadows), the RFC does not support those, for simplicity, as it
>>>> demonstrates the optimisation applied to guest_memfd, which currently
>>>> only uses small folios and does not support reclaim.
>>>
>>> It might be worth pointing out that, while support for larger folios is
>>> in the works, there will be scenarios where small folios are unavoidable
>>> in the future (mixture of shared and private memory).
>>>
>>> How hard would it be to just naturally support large folios as well?
>>
>> I don't think it's impossible. It's just one more dimension that needs
>> to be handled. The `__filemap_add_folio` logic is already rather complex,
>> and correctly processing multiple folios while also splitting them when
>> necessary looks substantially convoluted to me. So my idea was to
>> discuss/validate the multi-folio approach first before rolling up the
>> sleeves.
>
> We should likely try making this as generic as possible, meaning we'll
> support roughly what filemap_grab_folio() would have supported (e.g.,
> also large folios).
>
> Now I find filemap_get_folios_contig() [that is already used in memfd
> code], and wonder if that could be reused/extended fairly easily.
Fair, I will look into how it could be made generic.
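One caveat: filemap_get_folios_contig() only returns folios that are already
present in the pagecache (it stops at the first gap and never allocates), so
extending it for this use case would mean adding an allocate-on-miss path.
For reference, a typical lookup loop with it looks roughly like the sketch
below (simplified, no error handling):

#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/swap.h>

/*
 * Rough sketch, not from the RFC: filemap_get_folios_contig() fills a
 * folio_batch with already-present, contiguous folios starting at *start
 * and advances *start accordingly.  It does not allocate missing folios,
 * which is the part a generic "grab" variant would have to add.
 */
static void walk_contig_range(struct address_space *mapping,
                              pgoff_t start, pgoff_t end)
{
        struct folio_batch fbatch;
        unsigned int i, nr;

        folio_batch_init(&fbatch);

        while ((nr = filemap_get_folios_contig(mapping, &start, end,
                                               &fbatch))) {
                for (i = 0; i < nr; i++) {
                        struct folio *folio = fbatch.folios[i];

                        /* Placeholder use of the folio. */
                        folio_mark_accessed(folio);
                }
                /* Drops the references taken by the lookup. */
                folio_batch_release(&fbatch);
        }
}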
> --
> Cheers,
>
> David / dhildenb
>