Message-ID: <cd354fc0-e500-472d-ac33-0bc43c0d898f@amazon.com>
Date: Tue, 2 Dec 2025 15:59:49 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Peter Xu <peterx@...hat.com>
CC: "David Hildenbrand (Red Hat)" <david@...nel.org>, Mike Rapoport
<rppt@...nel.org>, <linux-mm@...ck.org>, Andrea Arcangeli
<aarcange@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, "Axel
Rasmussen" <axelrasmussen@...gle.com>, Baolin Wang
<baolin.wang@...ux.alibaba.com>, Hugh Dickins <hughd@...gle.com>, "James
Houghton" <jthoughton@...gle.com>, "Liam R. Howlett"
<Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Michal Hocko <mhocko@...e.com>, Paolo Bonzini <pbonzini@...hat.com>, "Sean
Christopherson" <seanjc@...gle.com>, Shuah Khan <shuah@...nel.org>, "Suren
Baghdasaryan" <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor
mode
On 02/12/2025 15:36, Peter Xu wrote:
> On Tue, Dec 02, 2025 at 11:50:31AM +0000, Nikita Kalyazin wrote:
>>> It looks fine indeed, but it looks slightly weird then, as you'll have two
>>> ways to populate the page cache. Logically here atomicity is indeed not
>>> needed when you trap both MISSING + MINOR.
>>
>> I reran the test based on the UFFDIO_COPY prototype I had using your series
>> [2], and UFFDIO_COPY is slower than write() to populate 512 MiB: 237 vs 202
>> ms (+17%). Even though UFFDIO_COPY alone is functionally sufficient, I
>> would prefer to have an option to use write() where possible and only
>> fall back to UFFDIO_COPY for userspace faults to have better performance.
>
> Yes, write() should be fine.
>
> Especially to gmem, I guess write() support is needed when VMAs cannot be
> mapped at all in strict CoCo context, so it needs to be available one way
> or another.
write() is supposed to be supported only for shared memory, i.e. memory
accessible to the host. AFAIK private memory should be populated via
other mechanisms.
>
> IIUC it's because UFFDIO_COPY (or memcpy(), I recall you used to test that
> instead) will involve pgtable operations.
Yes, for memcpy() it's even worse because it triggers VMA faults for
every page. UFFDIO_COPY's overhead is lower because the only extra
thing it does compared to write() is installing user PTs.
> So I wonder if the VMA mapping
> the gmem will still be accessed at some point later (either private->share
> convertible ones for device DMAs for CoCo, or fully shared non-CoCo use
> case), then the pgtable overhead will happen later for a write()-styled
> fault resolution.
At least in the Firecracker use case, only pages that are related to PV
devices are going to get accessed by the VMM via user PTs (such as
virtio queues and buffers). The majority of pages are only touched by
vCPUs via stage-2 mappings and are never accessed via user PTs.
>
> From that POV, above number makes sense.
>
> Thanks for the extra testing results.
>
>>
>> [2]
>> https://lore.kernel.org/all/7666ee96-6f09-4dc1-8cb2-002a2d2a29cf@amazon.com
>
> --
> Peter Xu
>