[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6b21d20c-447f-4059-8cbd-76a8eeebe834@amazon.com>
Date: Wed, 3 Dec 2025 10:03:16 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>, Peter Xu
<peterx@...hat.com>
CC: Mike Rapoport <rppt@...nel.org>, <linux-mm@...ck.org>, Andrea Arcangeli
<aarcange@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, "Axel
Rasmussen" <axelrasmussen@...gle.com>, Baolin Wang
<baolin.wang@...ux.alibaba.com>, Hugh Dickins <hughd@...gle.com>, "James
Houghton" <jthoughton@...gle.com>, "Liam R. Howlett"
<Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Michal Hocko <mhocko@...e.com>, Paolo Bonzini <pbonzini@...hat.com>, "Sean
Christopherson" <seanjc@...gle.com>, Shuah Khan <shuah@...nel.org>, "Suren
Baghdasaryan" <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor
mode
On 03/12/2025 09:23, David Hildenbrand (Red Hat) wrote:
> On 12/2/25 12:50, Nikita Kalyazin wrote:
>>
>>
>> On 01/12/2025 20:57, Peter Xu wrote:
>>> On Mon, Dec 01, 2025 at 08:12:38PM +0000, Nikita Kalyazin wrote:
>>>>
>>>>
>>>> On 01/12/2025 18:35, Peter Xu wrote:
>>>>> On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
>>>>>> I believe I found the precise point where we convinced ourselves
>>>>>> that minor
>>>>>> support was sufficient: [1]. If at this moment we don't find that
>>>>>> reasoning
>>>>>> valid anymore, then indeed implementing missing is the only option.
>>>>>>
>>>>>> [1] https://lore.kernel.org/kvm/Z9GsIDVYWoV8d8-C@x1.local
>>>>>
>>>>> Now after I re-read the discussion, I may have made a wrong statement
>>>>> there, sorry. I could have got slightly confused on when the write()
>>>>> syscall can be involved.
>>>>>
>>>>> I agree if you want to get an event when cache missed with the
>>>>> current uffd
>>>>> definitions and when pre-population is forbidden, then MISSING trap is
>>>>> required. That is, with/without the need of UFFDIO_COPY being
>>>>> available.
>>>>>
>>>>> Do I understand it right that UFFDIO_COPY is not allowed in your
>>>>> case, but
>>>>> only write()?
>>>>
>>>> No, UFFDIO_COPY would work perfectly fine. We will still use write()
>>>> whenever we resolve stage-2 faults as they aren't visible to UFFD.
>>>> When a
>>>> userfault occurs at an offset that already has a page in the cache,
>>>> we will
>>>> have to keep using UFFDIO_CONTINUE so it looks like both will be
>>>> required:
>>>>
>>>> - user mapping major fault -> UFFDIO_COPY (fills the cache and
>>>> sets up
>>>> userspace PT)
>>>> - user mapping minor fault -> UFFDIO_CONTINUE (only sets up
>>>> userspace PT)
>>>> - stage-2 fault -> write() (only fills the cache)
>>>
>>> Is stage-2 fault about KVM_MEMORY_EXIT_FLAG_USERFAULT, per James's
>>> series?
>>
>> Yes, that's the one ([1]).
>>
>> [1]
>> https://lore.kernel.org/kvm/20250618042424.330664-1-jthoughton@google.com
>>
>>>
>>> It looks fine indeed, but it looks slightly weird then, as you'll
>>> have two
>>> ways to populate the page cache. Logically here atomicity is indeed not
>>> needed when you trap both MISSING + MINOR.
>>
>> I reran the test based on the UFFDIO_COPY prototype I had using your
>> series [2], and UFFDIO_COPY is slower than write() to populate 512 MiB:
>> 237 vs 202 ms (+17%). Even though UFFDIO_COPY alone is functionally
>> sufficient, I would prefer to have an option to use write() where
>> possible and only falling back to UFFDIO_COPY for userspace faults to
>> have better performance.
>
> Just so I understand correctly: we could even do without UFFDIO_COPY for
> that scenario by using write() + minor faults?
We still need major fault notifications as well (which we were
accidentally generating until this version). But we can resolve them
with write() + UFFDIO_CONTINUE instead of UFFDIO_COPY.
>
> But what you are saying is that there might be a performance benefit in
> using UFFDIO_COPY for userspace faults, to avoid the write()+minor fault
> overhead?
UFFDIO_COPY _may_ be faster to resolve userspace faults because it's a
single syscall instead of two, but the amount of userspace faults, at
least in our scenario, is negligible compared to the amount of stage-2
faults, so I wouldn't use it as an argument for supporting UFFDIO_COPY
if it can be avoided.
>
> --
> Cheers
>
> David
Powered by blists - more mailing lists