Message-ID: <69bfdffd-8aa3-4375-9caf-b3311ff72448@kernel.org>
Date: Wed, 3 Dec 2025 10:23:47 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: kalyazin@...zon.com, Peter Xu <peterx@...hat.com>
Cc: Mike Rapoport <rppt@...nel.org>, linux-mm@...ck.org,
Andrea Arcangeli <aarcange@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Axel Rasmussen <axelrasmussen@...gle.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, Hugh Dickins
<hughd@...gle.com>, James Houghton <jthoughton@...gle.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Michal Hocko
<mhocko@...e.com>, Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>, Shuah Khan <shuah@...nel.org>,
Suren Baghdasaryan <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor mode

On 12/2/25 12:50, Nikita Kalyazin wrote:
>
>
> On 01/12/2025 20:57, Peter Xu wrote:
>> On Mon, Dec 01, 2025 at 08:12:38PM +0000, Nikita Kalyazin wrote:
>>>
>>>
>>> On 01/12/2025 18:35, Peter Xu wrote:
>>>> On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
>>>>> I believe I found the precise point where we convinced ourselves that minor
>>>>> support was sufficient: [1]. If at this moment we don't find that reasoning
>>>>> valid anymore, then indeed implementing missing is the only option.
>>>>>
>>>>> [1] https://lore.kernel.org/kvm/Z9GsIDVYWoV8d8-C@x1.local
>>>>
>>>> Now after I re-read the discussion, I may have made a wrong statement
>>>> there, sorry. I may have been slightly confused about when the write()
>>>> syscall can be involved.
>>>>
>>>> I agree if you want to get an event when cache missed with the current uffd
>>>> definitions and when pre-population is forbidden, then MISSING trap is
>>>> required. That holds whether or not UFFDIO_COPY needs to be available.
>>>>
>>>> Do I understand it right that UFFDIO_COPY is not allowed in your case, but
>>>> only write()?
>>>
>>> No, UFFDIO_COPY would work perfectly fine. We will still use write()
>>> whenever we resolve stage-2 faults as they aren't visible to UFFD. When a
>>> userfault occurs at an offset that already has a page in the cache, we will
>>> have to keep using UFFDIO_CONTINUE so it looks like both will be required:
>>>
>>> - user mapping major fault -> UFFDIO_COPY (fills the cache and sets up
>>> userspace PT)
>>> - user mapping minor fault -> UFFDIO_CONTINUE (only sets up userspace PT)
>>> - stage-2 fault -> write() (only fills the cache)
>>
>> Is stage-2 fault about KVM_MEMORY_EXIT_FLAG_USERFAULT, per James's series?
>
> Yes, that's the one ([1]).
>
> [1]
> https://lore.kernel.org/kvm/20250618042424.330664-1-jthoughton@google.com
>
>>
>> It looks fine indeed, but it looks slightly weird then, as you'll have two
>> ways to populate the page cache. Logically here atomicity is indeed not
>> needed when you trap both MISSING + MINOR.
>
> I reran the test based on the UFFDIO_COPY prototype I had using your
> series [2], and UFFDIO_COPY is slower than write() to populate 512 MiB:
> 237 vs 202 ms (+17%). Even though UFFDIO_COPY alone is functionally
> sufficient, I would prefer to have an option to use write() where
> possible and only fall back to UFFDIO_COPY for userspace faults to
> have better performance.

Just so I understand correctly: we could even do without UFFDIO_COPY for
that scenario by using write() + minor faults?

But what you are saying is that there might be a performance benefit in
using UFFDIO_COPY for userspace faults, to avoid the write()+minor fault
overhead?
--
Cheers
David