lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <415a5956-1dec-4f10-be36-85f6d4d8f4b4@amazon.com>
Date: Tue, 2 Dec 2025 11:50:31 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Peter Xu <peterx@...hat.com>
CC: "David Hildenbrand (Red Hat)" <david@...nel.org>, Mike Rapoport
	<rppt@...nel.org>, <linux-mm@...ck.org>, Andrea Arcangeli
	<aarcange@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, "Axel
 Rasmussen" <axelrasmussen@...gle.com>, Baolin Wang
	<baolin.wang@...ux.alibaba.com>, Hugh Dickins <hughd@...gle.com>, "James
 Houghton" <jthoughton@...gle.com>, "Liam R. Howlett"
	<Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...e.com>, Paolo Bonzini <pbonzini@...hat.com>, "Sean
 Christopherson" <seanjc@...gle.com>, Shuah Khan <shuah@...nel.org>, "Suren
 Baghdasaryan" <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
	<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor
 mode



On 01/12/2025 20:57, Peter Xu wrote:
> On Mon, Dec 01, 2025 at 08:12:38PM +0000, Nikita Kalyazin wrote:
>>
>>
>> On 01/12/2025 18:35, Peter Xu wrote:
>>> On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
>>>> I believe I found the precise point where we convinced ourselves that minor
>>>> support was sufficient: [1].  If at this moment we don't find that reasoning
>>>> valid anymore, then indeed implementing missing is the only option.
>>>>
>>>> [1] https://lore.kernel.org/kvm/Z9GsIDVYWoV8d8-C@x1.local
>>>
>>> Now after I re-read the discussion, I may have made a wrong statement
>>> there, sorry.  I could have got slightly confused on when the write()
>>> syscall can be involved.
>>>
>>> I agree if you want to get an event when cache missed with the current uffd
>>> definitions and when pre-population is forbidden, then MISSING trap is
>>> required.  That is, with/without the need of UFFDIO_COPY being available.
>>>
>>> Do I understand it right that UFFDIO_COPY is not allowed in your case, but
>>> only write()?
>>
>> No, UFFDIO_COPY would work perfectly fine.  We will still use write()
>> whenever we resolve stage-2 faults as they aren't visible to UFFD.  When a
>> userfault occurs at an offset that already has a page in the cache, we will
>> have to keep using UFFDIO_CONTINUE so it looks like both will be required:
>>
>>   - user mapping major fault -> UFFDIO_COPY (fills the cache and sets up
>> userspace PT)
>>   - user mapping minor fault -> UFFDIO_CONTINUE (only sets up userspace PT)
>>   - stage-2 fault -> write() (only fills the cache)
> 
> Is stage-2 fault about KVM_MEMORY_EXIT_FLAG_USERFAULT, per James's series?

Yes, that's the one ([1]).

[1] 
https://lore.kernel.org/kvm/20250618042424.330664-1-jthoughton@google.com

> 
> It looks fine indeed, but it looks slightly weird then, as you'll have two
> ways to populate the page cache.  Logically here atomicity is indeed not
> needed when you trap both MISSING + MINOR.

I reran the test based on the UFFDIO_COPY prototype I had using your 
series [2], and UFFDIO_COPY is slower than write() to populate 512 MiB: 
237 vs 202 ms (+17%).  Even though UFFDIO_COPY alone is functionally 
sufficient, I would prefer to have an option to use write() where 
possible and only falling back to UFFDIO_COPY for userspace faults to 
have better performance.

[2] 
https://lore.kernel.org/all/7666ee96-6f09-4dc1-8cb2-002a2d2a29cf@amazon.com

> 
>>
>>>
>>> One way that might work this around, is introducing a new UFFD_FEATURE bit
>>> allowing the MINOR registration to trap all pgtable faults, which will
>>> change the MINOR fault semantics.
>>
>> This would equally work for us.  I suppose this MINOR+MAJOR semantics would
>> be more intrusive from the API point of view though.
> 
> Yes it is, it's just that I don't know whether it'll be harder when you
> want to completely support UFFDIO_COPY here, per previous discussions.
> 
> After a 2nd thought, such UFFD_FEATURE is probably not a good design,
> because it essentially means that feature bit will functionally overlap
> with what MISSING trap was trying to do, however duplicating that concept
> in a VMA that was registered as MINOR only.
> 
> Maybe it's possible instead if we allow a module to support MISSING trap,
> but without supporting UFFDIO_COPY ioctl.
> 
> That is, the MISSING events will be properly generated if MISSING traps are
> supported, however the module needs to provide its own way to resolve it if
> UFFDIO_COPY ioctl isn't available.  Gmem is fine in this case as long as
> it'll always be registered with both MISSING+MINOR traps, then resolving
> using write()s would work.

Yes, this would also work for me.  This is almost how it was 
(accidentally) working until this version of the patches.  If this is a 
lighter undertaking compared to implementing UFFDIO_COPY, I'd be happy 
with it too.

> 
> Such would be possible when with something like my v3 previously:
> 
> https://lore.kernel.org/all/20250926211650.525109-1-peterx@redhat.com/#t
> 
> Then gmem needs to declare VM_UFFD_MISSING + VM_UFFD_MINOR in
> uffd_features, but _UFFDIO_CONTINUE only (without _UFFDIO_COPY) in
> uffd_ioctls.
> 
> Since Mike already took this series over, I'll leave that to you all to
> decide.

Thanks for you input, Peter.

> 
> --
> Peter Xu
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ