Message-ID: <76e3d5bf-df73-4293-84f6-0d6ddabd0fd7@amazon.com>
Date: Mon, 1 Dec 2025 20:12:38 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Peter Xu <peterx@...hat.com>
CC: "David Hildenbrand (Red Hat)" <david@...nel.org>, Mike Rapoport
	<rppt@...nel.org>, <linux-mm@...ck.org>, Andrea Arcangeli
	<aarcange@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, "Axel
 Rasmussen" <axelrasmussen@...gle.com>, Baolin Wang
	<baolin.wang@...ux.alibaba.com>, Hugh Dickins <hughd@...gle.com>, "James
 Houghton" <jthoughton@...gle.com>, "Liam R. Howlett"
	<Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...e.com>, Paolo Bonzini <pbonzini@...hat.com>, "Sean
 Christopherson" <seanjc@...gle.com>, Shuah Khan <shuah@...nel.org>, "Suren
 Baghdasaryan" <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
	<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor
 mode



On 01/12/2025 18:35, Peter Xu wrote:
> On Mon, Dec 01, 2025 at 04:48:22PM +0000, Nikita Kalyazin wrote:
>> I believe I found the precise point where we convinced ourselves that minor
>> support was sufficient: [1].  If at this moment we don't find that reasoning
>> valid anymore, then indeed implementing missing is the only option.
>>
>> [1] https://lore.kernel.org/kvm/Z9GsIDVYWoV8d8-C@x1.local
> 
> Now after I re-read the discussion, I may have made a wrong statement
> there, sorry.  I could have got slightly confused on when the write()
> syscall can be involved.
> 
> I agree if you want to get an event when cache missed with the current uffd
> definitions and when pre-population is forbidden, then MISSING trap is
> required.  That is, with/without the need of UFFDIO_COPY being available.
> 
> Do I understand it right that UFFDIO_COPY is not allowed in your case, but
> only write()?

No, UFFDIO_COPY would work perfectly fine.  We will still use write()
whenever we resolve stage-2 faults, as they aren't visible to UFFD.  When
a userfault occurs at an offset that already has a page in the cache, we
will have to keep using UFFDIO_CONTINUE, so it looks like both will be
required:

  - user mapping major fault -> UFFDIO_COPY (fills the cache and sets up
    userspace PT)
  - user mapping minor fault -> UFFDIO_CONTINUE (only sets up userspace PT)
  - stage-2 fault -> write() (only fills the cache)
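
To make the mapping above concrete, here is a rough sketch of it as a
lookup table.  This is only an illustration of the three cases listed
above, not VMM or kernel code: the names Fault, Resolution, RESOLUTIONS
and resolve() are made up for this example, and the operation strings
are plain labels rather than real ioctl or syscall invocations.

```python
from enum import Enum, auto
from typing import NamedTuple


class Fault(Enum):
    """Hypothetical labels for the three fault kinds listed above."""
    USER_MAJOR = auto()  # user mapping fault, no page in the cache yet
    USER_MINOR = auto()  # user mapping fault, page already in the cache
    STAGE2 = auto()      # stage-2 fault, not visible to UFFD


class Resolution(NamedTuple):
    operation: str        # label only, not an actual call
    fills_cache: bool     # does the operation populate the page cache?
    sets_up_user_pt: bool  # does it install the userspace page table entry?


# One entry per bullet in the list above.
RESOLUTIONS = {
    Fault.USER_MAJOR: Resolution("UFFDIO_COPY", True, True),
    Fault.USER_MINOR: Resolution("UFFDIO_CONTINUE", False, True),
    Fault.STAGE2: Resolution("write()", True, False),
}


def resolve(fault: Fault) -> Resolution:
    """Return how the given fault kind would be resolved."""
    return RESOLUTIONS[fault]


if __name__ == "__main__":
    for fault in Fault:
        r = resolve(fault)
        print(f"{fault.name}: {r.operation} "
              f"(cache={r.fills_cache}, user PT={r.sets_up_user_pt})")
```

The table shows why neither ioctl can be dropped: UFFDIO_COPY is the
only entry that both fills the cache and sets up the userspace PT, and
UFFDIO_CONTINUE is the only one that sets up the userspace PT without
touching the cache.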

> 
> One way that might work this around, is introducing a new UFFD_FEATURE bit
> allowing the MINOR registration to trap all pgtable faults, which will
> change the MINOR fault semantics.

This would work equally well for us.  I suppose these MINOR+MAJOR
semantics would be more intrusive from the API point of view though.

> 
> That'll need some further thoughts, meanwhile we may also want to make sure
> the old shmem/hugetlbfs semantics are kept (e.g. they should fail MINOR
> registers if the new feature bit is enabled in the ctx somehow; or support
> them properly in the codebase).
> 
> Thanks,
> 
> --
> Peter Xu
> 

