Message-ID: <cd354fc0-e500-472d-ac33-0bc43c0d898f@amazon.com>
Date: Tue, 2 Dec 2025 15:59:49 +0000
From: Nikita Kalyazin <kalyazin@...zon.com>
To: Peter Xu <peterx@...hat.com>
CC: "David Hildenbrand (Red Hat)" <david@...nel.org>, Mike Rapoport
	<rppt@...nel.org>, <linux-mm@...ck.org>, Andrea Arcangeli
	<aarcange@...hat.com>, Andrew Morton <akpm@...ux-foundation.org>, "Axel
 Rasmussen" <axelrasmussen@...gle.com>, Baolin Wang
	<baolin.wang@...ux.alibaba.com>, Hugh Dickins <hughd@...gle.com>, "James
 Houghton" <jthoughton@...gle.com>, "Liam R. Howlett"
	<Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...e.com>, Paolo Bonzini <pbonzini@...hat.com>, "Sean
 Christopherson" <seanjc@...gle.com>, Shuah Khan <shuah@...nel.org>, "Suren
 Baghdasaryan" <surenb@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>,
	<linux-kselftest@...r.kernel.org>
Subject: Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor
 mode




On 02/12/2025 15:36, Peter Xu wrote:
> On Tue, Dec 02, 2025 at 11:50:31AM +0000, Nikita Kalyazin wrote:
>>> It looks fine indeed, but it looks slightly weird then, as you'll have two
>>> ways to populate the page cache.  Logically here atomicity is indeed not
>>> needed when you trap both MISSING + MINOR.
>>
>> I reran the test based on the UFFDIO_COPY prototype I had using your series
>> [2], and UFFDIO_COPY is slower than write() to populate 512 MiB: 237 vs 202
>> ms (+17%).  Even though UFFDIO_COPY alone is functionally sufficient, I
>> would prefer to have an option to use write() where possible and only
>> falling back to UFFDIO_COPY for userspace faults to have better performance.
> 
> Yes, write() should be fine.
> 
> Especially to gmem, I guess write() support is needed when VMAs cannot be
> mapped at all in strict CoCo context, so it needs to be available one way
> or another.

write() is supposed to be supported only for shared memory, i.e. memory
that is accessible to the host.  AFAIK private memory should be
populated via other mechanisms.

> 
> IIUC it's because UFFDIO_COPY (or memcpy(), I recall you used to test that
> instead) will involve pgtable operations.

Yes, for memcpy() it's even worse because it triggers a page fault
through the VMA for every page.  UFFDIO_COPY's overhead is lower because
the only extra thing it does compared to write() is installing user PTs.

> So I wonder if the VMA mapping
> the gmem will still be accessed at some point later (either private->share
> convertable ones for device DMAs for CoCo, or fully shared non-CoCo use
> case), then the pgtable overhead will happen later for a write()-styled
> fault resolution.

At least in the Firecracker use case, only pages related to PV devices
(such as virtio queues and buffers) are accessed by the VMM via user
PTs.  The majority of pages are only touched by vCPUs via stage-2
mappings and are never accessed via user PTs.

> 
>  From that POV, above number makes sense.
> 
> Thanks for the extra testing results.
> 
>>
>> [2]
>> https://lore.kernel.org/all/7666ee96-6f09-4dc1-8cb2-002a2d2a29cf@amazon.com
> 
> --
> Peter Xu
> 

