Message-ID: <Z9NeTQsn4xwTtU06@x1.local>
Date: Thu, 13 Mar 2025 18:38:05 -0400
From: Peter Xu <peterx@...hat.com>
To: Nikita Kalyazin <kalyazin@...zon.com>
Cc: James Houghton <jthoughton@...gle.com>, akpm@...ux-foundation.org,
	pbonzini@...hat.com, shuah@...nel.org, kvm@...r.kernel.org,
	linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, lorenzo.stoakes@...cle.com, david@...hat.com,
	ryan.roberts@....com, quic_eberman@...cinc.com, graf@...zon.de,
	jgowans@...zon.com, roypat@...zon.co.uk, derekmn@...zon.com,
	nsaenz@...zon.es, xmarcalx@...zon.com
Subject: Re: [RFC PATCH 0/5] KVM: guest_memfd: support for uffd missing

On Thu, Mar 13, 2025 at 10:13:23PM +0000, Nikita Kalyazin wrote:
> Yes, that's right, mmap() + memcpy() is functionally sufficient. write() is
> an optimisation.  Most of the pages in guest_memfd are only ever accessed by
> the vCPU (not userspace) via TDP (stage-2 pagetables) so they don't need
> userspace pagetables set up.  By using write() we can avoid VMA faults,
> installing corresponding PTEs and double page initialisation we discussed
> earlier.  The optimised path only contains pagecache population via write().
> Even TDP faults can be avoided if using KVM prefaulting API [1].
> 
> [1] https://docs.kernel.org/virt/kvm/api.html#kvm-pre-fault-memory

Could you elaborate on why VMA faults matter for performance?

If we're talking about postcopy-like migrations on top of KVM guest-memfd,
IIUC the VMAs can be pre-faulted too just like the TDP pgtables, e.g. with
MADV_POPULATE_WRITE.
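
To illustrate the pre-faulting idea, here is a minimal sketch (not part of
the series).  It assumes a shared anonymous mapping as a stand-in for an
mmap()ed guest_memfd; map_shared() and prefault_write() are hypothetical
helper names.  MADV_POPULATE_WRITE needs Linux 5.14 or newer, so the
sketch falls back to touching each page when the kernel rejects the
advice with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23 /* value from the Linux UAPI headers (5.14+) */
#endif

/* Stand-in for mmap() of a guest_memfd: a shared, shmem-backed mapping. */
static void *map_shared(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/*
 * Populate the page tables for [addr, addr + len) up front, so that later
 * accesses to the range do not take VMA faults.
 * Returns 0 on success, -1 on error (errno is set by madvise()).
 */
static int prefault_write(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;

    /* Advice not accepted (e.g. older kernel): touch each page instead. */
    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = addr;
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = p[off]; /* write fault without changing the contents */
    return 0;
}
```

A caller would map the range once, call prefault_write() on it before the
hot path starts, and only then begin copying data in.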

Normally, AFAIU, userspace applications optimize I/O the other way round:
they turn write()s into mmap()s, which at least avoids one round of
copying.
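
The copy counts can be sketched as follows.  This is only an illustration
under assumptions: a pipe stands in for the data source, an unlinked
temporary file under /tmp stands in for the destination, and all function
names are hypothetical.  The write() path goes through a bounce buffer
(source to buffer, then buffer to page cache), while the mmap() path
reads straight into the mapped pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes from fd; returns 0 on success, -1 on error/EOF. */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1; /* simplified: EINTR is treated as an error */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* write() path: source -> bounce buffer -> page cache (two copies). */
static int fill_via_write(int src_fd, int dst_fd, off_t dst_off, size_t len)
{
    unsigned char bounce[4096];
    while (len > 0) {
        size_t chunk = len < sizeof(bounce) ? len : sizeof(bounce);
        if (read_full(src_fd, bounce, chunk) != 0)
            return -1;
        if (pwrite(dst_fd, bounce, chunk, dst_off) != (ssize_t)chunk)
            return -1;
        dst_off += (off_t)chunk;
        len -= chunk;
    }
    return 0;
}

/* mmap() path: source -> mapped page cache pages (one copy). */
static int fill_via_mmap(int src_fd, void *dst_map, size_t dst_off, size_t len)
{
    assert(dst_map != NULL);
    return read_full(src_fd, (unsigned char *)dst_map + dst_off, len);
}

/* Demo: fill one half of a file through each path and compare the result. */
static int demo_copy_paths(void)
{
    enum { HALF = 4096 };
    unsigned char pattern[HALF];
    char path[] = "/tmp/copy-demo-XXXXXX";
    int pipefd[2];
    int ok = 0;

    memset(pattern, 0xAB, sizeof(pattern));
    int dst = mkstemp(path);
    if (dst < 0)
        return -1;
    unlink(path);
    if (pipe(pipefd) != 0) {
        close(dst);
        return -1;
    }

    unsigned char *map = MAP_FAILED;
    if (ftruncate(dst, 2 * HALF) == 0)
        map = mmap(NULL, 2 * HALF, PROT_READ | PROT_WRITE, MAP_SHARED, dst, 0);
    if (map != MAP_FAILED) {
        ok = write(pipefd[1], pattern, HALF) == HALF
          && fill_via_write(pipefd[0], dst, 0, HALF) == 0
          && write(pipefd[1], pattern, HALF) == HALF
          && fill_via_mmap(pipefd[0], map, HALF, HALF) == 0
          && memcmp(map, pattern, HALF) == 0
          && memcmp(map + HALF, pattern, HALF) == 0;
        munmap(map, 2 * HALF);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    close(dst);
    return ok ? 0 : -1;
}
```

Both halves end up with identical contents; the difference is only in how
many times the data is copied on the way there.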

For postcopy using minor traps (and since guest-memfd is always shared and
non-private), it's also possible to feed the mmap()ed VAs to the NIC as
buffers (e.g. as part of the iovec[] passed to recvmsg()). As long as the
mmap()ed ranges are not registered in KVM memslots, there's no concern
about non-atomic copies.
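
A minimal sketch of the recvmsg() idea, under assumptions: an AF_UNIX
socketpair stands in for the migration channel, a shared anonymous
mapping stands in for the mmap()ed guest_memfd, a 4 KiB page size is
assumed, and the function names are hypothetical.  With minor-fault
userfaultfd, the pages would afterwards be made visible to the faulting
side with UFFDIO_CONTINUE, which is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Receive two pages straight into a shared mapping.  Each iovec entry
 * targets one page, so the pages need not be adjacent in the mapping.
 * Returns the number of bytes received, or -1 on error.
 */
static ssize_t recv_into_pages(int sock, void *page_a, void *page_b,
                               size_t page_size)
{
    assert(page_a != NULL && page_b != NULL);

    struct iovec iov[2] = {
        { .iov_base = page_a, .iov_len = page_size },
        { .iov_base = page_b, .iov_len = page_size },
    };
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    /* MSG_WAITALL: block until both pages have been filled. */
    return recvmsg(sock, &msg, MSG_WAITALL);
}

/* Demo: send two pages of data and land them in pages 0 and 2. */
static int demo_recv_into_mapping(void)
{
    enum { PAGE_SZ = 4096 }; /* assumed page size for the demo */
    unsigned char payload[2 * PAGE_SZ];
    int sv[2];
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    unsigned char *map = mmap(NULL, 4 * PAGE_SZ, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (map != MAP_FAILED) {
        memset(payload, 0x11, PAGE_SZ);
        memset(payload + PAGE_SZ, 0x22, PAGE_SZ);
        ok = send(sv[0], payload, sizeof(payload), 0)
                 == (ssize_t)sizeof(payload)
          && recv_into_pages(sv[1], map, map + 2 * PAGE_SZ, PAGE_SZ)
                 == 2 * PAGE_SZ
          && map[0] == 0x11 && map[PAGE_SZ - 1] == 0x11
          && map[2 * PAGE_SZ] == 0x22 && map[3 * PAGE_SZ - 1] == 0x22
          && map[PAGE_SZ] == 0; /* page 1 was not a target, stays zero */
        munmap(map, 4 * PAGE_SZ);
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

The data goes from the socket into the mapped pages with no intermediate
userspace buffer, which is the single-copy behaviour described above.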

Thanks,

-- 
Peter Xu

