linux-kernel - Re: [RFC PATCH 14/18] KVM: Add asynchronous userfaults, KVM_READ

Message-ID: <4e5c2904-f628-4391-853e-37b7f0e132e8@amazon.com>
Date: Fri, 26 Jul 2024 17:50:15 +0100
From: Nikita Kalyazin <kalyazin@...zon.com>
To: James Houghton <jthoughton@...gle.com>
CC: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
	James Morse <james.morse@....com>, Suzuki K Poulose <suzuki.poulose@....com>,
	Zenghui Yu <yuzenghui@...wei.com>, Sean Christopherson <seanjc@...gle.com>,
	Shuah Khan <shuah@...nel.org>, Peter Xu <peterx@...hat.com>, Axel Rasmussen
	<axelrasmussen@...gle.com>, David Matlack <dmatlack@...gle.com>,
	<kvm@...r.kernel.org>, <linux-doc@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
	<kvmarm@...ts.linux.dev>, <roypat@...zon.co.uk>, <kalyazin@...zon.com>,
	"Paolo Bonzini" <pbonzini@...hat.com>
Subject: Re: [RFC PATCH 14/18] KVM: Add asynchronous userfaults,
 KVM_READ_USERFAULT

Hi James,

On 11/07/2024 00:42, James Houghton wrote:
> It is possible that KVM wants to access a userfault-enabled GFN in a
> path where it is difficult to return out to userspace with the fault
> information. For these cases, add a mechanism for KVM to wait for a GFN
> to not be userfault-enabled.
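The quoted mechanism, where KVM blocks until a GFN is no longer userfault-enabled and userspace later clears the flag, can be pictured as a condition-variable wait. The following is a minimal userspace model under stated assumptions: it is not KVM code, the patch itself says the real implementation will change, and every name here (struct uf_model, uf_model_wait, uf_model_clear, MODEL_NR_GFNS) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not KVM code: a fixed table of GFNs that may be
 * userfault-enabled, protected by one mutex and one condition variable. */
#define MODEL_NR_GFNS 64

struct uf_model {
    pthread_mutex_t lock;
    pthread_cond_t cleared;           /* broadcast whenever a GFN is cleared */
    bool userfault[MODEL_NR_GFNS];    /* true: accesses to this GFN must wait */
};

void uf_model_init(struct uf_model *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->cleared, NULL);
    for (int i = 0; i < MODEL_NR_GFNS; i++)
        m->userfault[i] = false;
}

/* Userspace marks a GFN as userfault-enabled (page not yet populated). */
void uf_model_set(struct uf_model *m, uint64_t gfn)
{
    assert(gfn < MODEL_NR_GFNS);
    pthread_mutex_lock(&m->lock);
    m->userfault[gfn] = true;
    pthread_mutex_unlock(&m->lock);
}

/* "KVM" side: block until the GFN is no longer userfault-enabled. */
void uf_model_wait(struct uf_model *m, uint64_t gfn)
{
    assert(gfn < MODEL_NR_GFNS);
    pthread_mutex_lock(&m->lock);
    while (m->userfault[gfn])         /* re-check guards against spurious wakeups */
        pthread_cond_wait(&m->cleared, &m->lock);
    pthread_mutex_unlock(&m->lock);
}

/* Userspace side: after populating the page, clear the flag and wake waiters. */
void uf_model_clear(struct uf_model *m, uint64_t gfn)
{
    assert(gfn < MODEL_NR_GFNS);
    pthread_mutex_lock(&m->lock);
    m->userfault[gfn] = false;
    pthread_cond_broadcast(&m->cleared);
    pthread_mutex_unlock(&m->lock);
}

/* Read the current state of one GFN under the lock. */
bool uf_model_is_set(struct uf_model *m, uint64_t gfn)
{
    assert(gfn < MODEL_NR_GFNS);
    pthread_mutex_lock(&m->lock);
    bool set = m->userfault[gfn];
    pthread_mutex_unlock(&m->lock);
    return set;
}
```

In this model a vCPU thread would call uf_model_wait() and the fault-handling thread would call uf_model_clear(). The clear path uses pthread_cond_broadcast() rather than pthread_cond_signal() because several waiters blocked on different GFNs share one condition variable, and each must re-check its own flag.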
In this patch series, an asynchronous notification mechanism is used 
only in cases "where it is difficult to return out to userspace with the 
fault information". However, we (AWS) have a use case where we would 
like to be notified asynchronously about _all_ faults. Firecracker can 
restore a VM from a memory snapshot where the guest memory is supplied 
via a Userfaultfd by a process separate from the VMM itself [1]. While 
it looks technically possible for the VMM process to handle exits via 
forwarding the faults to the other process, that would require building 
a complex userspace protocol on top and likely introduce extra latency 
on the critical path. This also implies that a KVM API 
(KVM_READ_USERFAULT) is not suitable, because KVM checks that the ioctls 
are performed specifically by the VMM process [2]:
	if (kvm->mm != current->mm || kvm->vm_dead)
		return -EIO;
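The contrast here is that a file descriptor can cross a process boundary while KVM ioctls cannot. In the Firecracker setup described in [1], the VMM hands the userfaultfd to the separate page-fault handler process over a Unix domain socket. A minimal sketch of that fd hand-off with SCM_RIGHTS follows. It assumes a connected AF_UNIX socket, and send_fd/recv_fd are illustrative helper names, not Firecracker's code. A pipe stands in for the userfaultfd in the test so the sketch runs without userfaultfd privileges.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected Unix domain socket. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    char dummy = 'F';                         /* SCM_RIGHTS needs >= 1 data byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;                 /* forces cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;             /* kernel installs a dup in the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctrl.buf, .msg_controllen = sizeof(ctrl.buf),
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Once received, the handler process can read fault events from the descriptor and resolve them without issuing any KVM ioctl, which is why the `kvm->mm != current->mm` check does not get in the way of that design.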

> The implementation of this mechanism is certain to change before KVM
> Userfault could possibly be merged.
How do you envision resolving faults in userspace? Copying the page in 
(provided that userspace mapping of guest_memfd is supported [3]) and 
clearing KVM_MEMORY_ATTRIBUTE_USERFAULT do not look sufficient on their 
own to resolve the fault: an attempt to copy the page directly in 
userspace will itself trigger a fault, which may lead to a deadlock if 
the original fault was caused by the VMM. An interface similar to 
UFFDIO_COPY is needed that would allocate a page, copy the content in 
and update the page tables.
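For comparison, for memory registered with a userfaultfd (not guest_memfd), the existing UFFDIO_COPY ioctl resolves a missing-page fault in one call. It allocates a page, copies the supplied contents into it, maps it at the faulting address and, unless UFFDIO_COPY_MODE_DONTWAKE is set, wakes the faulting thread. The wrapper below is a minimal sketch of that existing call. The name uffd_resolve_copy is hypothetical, and nothing here implies that an equivalent exists for guest_memfd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Resolve a missing-page fault at dst (page aligned) by asking the kernel
 * to allocate a page, copy len bytes from src into it and map it.
 * The handler never touches dst itself, so it cannot fault on it.
 * Returns 0 on success or a negative errno value.
 */
int uffd_resolve_copy(int uffd, uintptr_t dst, const void *src, size_t len)
{
    assert(len != 0);
    struct uffdio_copy copy = {
        .dst = dst,
        .src = (uintptr_t)src,
        .len = len,
        .mode = 0,              /* 0: wake the faulting thread after mapping */
    };
    if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
        return -errno;          /* on EAGAIN, copy.copy holds the bytes done */
    return 0;
}
```

In a handler loop, dst would be the faulting address from a UFFD_EVENT_PAGEFAULT message (msg.arg.pagefault.address) rounded down to the page size, and src a buffer in the handler's own address space. This sketch does not retry partial copies reported through EAGAIN.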

[1] Firecracker snapshot restore via UserfaultFD: 
https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
[2] KVM ioctl check for the address space: 
https://elixir.bootlin.com/linux/v6.10.1/source/virt/kvm/kvm_main.c#L5083
[3] mmap() of guest_memfd: 
https://lore.kernel.org/kvm/489d1494-626c-40d9-89ec-4afc4cd0624b@redhat.com/T/#mc944a6fdcd20a35f654c2be99f9c91a117c1bed4

Thanks,
Nikita