Message-ID: <4e5c2904-f628-4391-853e-37b7f0e132e8@amazon.com>
Date: Fri, 26 Jul 2024 17:50:15 +0100
From: Nikita Kalyazin <kalyazin@...zon.com>
To: James Houghton <jthoughton@...gle.com>
CC: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
James Morse <james.morse@....com>, Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>, Sean Christopherson <seanjc@...gle.com>,
Shuah Khan <shuah@...nel.org>, Peter Xu <peterx@...hat.org>, Axel Rasmussen
<axelrasmussen@...gle.com>, David Matlack <dmatlack@...gle.com>,
<kvm@...r.kernel.org>, <linux-doc@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
<kvmarm@...ts.linux.dev>, <roypat@...zon.co.uk>, <kalyazin@...zon.com>,
"Paolo Bonzini" <pbonzini@...hat.com>
Subject: Re: [RFC PATCH 14/18] KVM: Add asynchronous userfaults,
KVM_READ_USERFAULT
Hi James,
On 11/07/2024 00:42, James Houghton wrote:
> It is possible that KVM wants to access a userfault-enabled GFN in a
> path where it is difficult to return out to userspace with the fault
> information. For these cases, add a mechanism for KVM to wait for a GFN
> to not be userfault-enabled.
In this patch series, the asynchronous notification mechanism is used
only in cases "where it is difficult to return out to userspace with the
fault information". However, we (AWS) have a use case where we would
like to be notified asynchronously about _all_ faults. Firecracker can
restore a VM from a memory snapshot where the guest memory is supplied
via userfaultfd by a process separate from the VMM itself [1]. While
it looks technically possible for the VMM process to handle exits by
forwarding the faults to the other process, that would require building
a complex userspace protocol on top and would likely introduce extra
latency on the critical path. It also means that a KVM ioctl such as
KVM_READ_USERFAULT is not suitable for the separate process, because KVM
checks that ioctls are issued specifically by the VMM process [2]:
	if (kvm->mm != current->mm || kvm->vm_dead)
		return -EIO;
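For context on why a plain file descriptor suits the separate-process
model: a descriptor such as a userfaultfd can be handed to another
process over a Unix domain socket with SCM_RIGHTS, and the receiver can
then use it directly, whereas the check above rejects VM ioctls from any
process with a different mm. The following is only a minimal sketch of
such a handoff. send_fd() and recv_fd() are illustrative names, not code
from Firecracker or KVM, and error handling is reduced to a single
return value.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Properly aligned buffer for one control message carrying one int. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/*
 * Send descriptor `fd` over the connected Unix socket `sock`.
 * Returns 0 on success, -1 on error.
 */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of ordinary data accompanies the fd */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Receive one descriptor from the connected Unix socket `sock`.
 * Returns the new descriptor, or -1 on error.
 */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received descriptor refers to the same open file description as the
sender's, so the handler process can read fault events from it and
resolve them without going back through the VMM.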
> The implementation of this mechanism is certain to change before KVM
> Userfault could possibly be merged.
How do you envision resolving faults in userspace? Copying the page in
(provided that userspace mapping of guest_memfd is supported [3]) and
clearing KVM_MEMORY_ATTRIBUTE_USERFAULT do not look sufficient on their
own to resolve the fault. An attempt to copy the page directly in
userspace will itself trigger a fault, which may lead to a deadlock if
the original fault was caused by the VMM. What is needed is an interface
similar to UFFDIO_COPY that would allocate a page, copy the content in
and update the page tables.
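For reference, this is roughly how a fault is resolved with the existing
userfaultfd UFFDIO_COPY ioctl, which performs the allocate, copy and
map steps in one call without the handler touching the faulting address.
This is only a sketch of today's userfaultfd usage, not of the interface
proposed in this series. page_base() and resolve_fault() are
hypothetical helper names, and `uffd` is assumed to be a userfaultfd
already registered over the destination range.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Round an address down to its page start (page_size: power of two). */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
	return addr & ~(page_size - 1);
}

/*
 * Resolve a missing-page fault at `fault_addr` by asking the kernel to
 * allocate a page, fill it from `src_page` and install it in the page
 * tables. Returns 0 on success or a negative errno value.
 */
static int resolve_fault(int uffd, uint64_t fault_addr,
			 const void *src_page, uint64_t page_size)
{
	struct uffdio_copy copy = {
		.dst = page_base(fault_addr, page_size),
		.src = (uint64_t)(uintptr_t)src_page,
		.len = page_size,
		.mode = 0,	/* 0: wake the faulting thread on completion */
	};

	if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
		return -errno;
	return 0;
}
```

Because the kernel performs the copy into the destination, the handler
never dereferences the faulting address itself, which is the property a
KVM Userfault equivalent would need in order to avoid the deadlock
described above.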
[1] Firecracker snapshot restore via UserfaultFD:
https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/handling-page-faults-on-snapshot-resume.md
[2] KVM ioctl check for the address space:
https://elixir.bootlin.com/linux/v6.10.1/source/virt/kvm/kvm_main.c#L5083
[3] mmap() of guest_memfd:
https://lore.kernel.org/kvm/489d1494-626c-40d9-89ec-4afc4cd0624b@redhat.com/T/#mc944a6fdcd20a35f654c2be99f9c91a117c1bed4
Thanks,
Nikita