linux-kernel - Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87ikqlr4vo.wl-maz@kernel.org>
Date: Sat, 11 Jan 2025 11:01:31 +0000
From: Marc Zyngier <maz@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
	Oliver Upton <oliver.upton@...ux.dev>,
	Michael Ellerman <mpe@...erman.id.au>,
	kvm@...r.kernel.org,
	linux-arm-kernel@...ts.infradead.org,
	kvmarm@...ts.linux.dev,
	linuxppc-dev@...ts.ozlabs.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/5] KVM: Add a common kvm_run flag to communicate an exit needs completion

On Sat, 11 Jan 2025 01:24:48 +0000,
Sean Christopherson <seanjc@...gle.com> wrote:
> 
> Add a kvm_run flag, KVM_RUN_NEEDS_COMPLETION, to communicate to userspace
> that KVM_RUN needs to be re-executed prior to save/restore in order to
> complete the instruction/operation that triggered the userspace exit.
> 
> KVM's current approach of adding notes in the Documentation is beyond
> brittle, e.g. there is at least one known case where a KVM developer added
> a new userspace exit type, and then that same developer forgot to handle
> completion when adding userspace support.

Is this going to fix anything? If they couldn't be bothered to read
the documentation, let alone update it, how is that going to be
improved by extra rules and regulations?

I don't see how someone ignoring the documented behaviour of a given
exit reason is, all of a sudden, have an epiphany and take a *new*
flag into account.

> 
> On x86, there are multiple exits that need completion, but they are all
> conveniently funneled through a single callback, i.e. in theory, this is a
> one-time thing for KVM x86 (and other architectures could follow suit with
> additional refactoring).
> 
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  Documentation/virt/kvm/api.rst    | 48 ++++++++++++++++++++++---------
>  arch/powerpc/kvm/book3s_emulate.c |  1 +
>  arch/powerpc/kvm/book3s_hv.c      |  1 +
>  arch/powerpc/kvm/book3s_pr.c      |  2 ++
>  arch/powerpc/kvm/booke.c          |  1 +
>  arch/x86/include/uapi/asm/kvm.h   |  7 +++--
>  arch/x86/kvm/x86.c                |  2 ++
>  include/uapi/linux/kvm.h          |  3 ++
>  virt/kvm/kvm_main.c               |  1 +
>  9 files changed, 49 insertions(+), 17 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index c92c8d4e8779..8e172675d8d6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6505,7 +6505,7 @@ local APIC is not used.
>  
>  	__u16 flags;
>  
> -More architecture-specific flags detailing state of the VCPU that may
> +Common and architecture-specific flags detailing state of the VCPU that may
>  affect the device's behavior. Current defined flags::
>  
>    /* x86, set if the VCPU is in system management mode */
> @@ -6518,6 +6518,8 @@ affect the device's behavior. Current defined flags::
>    /* arm64, set for KVM_EXIT_DEBUG */
>    #define KVM_DEBUG_ARCH_HSR_HIGH_VALID  (1 << 0)
>  
> +  /* all architectures, set when the exit needs completion (via KVM_RUN) */
> +  #define KVM_RUN_NEEDS_COMPLETION  (1 << 15)
>  ::
>  
>  	/* in (pre_kvm_run), out (post_kvm_run) */
> @@ -6632,19 +6634,10 @@ requires a guest to interact with host userspace.
>  
>  .. note::
>  
> -      For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
> -      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR, KVM_EXIT_X86_WRMSR, and KVM_EXIT_HYPERCALL
> -      the corresponding operations are complete (and guest state is consistent)
> -      only after userspace has re-entered the kernel with KVM_RUN.  The kernel
> -      side will first finish incomplete operations and then check for pending
> -      signals.
> +      For some exits, userspace must re-enter the kernel with KVM_RUN to
> +      complete the exit and ensure guest state is consistent.
>  
> -      The pending state of the operation is not preserved in state which is
> -      visible to userspace, thus userspace should ensure that the operation is
> -      completed before performing a live migration.  Userspace can re-enter the
> -      guest with an unmasked signal pending or with the immediate_exit field set
> -      to complete pending operations without allowing any further instructions
> -      to be executed.
> +      See KVM_CAP_NEEDS_COMPLETION for details.
>  
>  ::
>  
> @@ -8239,7 +8232,7 @@ Note: Userspace is responsible for correctly configuring CPUID 0x15, a.k.a. the
>  core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
>  
>  7.36 KVM_CAP_X86_GUEST_MODE
> -------------------------------
> +---------------------------
>  
>  :Architectures: x86
>  :Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> @@ -8252,6 +8245,33 @@ KVM exits with the register state of either the L1 or L2 guest
>  depending on which executed at the time of an exit. Userspace must
>  take care to differentiate between these cases.
>  
> +7.37 KVM_CAP_NEEDS_COMPLETION
> +-----------------------------
> +
> +:Architectures: all
> +:Returns: Informational only, -EINVAL on direct KVM_ENABLE_CAP.
> +
> +The presence of this capability indicates that KVM_RUN will set
> +KVM_RUN_NEEDS_COMPLETION in kvm_run.flags if KVM requires userspace to re-enter
> +the kernel KVM_RUN to complete the exit.
> +
> +For select exits, userspace must re-enter the kernel with KVM_RUN to complete
> +the corresponding operation, only after which is guest state guaranteed to be
> +consistent.  On such a KVM_RUN, the kernel side will first finish incomplete
> +operations and then check for pending signals.
> +
> +The pending state of the operation for such exits is not preserved in state
> +which is visible to userspace, thus userspace should ensure that the operation
> +is completed before performing state save/restore, e.g. for live migration.
> +Userspace can re-enter the guest with an unmasked signal pending or with the
> +immediate_exit field set to complete pending operations without allowing any
> +further instructions to be executed.
> +
> +Without KVM_CAP_NEEDS_COMPLETION, KVM_RUN_NEEDS_COMPLETION will never be set
> +and userspace must assume that exits of type KVM_EXIT_IO, KVM_EXIT_MMIO,
> +KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN, KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR,
> +KVM_EXIT_X86_WRMSR, and KVM_EXIT_HYPERCALL require completion.

So once you advertise KVM_CAP_NEEDS_COMPLETION, the completion flag
must be present for all of these exits, right? And from what I can
tell, this capability is unconditionally advertised.

Yet, you don't amend arm64 to publish that flag. Not that I think this
causes any issue (even if you save the state at that point without
reentering the guest, it will be still be consistent), but that
directly contradicts the documentation (isn't that ironic? ;-).

Or is your intent to *relax* the requirements on arm64 (and anything
else but x86 and POWER)?

	M.

-- 
Without deviation from the norm, progress is not possible.