linux-kernel - Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABgObfaRP6zKNhrO8_atGDLcHs=uvE0aT8cPKnt_vNHHM+8Nxg@mail.gmail.com>
Date: Mon, 4 Nov 2024 23:13:19 +0100
From: Paolo Bonzini <pbonzini@...hat.com>
To: Zack Rusin <zack.rusin@...adcom.com>
Cc: kvm@...r.kernel.org, Doug Covelli <doug.covelli@...adcom.com>, 
	Jonathan Corbet <corbet@....net>, Sean Christopherson <seanjc@...gle.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, 
	"H. Peter Anvin" <hpa@...or.com>, Shuah Khan <shuah@...nel.org>, Namhyung Kim <namhyung@...nel.org>, 
	Arnaldo Carvalho de Melo <acme@...hat.com>, Isaku Yamahata <isaku.yamahata@...el.com>, 
	Joel Stanley <joel@....id.au>, linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 2/3] KVM: x86: Add support for VMware guest specific hypercalls

On Wed, Oct 30, 2024 at 4:35 AM Zack Rusin <zack.rusin@...adcom.com> wrote:
>
> VMware products handle hypercalls in userspace. Give KVM the ability
> to run VMware guests unmodified by fowarding all hypercalls to the
> userspace.
>
> Enabling of the KVM_CAP_X86_VMWARE_HYPERCALL_ENABLE capability turns
> the feature on - it's off by default. This allows vmx's built on top
> of KVM to support VMware specific hypercalls.

Hi Zack,

is there a spec of the hypercalls that are supported by userspace? I
would like to understand if there's anything that's best handled in
the kernel.

If we allow forwarding _all_ hypercalls to userspace, then people will
use it for things other than VMware and there goes all hope of
accelerating stuff in the kernel in the future.

So even having _some_ checks in the kernel before going out to
userspace would keep that door open, or at least try.

Patch 1 instead looks good from an API point of view.

Paolo

> Signed-off-by: Zack Rusin <zack.rusin@...adcom.com>
> Cc: Doug Covelli <doug.covelli@...adcom.com>
> Cc: Paolo Bonzini <pbonzini@...hat.com>
> Cc: Jonathan Corbet <corbet@....net>
> Cc: Sean Christopherson <seanjc@...gle.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: x86@...nel.org
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Shuah Khan <shuah@...nel.org>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Arnaldo Carvalho de Melo <acme@...hat.com>
> Cc: Isaku Yamahata <isaku.yamahata@...el.com>
> Cc: Joel Stanley <joel@....id.au>
> Cc: Zack Rusin <zack.rusin@...adcom.com>
> Cc: kvm@...r.kernel.org
> Cc: linux-doc@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Cc: linux-kselftest@...r.kernel.org
> ---
>  Documentation/virt/kvm/api.rst  | 41 +++++++++++++++++++++++++++++----
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 33 ++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h        |  1 +
>  4 files changed, 72 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 33ef3cc785e4..5a8c7922f64f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6601,10 +6601,11 @@ to the byte array.
>  .. note::
>
>        For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR, KVM_EXIT_XEN,
> -      KVM_EXIT_EPR, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR the corresponding
> -      operations are complete (and guest state is consistent) only after userspace
> -      has re-entered the kernel with KVM_RUN.  The kernel side will first finish
> -      incomplete operations and then check for pending signals.
> +      KVM_EXIT_EPR, KVM_EXIT_HYPERCALL, KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR
> +      the corresponding operations are complete (and guest state is consistent)
> +      only after userspace has re-entered the kernel with KVM_RUN. The kernel
> +      side will first finish incomplete operations and then check for pending
> +      signals.
>
>        The pending state of the operation is not preserved in state which is
>        visible to userspace, thus userspace should ensure that the operation is
> @@ -8201,6 +8202,38 @@ default value for it is set via the kvm.enable_vmware_backdoor
>  kernel parameter (false when not set). Must be set before any
>  VCPUs have been created.
>
> +7.38 KVM_CAP_X86_VMWARE_HYPERCALL
> +---------------------------------
> +
> +:Architectures: x86
> +:Parameters: args[0] whether the feature should be enabled or not
> +:Returns: 0 on success.
> +
> +Capability allows userspace to handle hypercalls. When enabled
> +whenever the vcpu has executed a VMCALL(Intel) or a VMMCALL(AMD)
> +instruction kvm will exit to userspace with KVM_EXIT_HYPERCALL.
> +
> +On exit the hypercall structure of the kvm_run structure will
> +look as follows:
> +
> +::
> +   /* KVM_EXIT_HYPERCALL */
> +   struct {
> +      __u64 nr;      // rax
> +      __u64 args[6]; // rbx, rcx, rdx, rsi, rdi, rbp
> +      __u64 ret;     // cpl, whatever userspace
> +                     // sets this to on return will be
> +                     // written to the rax
> +      __u64 flags;   // KVM_EXIT_HYPERCALL_LONG_MODE if
> +                     // the hypercall was executed in
> +                     // 64bit mode, 0 otherwise
> +   } hypercall;
> +
> +Except when running in compatibility mode with VMware hypervisors
> +userspace handling of hypercalls is discouraged. To implement
> +such functionality, use KVM_EXIT_IO (x86) or KVM_EXIT_MMIO
> +(all except s390).
> +
>  8. Other capabilities.
>  ======================
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 7fcf185e337f..7fbb11682517 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1404,6 +1404,7 @@ struct kvm_arch {
>         struct kvm_xen xen;
>  #endif
>         bool vmware_backdoor_enabled;
> +       bool vmware_hypercall_enabled;
>
>         bool backwards_tsc_observed;
>         bool boot_vcpu_runs_old_kvmclock;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7071907d6a5..b676c54266e7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4689,6 +4689,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         case KVM_CAP_MEMORY_FAULT_INFO:
>         case KVM_CAP_X86_GUEST_MODE:
>         case KVM_CAP_X86_VMWARE_BACKDOOR:
> +       case KVM_CAP_X86_VMWARE_HYPERCALL:
>                 r = 1;
>                 break;
>         case KVM_CAP_PRE_FAULT_MEMORY:
> @@ -6784,6 +6785,13 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 }
>                 mutex_unlock(&kvm->lock);
>                 break;
> +       case KVM_CAP_X86_VMWARE_HYPERCALL:
> +               r = -EINVAL;
> +               if (cap->args[0] & ~1)
> +                       break;
> +               kvm->arch.vmware_hypercall_enabled = cap->args[0];
> +               r = 0;
> +               break;
>         default:
>                 r = -EINVAL;
>                 break;
> @@ -10127,6 +10135,28 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
>         return kvm_skip_emulated_instruction(vcpu);
>  }
>
> +static int kvm_vmware_hypercall(struct kvm_vcpu *vcpu)
> +{
> +       struct kvm_run *run = vcpu->run;
> +       bool is_64_bit = is_64_bit_hypercall(vcpu);
> +       u64 mask = is_64_bit ? U64_MAX : U32_MAX;
> +
> +       vcpu->run->hypercall.flags = is_64_bit ? KVM_EXIT_HYPERCALL_LONG_MODE : 0;
> +       run->hypercall.nr = kvm_rax_read(vcpu) & mask;
> +       run->hypercall.args[0] = kvm_rbx_read(vcpu) & mask;
> +       run->hypercall.args[1] = kvm_rcx_read(vcpu) & mask;
> +       run->hypercall.args[2] = kvm_rdx_read(vcpu) & mask;
> +       run->hypercall.args[3] = kvm_rsi_read(vcpu) & mask;
> +       run->hypercall.args[4] = kvm_rdi_read(vcpu) & mask;
> +       run->hypercall.args[5] = kvm_rbp_read(vcpu) & mask;
> +       run->hypercall.ret = kvm_x86_call(get_cpl)(vcpu);
> +
> +       run->exit_reason = KVM_EXIT_HYPERCALL;
> +       vcpu->arch.complete_userspace_io = complete_hypercall_exit;
> +
> +       return 0;
> +}
> +
>  unsigned long __kvm_emulate_hypercall(struct kvm_vcpu *vcpu, unsigned long nr,
>                                       unsigned long a0, unsigned long a1,
>                                       unsigned long a2, unsigned long a3,
> @@ -10225,6 +10255,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
>         int op_64_bit;
>         int cpl;
>
> +       if (vcpu->kvm->arch.vmware_hypercall_enabled)
> +               return kvm_vmware_hypercall(vcpu);
> +
>         if (kvm_xen_hypercall_enabled(vcpu->kvm))
>                 return kvm_xen_hypercall(vcpu);
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c7b5f1c2ee1c..4c2cc6ed29a0 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -934,6 +934,7 @@ struct kvm_enable_cap {
>  #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
>  #define KVM_CAP_X86_GUEST_MODE 238
>  #define KVM_CAP_X86_VMWARE_BACKDOOR 239
> +#define KVM_CAP_X86_VMWARE_HYPERCALL 240
>
>  struct kvm_irq_routing_irqchip {
>         __u32 irqchip;
> --
> 2.43.0
>