linux-kernel - Re: [PATCH v9 016/105] KVM: TDX: create/destroy VM structure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20221013085554.GA2756200@ls.amr.corp.intel.com>
Date:   Thu, 13 Oct 2022 01:55:54 -0700
From:   Isaku Yamahata <isaku.yamahata@...il.com>
To:     Sagi Shahar <sagis@...gle.com>
Cc:     isaku.yamahata@...el.com, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, isaku.yamahata@...il.com,
        Paolo Bonzini <pbonzini@...hat.com>, erdemaktas@...gle.com,
        Sean Christopherson <seanjc@...gle.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Kai Huang <kai.huang@...el.com>
Subject: Re: [PATCH v9 016/105] KVM: TDX: create/destroy VM structure

On Wed, Oct 12, 2022 at 03:30:26PM -0700,
Sagi Shahar <sagis@...gle.com> wrote:


> > +int tdx_vm_init(struct kvm *kvm)
> > +{
> > +       struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > +       cpumask_var_t packages;
> > +       int ret, i;
> > +       u64 err;
> > +
> > +       /* vCPUs can't be created until after KVM_TDX_INIT_VM. */
> > +       kvm->max_vcpus = 0;
> 
> The fact that vCPUs can't be created until KVM_TDX_INIT_VM is called
> will make it difficult to implement intra host migration. See longer
> discussion below.
...
> Me, Sean and Isaku had a short discussion offline regarding the
> interaction between the proposed API in this patch and intra-host
> migration. To summarize:
> 
> For intra-host migration you generally want the destination VM to be
> initialized including the right number of vCPUs before you migrate the
> source VM state into it.
> The proposed API makes it difficult since it forces the destination VM
> to call KVM_TDX_INIT_VM before creating vCPUs which initializes TDX
> state and allocate a new hkid for the destination VM which would never
> be used. This can create a resource limitation on migrating VMs where
> there shouldn't be one.
> 
> To solve this issue there are 2 main proposed changes to the API:
> 
> 1. Add a new API based on ioctl(KVM_ENABLE_CAP) to let userspace
> modify the max number of vcpus:
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 43a6a7efc6ec..6055098b025b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6278,6 +6278,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>                 }
>                 mutex_unlock(&kvm->lock);
>                 break;
> +       case KVM_CAP_MAX_VCPUS:
> +               r = -EINVAL;
> +               if (cap->args[0] > KVM_MAX_VCPUS)
> +                       break;
> +
> +               mutex_lock(&kvm->lock);
> +               if (!kvm->created_vcpus) {
> +                       kvm->max_vcpus = cap->args[0];
> +                       r = 0;
> +               }
> +               mutex_unlock(&kvm->lock);
> +               break;
>         case KVM_CAP_MAX_VCPU_ID:
>                 r = -EINVAL;
>                 if (cap->args[0] > KVM_MAX_VCPU_IDS)
> 
> 2. Modify the existing API such that max_vcpus will be set to
> KVM_MAX_VCPUS like in regular VMs and during KVM_TDX_INIT_VM, if the
> user created more vCPUs than the number specified, KVM_TDX_INIT_VM
> will fail.
> 
> For option (1), there are some possible variations:
> 1.a. Do we keep the max_vcpus argument in KVM_TDX_INIT_VM? If so, we
> need to check if max_vcpus matches the number of max_vcpus already set
> and fail otherwise.
> 1.b. Do we require KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS) to be called?
> Theoretically, we can set max_vcpus to the KVM default KVM_MAX_VCPUS
> and allow the user to change it as long as vcpus hasn't been created.
> If KVM_ENABLE_CAP_VM(KVM_CAP_MAX_VCPUS), the behavior will remain the
> same as regular VMs right now.
> 
> In my opinion, the cleanest solution would be option 1 (new
> KVM_CAP_MAX_VCPUS API) while removing the max_vcpus argument from
> KVM_TDX_INIT_VM and setting the initial max_vcpus to KVM_MAX_VCPUS and
> not requiring the new ioctl to be called unless userspace wants to
> specifically limit the number of vcpus. In that case,
> KVM_CAP_MAX_VCPUS can be called at any time until vcpus are created.

Regarding to KVM_CAP_MAX_CPUS vs KVM_TDX_INIT_VM, KVM_CAP_MAX_CPUS is more
generic, KVM_CAP_MAX_CPUS  would be better.  This follows tsc frequency.

If option (1) is adapted, the logic should go to the common code, i.e. under
linux/virt/kvm/, because there is nothing specific to x86.  I don't see any use
case other than TDX, though.
-- 
Isaku Yamahata <isaku.yamahata@...il.com>