Message-ID: <CAAYXXYwW9dMMz=tp9kM6P9P29xBMNhgkPA90vX-mEjNaActFiw@mail.gmail.com>
Date: Wed, 23 Mar 2022 13:17:45 -0700
From: Erdem Aktas <erdemaktas@...gle.com>
To: Isaku Yamahata <isaku.yamahata@...il.com>
Cc: "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>,
Paolo Bonzini <pbonzini@...hat.com>,
Jim Mattson <jmattson@...gle.com>,
Connor Kuehl <ckuehl@...hat.com>,
Sean Christopherson <seanjc@...gle.com>
Subject: Re: [RFC PATCH v5 073/104] KVM: TDX: track LP tdx vcpu run and
teardown vcpus on destroying the guest TD
So tdh_vp_flush() should always succeed while the VM is being torn down.
Thanks, Isaku, for the explanation; I think it would be great to add
the error message.
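Something along these lines in tdx_mmu_prezap() would make the leak
visible (just a rough illustrative sketch on my side, not the actual
patch; the exact locking, the WARN_ON_ONCE() and the message wording
are guesses):

	mutex_lock(&tdx_lock);
	err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
	mutex_unlock(&tdx_lock);
	if (WARN_ON_ONCE(err)) {
		/* No safe way to reclaim the HKID at this point; report the leak. */
		pr_err("tdh_mng_vpflushdone() failed: 0x%llx, leaking HKID %d\n",
		       err, kvm_tdx->hkid);
		return;
	}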
-Erdem
On Wed, Mar 23, 2022 at 12:08 PM Isaku Yamahata
<isaku.yamahata@...il.com> wrote:
>
> On Tue, Mar 22, 2022 at 05:54:45PM -0700,
> Erdem Aktas <erdemaktas@...gle.com> wrote:
>
> > On Fri, Mar 4, 2022 at 11:50 AM <isaku.yamahata@...el.com> wrote:
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index a6b1a8ce888d..690298fb99c7 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -48,6 +48,14 @@ struct tdx_capabilities tdx_caps;
> > > static DEFINE_MUTEX(tdx_lock);
> > > static struct mutex *tdx_mng_key_config_lock;
> > >
> > > +/*
> > > + * A per-CPU list of TD vCPUs associated with a given CPU. Used when a CPU
> > > + * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUs.
> > > + * Protected by interrupt mask. This list is manipulated in process context
> > > + * of vcpu and IPI callback. See tdx_flush_vp_on_cpu().
> > > + */
> > > +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
> > > +
> > > static u64 hkid_mask __ro_after_init;
> > > static u8 hkid_start_pos __ro_after_init;
> > >
> > > @@ -87,6 +95,8 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> > >
> > > static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> > > {
> > > + list_del(&to_tdx(vcpu)->cpu_list);
> > > +
> > > /*
> > > * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1,
> > > * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
> > > @@ -97,6 +107,22 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> > > vcpu->cpu = -1;
> > > }
> > >
> > > +void tdx_hardware_enable(void)
> > > +{
> > > + INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id()));
> > > +}
> > > +
> > > +void tdx_hardware_disable(void)
> > > +{
> > > + int cpu = raw_smp_processor_id();
> > > + struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
> > > + struct vcpu_tdx *tdx, *tmp;
> > > +
> > > + /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
> > > + list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list)
> > > + tdx_disassociate_vp(&tdx->vcpu);
> > > +}
> > > +
> > > static void tdx_clear_page(unsigned long page)
> > > {
> > > const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > > @@ -230,9 +256,11 @@ void tdx_mmu_prezap(struct kvm *kvm)
> > > struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> > > cpumask_var_t packages;
> > > bool cpumask_allocated;
> > > + struct kvm_vcpu *vcpu;
> > > u64 err;
> > > int ret;
> > > int i;
> > > + unsigned long j;
> > >
> > > if (!is_hkid_assigned(kvm_tdx))
> > > return;
> > > @@ -248,6 +276,17 @@ void tdx_mmu_prezap(struct kvm *kvm)
> > > return;
> > > }
> > >
> > > + kvm_for_each_vcpu(j, vcpu, kvm)
> > > + tdx_flush_vp_on_cpu(vcpu);
> > > +
> > > + mutex_lock(&tdx_lock);
> > > + err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
> >
> > Hi Isaku,
>
> Hi.
>
>
> > I am wondering about the impact of failures in these functions. Is
> > there any other function that recovers from failures here?
> > When I look at the tdx_flush_vp function, it seems like it can fail
> > due to task migration, so tdx_flush_vp_on_cpu might also fail, and if
> > it fails, tdh_mng_vpflushdone returns an error. Since tdx_vm_teardown
> > does not return any error, how can the VMM free the keyid used by this
> > TD? Will it be stuck in the "used" state forever?
> > Also, if tdx_vm_teardown fails, kvm_tdx->hkid is never set to -1,
> > which will prevent tdx_vcpu_free from freeing and reclaiming the
> > resources allocated for the vcpu.
>
> mmu_prezap() is called via the release callback of the mmu notifier when the
> last mmu reference of this process is dropped. That happens after all kvm
> vcpu fds and the kvm vm fd were closed, so the vcpus will never run again,
> but we still hold the kvm_vcpu structures. There is no race between
> tdh_vp_flush()/tdh_mng_vpflushdone() here and process migration, so
> tdh_vp_flush()/tdh_mng_vpflushdone() should succeed.
>
> The cpu id check in tdx_flush_vp() is for vcpu_load(), which may race with
> process migration.
>
> Anyway, what if one of those TDX seamcalls fails? The HKID is leaked and will
> never be used again because there is no good way to free and reuse the HKID
> safely. Such a failure would be due to an unknown issue and is probably a bug.
>
> One mitigation is to add a pr_err() when an HKID leak happens. I'll add such
> a message in the next respin.
>
> thanks,
> --
> Isaku Yamahata <isaku.yamahata@...il.com>