Date:   Wed, 23 Mar 2022 12:08:12 -0700
From:   Isaku Yamahata <isaku.yamahata@...il.com>
To:     Erdem Aktas <erdemaktas@...gle.com>
Cc:     "Yamahata, Isaku" <isaku.yamahata@...el.com>,
        "open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>,
        isaku.yamahata@...il.com, Paolo Bonzini <pbonzini@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Connor Kuehl <ckuehl@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>
Subject: Re: [RFC PATCH v5 073/104] KVM: TDX: track LP tdx vcpu run and
 teardown vcpus on destroying the guest TD

On Tue, Mar 22, 2022 at 05:54:45PM -0700,
Erdem Aktas <erdemaktas@...gle.com> wrote:

> On Fri, Mar 4, 2022 at 11:50 AM <isaku.yamahata@...el.com> wrote:
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index a6b1a8ce888d..690298fb99c7 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -48,6 +48,14 @@ struct tdx_capabilities tdx_caps;
> >  static DEFINE_MUTEX(tdx_lock);
> >  static struct mutex *tdx_mng_key_config_lock;
> >
> > +/*
> > + * A per-CPU list of TD vCPUs associated with a given CPU.  Used when a CPU
> > + * is brought down to invoke TDH_VP_FLUSH on the appropriate TD vCPUs.
> > + * Protected by interrupt mask.  This list is manipulated in process context
> > + * of vcpu and IPI callback.  See tdx_flush_vp_on_cpu().
> > + */
> > +static DEFINE_PER_CPU(struct list_head, associated_tdvcpus);
> > +
> >  static u64 hkid_mask __ro_after_init;
> >  static u8 hkid_start_pos __ro_after_init;
> >
> > @@ -87,6 +95,8 @@ static inline bool is_td_finalized(struct kvm_tdx *kvm_tdx)
> >
> >  static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> >  {
> > +       list_del(&to_tdx(vcpu)->cpu_list);
> > +
> >         /*
> >          * Ensure tdx->cpu_list is updated before setting vcpu->cpu to -1,
> >          * otherwise, a different CPU can see vcpu->cpu = -1 and add the vCPU
> > @@ -97,6 +107,22 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
> >         vcpu->cpu = -1;
> >  }
> >
> > +void tdx_hardware_enable(void)
> > +{
> > +       INIT_LIST_HEAD(&per_cpu(associated_tdvcpus, raw_smp_processor_id()));
> > +}
> > +
> > +void tdx_hardware_disable(void)
> > +{
> > +       int cpu = raw_smp_processor_id();
> > +       struct list_head *tdvcpus = &per_cpu(associated_tdvcpus, cpu);
> > +       struct vcpu_tdx *tdx, *tmp;
> > +
> > +       /* Safe variant needed as tdx_disassociate_vp() deletes the entry. */
> > +       list_for_each_entry_safe(tdx, tmp, tdvcpus, cpu_list)
> > +               tdx_disassociate_vp(&tdx->vcpu);
> > +}
> > +
> >  static void tdx_clear_page(unsigned long page)
> >  {
> >         const void *zero_page = (const void *) __va(page_to_phys(ZERO_PAGE(0)));
> > @@ -230,9 +256,11 @@ void tdx_mmu_prezap(struct kvm *kvm)
> >         struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
> >         cpumask_var_t packages;
> >         bool cpumask_allocated;
> > +       struct kvm_vcpu *vcpu;
> >         u64 err;
> >         int ret;
> >         int i;
> > +       unsigned long j;
> >
> >         if (!is_hkid_assigned(kvm_tdx))
> >                 return;
> > @@ -248,6 +276,17 @@ void tdx_mmu_prezap(struct kvm *kvm)
> >                 return;
> >         }
> >
> > +       kvm_for_each_vcpu(j, vcpu, kvm)
> > +               tdx_flush_vp_on_cpu(vcpu);
> > +
> > +       mutex_lock(&tdx_lock);
> > +       err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
> 
> Hi Isaku,

Hi.


> I am wondering about the impact of failures in these functions. Is
> there any other function that recovers from failures here?
> When I look at the tdx_flush_vp function, it seems like it can fail
> due to task migration, so tdx_flush_vp_on_cpu might also fail, and if
> it fails, tdh_mng_vpflushdone returns an error. Since tdx_vm_teardown
> does not return any error, how can the VMM free the keyid used by this
> TD? Will it stay in the "used" state forever?
> Also, if tdx_vm_teardown fails, kvm_tdx->hkid is never set to -1,
> which prevents tdx_vcpu_free from freeing and reclaiming the resources
> allocated for the vcpu.

mmu_prezap() is called via the release callback of the mmu notifier when the
last mmu reference of this process is dropped.  That happens after all kvm vcpu
fds and the kvm vm fd have been closed, so the vcpus will never run again, but
we still hold the kvm_vcpu structures.  There is no race between
tdh_vp_flush()/tdh_mng_vpflushdone() here and process migration;
tdh_vp_flush()/tdh_mng_vpflushdone() should succeed.

The CPU-id check in tdx_flush_vp() is for vcpu_load(), which may race with
process migration.
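
To illustrate that check (this is only a sketch; tdx_flush_vp() is not shown in
the quoted hunk, so the signature, the tdvpr field name, and the error handling
below are my assumptions, not the patch code):

	/*
	 * Hypothetical sketch of the IPI callback: flush the vCPU's TD state
	 * only if the vCPU is still associated with this CPU.  vcpu_load()
	 * on another CPU may race and re-associate it, hence the cpu check.
	 */
	static void tdx_flush_vp(void *arg)
	{
		struct kvm_vcpu *vcpu = arg;

		/* Runs with interrupts disabled on the target CPU. */
		if (vcpu->cpu != raw_smp_processor_id())
			return;	/* vCPU moved away; nothing to flush here. */

		/* tdh_vp_flush() wraps the TDH.VP.FLUSH seamcall. */
		if (WARN_ON_ONCE(tdh_vp_flush(to_tdx(vcpu)->tdvpr.pa)))
			return;

		tdx_disassociate_vp(vcpu);
	}

During teardown, as described above, every vcpu fd is already closed, so this
check cannot race with a concurrent vcpu_load() and the flush is expected to
succeed.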

Anyway, what if one of those TDX seamcalls fails?  The HKID is leaked and will
never be used again, because there is no good way to free and reuse an HKID
safely.  Such a failure would be due to an unknown issue and is probably a bug.

One mitigation is to add a pr_err() when an HKID leak happens.  I'll add such a
message in the next respin.
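
As a rough sketch of that mitigation (the error-handling block is mine; only
tdh_mng_vpflushdone(), tdx_lock, kvm_tdx->tdr.pa, and kvm_tdx->hkid come from
the quoted patch, and the message text is just an example):

	mutex_lock(&tdx_lock);
	err = tdh_mng_vpflushdone(kvm_tdx->tdr.pa);
	mutex_unlock(&tdx_lock);
	if (WARN_ON_ONCE(err)) {
		/*
		 * There is no safe way to free or reuse the HKID once
		 * flushing has failed, so leave kvm_tdx->hkid assigned
		 * (it is never set back to -1) and report the leak.
		 */
		pr_err("tdh_mng_vpflushdone() failed, leaking HKID %d\n",
		       kvm_tdx->hkid);
		return;
	}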

thanks,
-- 
Isaku Yamahata <isaku.yamahata@...il.com>
