[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240814012006.tqxrfb3mu7wfsrqb@yy-desk-7060>
Date: Wed, 14 Aug 2024 09:20:06 +0800
From: Yuan Yao <yuan.yao@...ux.intel.com>
To: Isaku Yamahata <isaku.yamahata@...el.com>
Cc: Rick Edgecombe <rick.p.edgecombe@...el.com>, seanjc@...gle.com,
pbonzini@...hat.com, kvm@...r.kernel.org, kai.huang@...el.com,
isaku.yamahata@...il.com, tony.lindgren@...ux.intel.com,
xiaoyao.li@...el.com, linux-kernel@...r.kernel.org,
Sean Christopherson <sean.j.christopherson@...el.com>
Subject: Re: [PATCH 18/25] KVM: TDX: Do TDX specific vcpu initialization
On Tue, Aug 13, 2024 at 10:21:08AM -0700, Isaku Yamahata wrote:
> On Tue, Aug 13, 2024 at 04:00:09PM +0800,
> Yuan Yao <yuan.yao@...ux.intel.com> wrote:
>
> > > +/* VMM can pass one 64bit auxiliary data to vcpu via RCX for guest BIOS. */
> > > +static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
> > > +{
> > > + const struct tdx_sysinfo_module_info *modinfo = &tdx_sysinfo->module_info;
> > > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > + unsigned long va;
> > > + int ret, i;
> > > + u64 err;
> > > +
> > > + if (is_td_vcpu_created(tdx))
> > > + return -EINVAL;
> > > +
> > > + /*
> > > + * vcpu_free method frees allocated pages. Avoid partial setup so
> > > + * that the method can't handle it.
> > > + */
> >
> > This looks not that clear, why vcpu_free can't handle it is not explained.
> >
> > Looking the whole function, page already added into TD by
> > SEAMCALL should be cleared before free back to kernel,
> > tdx_vcpu_free() can handle them. Other pages can be freed
> > directly and can't be handled by tdx_vcpu_free() because
> > they're not added into TD. Is this right understanding ?
>
> Yes. If we result in error in the middle of TDX vCPU initialization,
> TDH.MEM.PAGE.RECLAIM() result in error due to TDX module state check.
> TDX module seems to assume that we don't fail in the middle of TDX vCPU
> initialization. Maybe we can add WARN_ON_ONCE() for such cases.
>
>
> > > + ret = -EIO;
> > > + pr_tdx_error(TDH_VP_CREATE, err);
> > > + goto free_tdvpx;
> > > + }
> > > +
> > > + for (i = 0; i < tdx_sysinfo_nr_tdcx_pages(); i++) {
> > > + va = __get_free_page(GFP_KERNEL_ACCOUNT);
> > > + if (!va) {
> > > + ret = -ENOMEM;
> > > + goto free_tdvpx;
> >
> > It's possible that some pages already added into TD by
> > tdh_vp_addcx() below and they won't be handled by
> > tdx_vcpu_free() if goto free_tdvpx here;
>
> Due to TDX TD state check, we can't free partially assigned TDCS pages.
> TDX module seems to assume that TDH.VP.ADDCX() won't fail in the middle.
The already partially added TDCX pages are initialized by
MOVDIR64 with the TD's private HKID in TDX module, the above
'goto free_tdvpx' frees them back to kernel directly w/o
take back the ownership with shared HKID. This violates the
rule that a page's ownership should be taken back with shared
HKID before release to kernel if they were initialized by any
private HKID before.
How about do tdh_vp_addcx() afer allocated all TDCX pages
and give WARN_ON_ONCE() to the return value of
tdh_vp_addcx() if the tdh_vp_addcx() won't fail except some
BUG inside TDX module in our current usage ?
>
>
> > > + else
> > > + err = tdh_vp_init(tdx, vcpu_rcx);
> > > +
> > > + if (KVM_BUG_ON(err, vcpu->kvm)) {
> > > + pr_tdx_error(TDH_VP_INIT, err);
> > > + return -EIO;
> > > + }
> > > +
> > > + vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > > + tdx->td_vcpu_created = true;
> > > +
> > > + return 0;
> > > +
> > > +free_tdvpx:
> >
> > How about s/free_tdvpx/free_tdcx
> >
> > In 1.5 TDX spec these pages are all called TDCX pages, and
> > the function context already indicates that we're talking about
> > vcpu's TDCX pages.
>
> Oops, this is left over when tdvpx was converted to tdcs.
>
>
> > > +static int tdx_vcpu_init(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *cmd)
> > > +{
> > > + struct msr_data apic_base_msr;
> > > + struct vcpu_tdx *tdx = to_tdx(vcpu);
> > > + int ret;
> > > +
> > > + if (cmd->flags)
> > > + return -EINVAL;
> > > + if (tdx->initialized)
> > > + return -EINVAL;
> > > +
> > > + /*
> > > + * As TDX requires X2APIC, set local apic mode to X2APIC. User space
> > > + * VMM, e.g. qemu, is required to set CPUID[0x1].ecx.X2APIC=1 by
> > > + * KVM_SET_CPUID2. Otherwise kvm_set_apic_base() will fail.
> > > + */
> > > + apic_base_msr = (struct msr_data) {
> > > + .host_initiated = true,
> > > + .data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC |
> > > + (kvm_vcpu_is_reset_bsp(vcpu) ? MSR_IA32_APICBASE_BSP : 0),
> > > + };
> > > + if (kvm_set_apic_base(vcpu, &apic_base_msr))
> > > + return -EINVAL;
> > > +
> > > + ret = tdx_td_vcpu_init(vcpu, (u64)cmd->data);
>
> Because we set guest rcx only, we use cmd->data. Can we add reserved area for
> future use similar to struct kvm_tdx_init_vm?
> i.e. introduce something like
> struct kvm_tdx_init_vcpu {u64 rcx; u64 reserved[]; }
> --
> Isaku Yamahata <isaku.yamahata@...el.com>
Powered by blists - more mailing lists