lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 28 Feb 2023 03:06:32 -0800
From:   Isaku Yamahata <isaku.yamahata@...il.com>
To:     "Huang, Kai" <kai.huang@...el.com>
Cc:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Yamahata, Isaku" <isaku.yamahata@...el.com>,
        "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Shahar, Sagi" <sagis@...gle.com>,
        "Aktas, Erdem" <erdemaktas@...gle.com>,
        "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
        "dmatlack@...gle.com" <dmatlack@...gle.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>
Subject: Re: [PATCH v11 023/113] KVM: TDX: allocate/free TDX vcpu structure

On Thu, Jan 19, 2023 at 12:45:09AM +0000,
"Huang, Kai" <kai.huang@...el.com> wrote:

> On Thu, 2023-01-12 at 08:31 -0800, isaku.yamahata@...el.com wrote:
> > From: Isaku Yamahata <isaku.yamahata@...el.com>
> > 
> > The next step of TDX guest creation is to create vcpu.  Allocate TDX vcpu
> > structures, partially initialize it.  
> > 
> 
> Why partially initialize it?  Shouldn't a better way be either: 1) not
> initialize at all, or; 2) fully initialize? 
> 
> Can you put more _why_ here?
> 
> 
> > Allocate pages of TDX vcpu for the
> > TDX module.  Actual donation TDX vcpu pages to the TDX module is not done
> > yet.
> 
> Also, can you explain _why_ it is not done here?
> 
> > 
> > In the case of the conventional case, cpuid is empty at the initialization.
> > and cpuid is configured after the vcpu initialization.  Because TDX
> > supports only X2APIC mode, cpuid is forcibly initialized to support X2APIC
> > on the vcpu initialization.
> 
> Don't quite understand here.  As you said CPUID entries are configured later in
> KVM_SET_CPUID2, so what's the point of initializing CPUID to support x2api> Are you suggesting KVM_SET_CPUID2 will be somehow rejected for TDX guest, or
> there will be special handling to make sure the CPUID initialized here won't be
> overwritten later?
> 
> Please explain clearly here.

Here is the updated one.

    The next step of TDX guest creation is to create vcpu.  Allocate TDX vcpu
    structures, initialize it that doesn't require TDX SEAMCALL.  TDX specific
    vcpuid initialization will be implemented as independent KVM_TDX_INIT_VCPU
    so that when error occurs it's easy to determine which component has the
    issue, KVM or TDX.
    
    In the case of the conventional case, cpuid is empty at the initialization.
    and cpuid is configured after the vcpu initialization.  Because TDX
    supports only X2APIC mode, cpuid is forcibly set to support X2APIC and APIC
    BASE MSR is forcibly set to X2APIC mode.  The MSR will be read only for
    guest TD.  Because kvm_arch_vcpu_create() also initializes kvm MMU that
    depends on local apic settings.  So x2apic needs to be initialized to
    X2APIC mode by vcpu_reset method before KVM mmu initialization.



> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 557a609c5147..099f0737a5aa 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -281,6 +281,81 @@ int tdx_vm_init(struct kvm *kvm)
> >  	return 0;
> >  }
> >  
> > +int tdx_vcpu_create(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_cpuid_entry2 *e;
> > +
> > +	/*
> > +	 * On cpu creation, cpuid entry is blank.  Forcibly enable
> > +	 * X2APIC feature to allow X2APIC.
> > +	 * Because vcpu_reset() can't return error, allocation is done here.
> > +	 */
> > +	WARN_ON_ONCE(vcpu->arch.cpuid_entries);
> > +	WARN_ON_ONCE(vcpu->arch.cpuid_nent);
> > +	e = kvmalloc_array(1, sizeof(*e), GFP_KERNEL_ACCOUNT);
> 
> You don't need to use kvmalloc_array() when only allocating one entry.

I'll make it kvmalloc() and add comment on kvmalloc(), not kmalloc().


> > +	if (!e)
> > +		return -ENOMEM;
> > +	*e  = (struct kvm_cpuid_entry2) {
> > +		.function = 1,	/* Features for X2APIC */
> > +		.index = 0,
> > +		.eax = 0,
> > +		.ebx = 0,
> > +		.ecx = 1ULL << 21,	/* X2APIC */
> > +		.edx = 0,
> > +	};
> > +	vcpu->arch.cpuid_entries = e;
> > +	vcpu->arch.cpuid_nent = 1;
> 
> As mentioned above, why doing it here? Won't be this be overwritten later in
> KVM_SET_CPUID2?

Yes, user space VMM can overwrite cpuid[0x1] and APIC base MSR.  But it doesn't
matter because it's a bug of user space VMM. user space VMM has to keep the
consistency of cpuid and MSRs.
Because TDX module virtualizes cpuid[0x1].x2apic to fixed 1, KVM value doesn't
matter after vcpu creation.
Because KVM virtualizes APIC base as read only to guest, cpuid[0x1].x2apic
doesn't matter after vcpu creation as long as user space VMM keeps KVM APIC BASE
value.

I'll add a comment.


> > +
> > +	/* TDX only supports x2APIC, which requires an in-kernel local APIC. */
> > +	if (!vcpu->arch.apic)
> > +		return -EINVAL;
> 
> If this is hit, what happens to the CPUID entry allocated above?  It's
> absolutely not clear here in this patch.

It's memory leak. I'll move the check before memory allocation.


> > +
> > +	fpstate_set_confidential(&vcpu->arch.guest_fpu);
> > +
> > +	vcpu->arch.efer = EFER_SCE | EFER_LME | EFER_LMA | EFER_NX;
> > +
> > +	vcpu->arch.cr0_guest_owned_bits = -1ul;
> > +	vcpu->arch.cr4_guest_owned_bits = -1ul;
> > +
> > +	vcpu->arch.tsc_offset = to_kvm_tdx(vcpu->kvm)->tsc_offset;
> > +	vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
> > +	vcpu->arch.guest_state_protected =
> > +		!(to_kvm_tdx(vcpu->kvm)->attributes & TDX_TD_ATTRIBUTE_DEBUG);
> > +
> > +	return 0;
> > +}
> > +
> > +void tdx_vcpu_free(struct kvm_vcpu *vcpu)
> > +{
> > +	/* This is stub for now.  More logic will come. */
> > +}
> > +
> > +void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > +{
> > +	struct msr_data apic_base_msr;
> > +
> > +	/* TDX doesn't support INIT event. */
> > +	if (WARN_ON_ONCE(init_event))
> > +		goto td_bugged;
> 
> Should we use KVM_BUG_ON()?
>
> Again, it appears this depends on how KVM handles INIT, which is done in a later
> patch far way:
> 
> [PATCH v11 102/113] KVM: TDX: Silently ignore INIT/SIPI
> 
> And there's no material explaining how it is handled in either changelog or
> comment, so to me it's not reviewable.

I'll convert them to KVM_BUG_ON().  With this patch, I'll remove WARN_ON_ONCE()
and add KVM_BUG_ON() with the later patch.


> > +
> > +	/* TDX rquires X2APIC. */
> > +	apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | LAPIC_MODE_X2APIC;
> > +	if (kvm_vcpu_is_reset_bsp(vcpu))
> > +		apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
> > +	apic_base_msr.host_initiated = true;
> > +	if (WARN_ON_ONCE(kvm_set_apic_base(vcpu, &apic_base_msr)))
> > +		goto td_bugged;
> 
> I think we have KVM_BUG_ON()?
> 
> TDX requires a lot more staff then just x2apic, why only x2apic is done here,
> particularly in _this_ patch?

After this callback, kvm mmu initialization follows.  It depends on apic
setting.  X2APIC is only devication from the common logic kvm_lapic_reset().
I'll add a comment.
-- 
Isaku Yamahata <isaku.yamahata@...il.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ