linux-kernel - Re: [PATCH v3 2/3] KVM: x86: Use kvzalloc() to allocate VM struct

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5546ad0e36f667a6b426ef47f1f40aee8d83efc9.camel@intel.com>
Date: Tue, 20 May 2025 23:57:32 +0000
From: "Huang, Kai" <kai.huang@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>
CC: "kvm@...r.kernel.org" <kvm@...r.kernel.org>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "vipinsh@...gle.com" <vipinsh@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v3 2/3] KVM: x86: Use kvzalloc() to allocate VM struct

On Tue, 2025-05-20 at 16:11 -0700, Sean Christopherson wrote:
> On Tue, May 20, 2025, Kai Huang wrote:
> > On Mon, 2025-05-19 at 08:39 -0700, Sean Christopherson wrote:
> > > +static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> > > +					enum pg_level level, kvm_pfn_t pfn)
> > >  {
> > >  	struct page *page = pfn_to_page(pfn);
> > >  	int ret;
> > > @@ -3507,10 +3507,14 @@ int __init tdx_bringup(void)
> > >  	r = __tdx_bringup();
> > >  	if (r) {
> > >  		/*
> > > -		 * Disable TDX only but don't fail to load module if
> > > -		 * the TDX module could not be loaded.  No need to print
> > > -		 * message saying "module is not loaded" because it was
> > > -		 * printed when the first SEAMCALL failed.
> > > +		 * Disable TDX only but don't fail to load module if the TDX
> > > +		 * module could not be loaded.  No need to print message saying
> > > +		 * "module is not loaded" because it was printed when the first
> > > +		 * SEAMCALL failed.  Don't bother unwinding the S-EPT hooks or
> > > +		 * vm_size, as kvm_x86_ops have already been finalized (and are
> > > +		 * intentionally not exported).  The S-EPT code is unreachable,
> > > +		 * and allocating a few more bytes per VM in a should-be-rare
> > > +		 * failure scenario is a non-issue.
> > >  		 */
> > >  		if (r == -ENODEV)
> > >  			goto success_disable_tdx;
> > > @@ -3524,3 +3528,19 @@ int __init tdx_bringup(void)
> > >  	enable_tdx = 0;
> > >  	return 0;
> > >  }
> > > +
> > > +
> > > +void __init tdx_hardware_setup(void)
> > > +{
> > > +	/*
> > > +	 * Note, if the TDX module can't be loaded, KVM TDX support will be
> > > +	 * disabled but KVM will continue loading (see tdx_bringup()).
> > > +	 */
> > 
> > This comment seems a little bit weird to me.  I think what you meant here is the
> > @vm_size and those S-EPT ops are not unwound while TDX cannot be brought up but
> > KVM is still loaded.
> 
> This comment is weird?  Or the one in tdx_bringup() is weird?  
> 

I definitely agree tdx_bringup() is weird :-)

> The sole intent
> of _this_ comment is to clarify that KVM could still end up running load with TDX
> disabled.  
> 

But this behaviour itself doesn't mean anything, e.g., if we export kvm_x86_ops,
we could unwind them.  So without mentioning "those are not unwound", it doesn't
seem useful to me.

But it does have "(see tdx_bringup())" at the end, so OK to me.  I guess I just
wish it could be more verbose.

> The comment about not unwinding S-EPT resides in tdx_bringup(), because
> that's where the actual decision to not reject KVM load and to not undo the setup
> lives.

Right.

> 
> > > +
> > > +	vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
> > > +	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
> > > +	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
> > > +	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
> > > +	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
> > > +}
> > > diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
> > > index 51f98443e8a2..ca39a9391db1 100644
> > > --- a/arch/x86/kvm/vmx/tdx.h
> > > +++ b/arch/x86/kvm/vmx/tdx.h
> > > @@ -8,6 +8,7 @@
> > >  #ifdef CONFIG_KVM_INTEL_TDX
> > >  #include "common.h"
> > >  
> > > +void tdx_hardware_setup(void);
> > >  int tdx_bringup(void);
> > >  void tdx_cleanup(void);
> > >  
> > 
> > There's a build error when CONFIG_KVM_INTEL_TDX is off:
> > 
> > vmx/main.c: In function ‘vt_hardware_setup’:
> > vmx/main.c:34:17: error: implicit declaration of function ‘tdx_hardware_setup’;
> > did you mean ‘vmx_hardware_setup’? [-Wimplicit-function-declaration]
> >    34 |                 tdx_hardware_setup();
> >       |                 ^~~~~~~~~~~~~~~~~~
> >       |                 vmx_hardware_setup
> > 
> > .. for which you need a stub for tdx_hardware_setup() when CONFIG_KVM_INTEL_TDX
> > is off.
> 
> Not in kvm-x86/next, commit 907092bf7cbd ("KVM: VMX: Clean up and macrofy x86_ops")
> buried all of vt_hardware_setup() behind CONFIG_KVM_INTEL_TDX=y.

Oh I was using kvm-coco-queue.  Thanks for pointing out.

> 
> > And one more thing:
> > 
> > With the above patch, we still have below code in vt_init():
> > 
> >         /*
> >          * TDX and VMX have different vCPU structures.  Calculate the
> >          * maximum size/align so that kvm_init() can use the larger
> >          * values to create the kmem_vcpu_cache.
> >          */
> >         vcpu_size = sizeof(struct vcpu_vmx);
> >         vcpu_align = __alignof__(struct vcpu_vmx);
> >         if (enable_tdx) {
> >                 vcpu_size = max_t(unsigned, vcpu_size,
> >                                 sizeof(struct vcpu_tdx));
> >                 vcpu_align = max_t(unsigned, vcpu_align,
> >                                 __alignof__(struct vcpu_tdx));
> >                 kvm_caps.supported_vm_types |= BIT(KVM_X86_TDX_VM);
> >         }
> > 
> > It's kinda ugly too IMHO.
> > 
> > Since we already have @vm_size in kvm_x86_ops, how about also adding vcpu_size
> > and vcpu_align to it?  Then they can be treated in the same way as vm_size for
> > TDX.
> > 
> > They are not needed for SVM, but it doesn't hurt that much?
> 
> I'd rather not.  vt_init() already needs to be aware of TDX, e.g. to call into
> tdx_bringup() in the first place.  Shoving state into kvm_x86_ops that is only
> ever used in vt_init() (an __init function at that) isn't a net positive.
> 
> Putting the fields in kvm_x86_init_ops would be better, but I still don't think
> the complexity and indirection is justified.  Bleeding gory TDX details into the
> common code is undesirable, but I don't see the size of vcpu_tdx or the fact that
> enable_tdx==true means KVM_X86_TDX_VM is supported as being information with
> hiding.
> 
> kvm_x86_ops.vm_size is a means to an end.  E.g. if kvm_vcpu_cache didn't exist,
> then we'd probably want/need kvm_x86_ops.vcpu_size, but it does exist, so...

Sure.  Thanks for clarification.