linux-kernel - Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when loading the KVM intel kernel module

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2383a1e9-ba2b-470f-8807-5f5f2528c7ad@intel.com>
Date: Thu, 18 Apr 2024 11:09:41 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: "Zhang, Tina" <tina.zhang@...el.com>, "Yuan, Hang" <hang.yuan@...el.com>,
	"Chen, Bo2" <chen.bo@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Aktas, Erdem"
	<erdemaktas@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "isaku.yamahata@...ux.intel.com"
	<isaku.yamahata@...ux.intel.com>
Subject: Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when
 loading the KVM intel kernel module



On 18/04/2024 2:40 am, Sean Christopherson wrote:
> On Wed, Apr 17, 2024, Kai Huang wrote:
>> On Tue, 2024-04-16 at 13:58 -0700, Sean Christopherson wrote:
>>> On Fri, Apr 12, 2024, Kai Huang wrote:
>>>> On 12/04/2024 2:03 am, Sean Christopherson wrote:
>>>>> On Thu, Apr 11, 2024, Kai Huang wrote:
>>>>>> I can certainly follow up with this and generate a reviewable patchset if I
>>>>>> can confirm with you that this is what you want?
>>>>>
>>>>> Yes, I think it's the right direction.  I still have minor concerns about VMX
>>>>> being enabled while kvm.ko is loaded, which means that VMXON will _always_ be
>>>>> enabled if KVM is built-in.  But after seeing the complexity that is needed to
>>>>> safely initialize TDX, and after seeing just how much complexity KVM already
>>>>> has because it enables VMX on-demand (I hadn't actually tried removing that code
>>>>> before), I think the cost of that complexity far outweighs the risk of "always"
>>>>> being post-VMXON.
>>>>
>>>> Does always leaving VMXON have any actual damage, given we have emergency
>>>> virtualization shutdown?
>>>
>>> Being post-VMXON increases the risk of kexec() into the kdump kernel failing.
>>> The tradeoffs that we're trying to balance are: is the risk of kexec() failing
>>> due to the complexity of the emergency VMX code higher than the risk of us breaking
>>> things in general due to taking on a ton of complexity to juggle VMXON for TDX?
>>>
>>> After seeing the latest round of TDX code, my opinion is that being post-VMXON
>>> is less risky overall, in no small part because we need that to work anyways for
>>> hosts that are actively running VMs.
>>
>> How about we only keep VMX always on when TDX is enabled?
> 
> Paolo also suggested that forcing VMXON only if TDX is enabled, mostly because
> kvm-intel.ko and kvm-amd.ko may be auto-loaded based on MODULE_DEVICE_TABLE(),
> which in turn causes problems for out-of-tree hypervisors that want control over
> VMX and SVM.
> 
> I'm not opposed to the idea, it's the complexity and messiness I dislike.  E.g.
> the TDX code shouldn't have to deal with CPU hotplug locks, core KVM shouldn't
> need to expose nolock helpers, etc.  And if we're going to make non-trivial
> changes to the core KVM hardware enabling code anyways...
> 
> What about this?  Same basic idea as before, but instead of unconditionally doing
> hardware enabling during module initialization, let TDX do hardware enabling in
> a late_hardware_setup(), and then have KVM x86 ensure virtualization is enabled
> when creating VMs.
> 
> This way, architectures that aren't saddled with out-of-tree hypervisors can do
> the dead simple thing of enabling hardware during their initialization sequence,
> and the TDX code is much more sane, e.g. invoke kvm_x86_enable_virtualization()
> during late_hardware_setup(), and kvm_x86_disable_virtualization() during module
> exit (presumably).

Fine to me, given I am not familiar with other ARCHs, assuming always 
enable virtualization when KVM present is fine to them. :-)

Two questions below:

> +int kvm_x86_enable_virtualization(void)
> +{
> +	int r;
> +
> +	guard(mutex)(&vendor_module_lock);

It's a little bit odd to take the vendor_module_lock mutex.

It is called by kvm_arch_init_vm(), so more reasonablly we should still 
use kvm_lock?

Also, if we invoke kvm_x86_enable_virtualization() from 
kvm_x86_ops->late_hardware_setup(), then IIUC we will deadlock here 
because kvm_x86_vendor_init() already takes the vendor_module_lock?

> +
> +	if (kvm_usage_count++)
> +		return 0;
> +
> +	r = kvm_enable_virtualization();
> +	if (r)
> +		--kvm_usage_count;
> +
> +	return r;
> +}
> +EXPORT_SYMBOL_GPL(kvm_x86_enable_virtualization);
> +

[...]

> +int kvm_enable_virtualization(void)
>   {
> +	int r;
> +
> +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> +			      kvm_online_cpu, kvm_offline_cpu);
> +	if (r)
> +		return r;
> +
> +	register_syscore_ops(&kvm_syscore_ops);
> +
> +	/*
> +	 * Manually undo virtualization enabling if the system is going down.
> +	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
> +	 * possible for an in-flight module load to enable virtualization
> +	 * after syscore_shutdown() is called, i.e. without kvm_shutdown()
> +	 * being invoked.  Note, this relies on system_state being set _before_
> +	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
> +	 * or this CPU observes the impedning shutdown.  Which is why KVM uses
> +	 * a syscore ops hook instead of registering a dedicated reboot
> +	 * notifier (the latter runs before system_state is updated).
> +	 */
> +	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
> +	    system_state == SYSTEM_RESTART) {
> +		unregister_syscore_ops(&kvm_syscore_ops);
> +		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> +		return -EBUSY;
> +	}
> +

Aren't we also supposed to do:

	on_each_cpu(__kvm_enable_virtualization, NULL, 1);

here?

>   	return 0;
>   }
>