Message-ID: <5ffd4052-4735-449a-9bee-f42563add778@intel.com>
Date: Thu, 18 Apr 2024 12:47:00 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: "Zhang, Tina" <tina.zhang@...el.com>, "Yuan, Hang" <hang.yuan@...el.com>,
	"Chen, Bo2" <chen.bo@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Aktas, Erdem"
	<erdemaktas@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "isaku.yamahata@...ux.intel.com"
	<isaku.yamahata@...ux.intel.com>
Subject: Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when
 loading the KVM intel kernel module



On 18/04/2024 11:35 am, Sean Christopherson wrote:
> On Thu, Apr 18, 2024, Kai Huang wrote:
>> On 18/04/2024 2:40 am, Sean Christopherson wrote:
>>> This way, architectures that aren't saddled with out-of-tree hypervisors can do
>>> the dead simple thing of enabling hardware during their initialization sequence,
>>> and the TDX code is much more sane, e.g. invoke kvm_x86_enable_virtualization()
>>> during late_hardware_setup(), and kvm_x86_disable_virtualization() during module
>>> exit (presumably).
>>
>> Fine by me, though given that I am not familiar with other ARCHs, I am assuming
>> always enabling virtualization when KVM is present is fine for them. :-)
>>
>> Two questions below:
>>
>>> +int kvm_x86_enable_virtualization(void)
>>> +{
>>> +	int r;
>>> +
>>> +	guard(mutex)(&vendor_module_lock);
>>
>> It's a little bit odd to take the vendor_module_lock mutex.
>>
>> It is called by kvm_arch_init_vm(), so wouldn't it be more reasonable to still
>> use kvm_lock?
> 
> I think this should take an x86-specific lock, since it's guarding x86-specific
> data.  And vendor_module_lock fits the bill perfectly.  Well, except for the
> name, and I definitely have no objection to renaming it.

OK.  This makes sense.

No strong opinion on renaming; personally I wouldn't bother.  We can add a 
comment in kvm_x86_enable_virtualization() to explain the lock choice.  
Perhaps in the future we will want to change x86 to always enable 
virtualization too.
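
Something along these lines, right above the guard() (wording is only a 
sketch from my side):

	/*
	 * Note: despite its name, vendor_module_lock also serializes this
	 * x86-wide enabling path, so reuse it here rather than kvm_lock or
	 * yet another dedicated mutex.
	 */
	guard(mutex)(&vendor_module_lock);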

> 
>> Also, if we invoke kvm_x86_enable_virtualization() from
>> kvm_x86_ops->late_hardware_setup(), then IIUC we will deadlock here because
>> kvm_x86_vendor_init() already takes the vendor_module_lock?
> 
> Ah, yeah.  Oh, duh.  I think the reason I didn't initially suggest late_hardware_setup()
> is that I was assuming/hoping TDX setup could be done after kvm_x86_vendor_init().
> E.g. in vt_init() or whatever it gets called:
> 
> 	r = kvm_x86_vendor_init(...);
> 	if (r)
> 		return r;
> 
> 	if (enable_tdx) {
> 		r = tdx_blah_blah_blah();
> 		if (r)
> 			goto vendor_exit;
> 	}


I assume the reason you introduced late_hardware_setup() is purely 
because you want to do:

   cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_enable);

after

   kvm_ops_update()?
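
I.e. roughly the below (kvm_x86_late_setup() is just a placeholder name 
from me; the point is only that registering the emergency callback has 
to come after kvm_ops_update() has populated kvm_x86_ops):

	static int kvm_x86_late_setup(struct kvm_x86_init_ops *ops)
	{
		int r;

		/* kvm_x86_ops is only populated by kvm_ops_update() ... */
		kvm_ops_update(ops);

		r = kvm_x86_ops.late_hardware_setup();
		if (r)
			return r;

		/* ... so the emergency callback can only be registered after it. */
		cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_enable);
		return 0;
	}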

Anyway, we can also do 'enable_tdx' outside of kvm_x86_vendor_init() as 
above, given it cannot be done in hardware_setup() anyway.

If we do 'enable_tdx' in late_hardware_setup(), we will need a 
kvm_x86_enable_virtualization_nolock(), but that's also not a problem for me.
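
E.g. something like the below (a sketch, assuming late_hardware_setup() 
runs with vendor_module_lock already held by kvm_x86_vendor_init(), and 
that the x86 wrapper otherwise just forwards to the common 
kvm_enable_virtualization()):

	int kvm_x86_enable_virtualization_nolock(void)
	{
		lockdep_assert_held(&vendor_module_lock);

		return kvm_enable_virtualization();
	}

	int kvm_x86_enable_virtualization(void)
	{
		guard(mutex)(&vendor_module_lock);

		return kvm_x86_enable_virtualization_nolock();
	}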

So which way do you prefer?

Btw, with kvm_x86_enable_virtualization() it seems the compatibility 
check is lost, which I assume is OK?

Btw2: currently tdx_enable() requires that cpus_read_lock() be held by 
the caller.  If we do an unconditional tdx_cpu_enable() in 
vt_hardware_enable(), then with your proposal IIUC that requirement goes 
away, because no task will be scheduled onto the new CPU before it 
reaches CPUHP_AP_ACTIVE.  But keeping cpus_read_lock()/unlock() around 
tdx_enable() is also acceptable to me.
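
I.e. if we keep the requirement, the call site would just be (sketch):

	cpus_read_lock();
	r = tdx_enable();	/* currently requires cpus_read_lock() to be held */
	cpus_read_unlock();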

[...]

>>
>>> +int kvm_enable_virtualization(void)
>>>    {
>>> +	int r;
>>> +
>>> +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
>>> +			      kvm_online_cpu, kvm_offline_cpu);
>>> +	if (r)
>>> +		return r;
>>> +
>>> +	register_syscore_ops(&kvm_syscore_ops);
>>> +
>>> +	/*
>>> +	 * Manually undo virtualization enabling if the system is going down.
>>> +	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
>>> +	 * possible for an in-flight module load to enable virtualization
>>> +	 * after syscore_shutdown() is called, i.e. without kvm_shutdown()
>>> +	 * being invoked.  Note, this relies on system_state being set _before_
>>> +	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
>>> +	 * or this CPU observes the impending shutdown.  Which is why KVM uses
>>> +	 * a syscore ops hook instead of registering a dedicated reboot
>>> +	 * notifier (the latter runs before system_state is updated).
>>> +	 */
>>> +	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
>>> +	    system_state == SYSTEM_RESTART) {
>>> +		unregister_syscore_ops(&kvm_syscore_ops);
>>> +		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
>>> +		return -EBUSY;
>>> +	}
>>> +
>>
>> Aren't we also supposed to do:
>>
>> 	on_each_cpu(__kvm_enable_virtualization, NULL, 1);
>>
>> here?
> 
> No, cpuhp_setup_state() invokes the callback, kvm_online_cpu(), on each CPU.
> I.e. KVM has been doing things the hard way by using cpuhp_setup_state_nocalls().
> That's part of the complexity I would like to get rid of.

Ah, right :-)

Btw, why couldn't we do the 'system_state' check at the very beginning 
of this function?
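
I.e. roughly the below (whether this ordering is safe w.r.t. a racing 
shutdown is exactly what I'm asking):

	int kvm_enable_virtualization(void)
	{
		int r;

		/* Check before doing any registration at all. */
		if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
		    system_state == SYSTEM_RESTART)
			return -EBUSY;

		r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
				      kvm_online_cpu, kvm_offline_cpu);
		if (r)
			return r;

		register_syscore_ops(&kvm_syscore_ops);
		return 0;
	}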

