lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 19 Apr 2024 11:09:30 +1200
From: "Huang, Kai" <kai.huang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: Tina Zhang <tina.zhang@...el.com>, Hang Yuan <hang.yuan@...el.com>, "Bo2
 Chen" <chen.bo@...el.com>, "sagis@...gle.com" <sagis@...gle.com>,
	"isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Erdem Aktas
	<erdemaktas@...gle.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, Isaku Yamahata
	<isaku.yamahata@...el.com>, "isaku.yamahata@...ux.intel.com"
	<isaku.yamahata@...ux.intel.com>
Subject: Re: [PATCH v19 023/130] KVM: TDX: Initialize the TDX module when
 loading the KVM intel kernel module



On 19/04/2024 2:30 am, Sean Christopherson wrote:
> On Thu, Apr 18, 2024, Kai Huang wrote:
>> On 18/04/2024 11:35 am, Sean Christopherson wrote:
>>> Ah, yeah.  Oh, duh.  I think the reason I didn't initially suggest late_hardware_setup()
>>> is that I was assuming/hoping TDX setup could be done after kvm_x86_vendor_exit().
>>> E.g. in vt_init() or whatever it gets called:
>>>
>>> 	r = kvm_x86_vendor_exit(...);
>>> 	if (r)
>>> 		return r;
>>>
>>> 	if (enable_tdx) {
>>> 		r = tdx_blah_blah_blah();
>>> 		if (r)
>>> 			goto vendor_exit;
>>> 	}
>>
>>
>> I assume the reason you introduced the late_hardware_setup() is purely
>> because you want to do:
>>
>>    cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_enable);
>>
>> after
>>
>>    kvm_ops_update()?
> 
> No, kvm_ops_update() needs to come before kvm_x86_enable_virtualization(), as the
> static_call() to hardware_enable() needs to be patched in.

Right.  I was talking about that the reason you introduced the 
late_hardware_setup() was because we need to do 
kvm_x86_virtualization_enabled() and the above 
cpu_emergency_register_virt_callback() after kvm_ops_update().

> 
> Oh, and my adjust patch is broken, the code to do the compat checks should NOT
> be removed; it could be removed if KVM unconditionally enabled VMX during setup,
> but it needs to stay in the !TDX case.

Right.

> 
> -       for_each_online_cpu(cpu) {
> -               smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &r, 1);
> -               if (r < 0)
> -                       goto out_unwind_ops;
> -       }
> 
> Which is another reason to defer kvm_x86_enable_virtualization(), though to be
> honest not a particularly compelling reason on its own.
> 
>> Anyway, we can also do 'enable_tdx' outside of kvm_x86_vendor_init() as
>> above, given it cannot be done in hardware_setup() anyway.
>>
>> If we do 'enable_tdx' in late_hardware_setup(), we will need a
>> kvm_x86_enable_virtualization_nolock(), but that's also not a problem to me.
>>
>> So which way do you prefer?
>>
>> Btw, with kvm_x86_virtualization_enable(), it seems the compatibility check
>> is lost, which I assume is OK?
> 
> Heh, and I obviously wasn't reading ahead :-)
> 
>> Btw2, currently tdx_enable() requires cpus_read_lock() must be called prior.
>> If we do unconditional tdx_cpu_enable() in vt_hardware_enable(), then with
>> your proposal IIUC there's no such requirement anymore, because no task will
>> be scheduled to the new CPU before it reaches CPUHP_AP_ACTIVE.
> 
> Correct.
> 
>> But now calling cpus_read_lock()/unlock() around tdx_enable() also acceptable
>> to me.
> 
> No, that will deadlock as cpuhp_setup_state() does cpus_read_lock().

Right, but it takes cpus_read_lock()/unlock() internally.  I was talking 
about:

	if (enable_tdx) {
		kvm_x86_virtualization_enable();

		/*
		 * Unfortunately currently tdx_enable() internally has
		 * lockdep_assert_cpus_held().
		 */
		cpus_read_lock();
		tdx_enable();
		cpus_read_unlock();
	}
	
> 
>>>>> +int kvm_enable_virtualization(void)
>>>>>     {
>>>>> +	int r;
>>>>> +
>>>>> +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
>>>>> +			      kvm_online_cpu, kvm_offline_cpu);
>>>>> +	if (r)
>>>>> +		return r;
>>>>> +
>>>>> +	register_syscore_ops(&kvm_syscore_ops);
>>>>> +
>>>>> +	/*
>>>>> +	 * Manually undo virtualization enabling if the system is going down.
>>>>> +	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
>>>>> +	 * possible for an in-flight module load to enable virtualization
>>>>> +	 * after syscore_shutdown() is called, i.e. without kvm_shutdown()
>>>>> +	 * being invoked.  Note, this relies on system_state being set _before_
>>>>> +	 * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
>>>>> +	 * or this CPU observes the impedning shutdown.  Which is why KVM uses
>>>>> +	 * a syscore ops hook instead of registering a dedicated reboot
>>>>> +	 * notifier (the latter runs before system_state is updated).
>>>>> +	 */
>>>>> +	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
>>>>> +	    system_state == SYSTEM_RESTART) {
>>>>> +		unregister_syscore_ops(&kvm_syscore_ops);
>>>>> +		cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
>>>>> +		return -EBUSY;
>>>>> +	}
>>>>> +
>>>>
>>>> Aren't we also supposed to do:
>>>>
>>>> 	on_each_cpu(__kvm_enable_virtualization, NULL, 1);
>>>>
>>>> here?
>>>
>>> No, cpuhp_setup_state() invokes the callback, kvm_online_cpu(), on each CPU.
>>> I.e. KVM has been doing things the hard way by using cpuhp_setup_state_nocalls().
>>> That's part of the complexity I would like to get rid of.
>>
>> Ah, right :-)
>>
>> Btw, why couldn't we do the 'system_state' check at the very beginning of
>> this function?
> 
> We could, but we'd still need to check after, and adding a small bit of extra
> complexity just to try to catch a very rare situation isn't worth it.
> 
> To prevent races, system_state needs to be check after register_syscore_ops(),
> because only once kvm_syscore_ops is registered is KVM guaranteed to get notified
> of a shutdown. >
> And because the kvm_syscore_ops hooks disable virtualization, they should be called
> after cpuhp_setup_state().  That's not strictly required, as the per-CPU
> hardware_enabled flag will prevent true problems if the system enter shutdown
> state before KVM reaches cpuhp_setup_state().
> 
> Hmm, but the same edge cases exists in the above flow.  If the system enters
> shutdown _just_ after register_syscore_ops(), KVM would see that in system_state
> and do cpuhp_remove_state(), i.e. invoke kvm_offline_cpu() and thus do a double
> disable (which again is benign because of hardware_enabled).
> 
> Ah, but registering syscore ops before doing cpuhp_setup_state() has another race,
> and one that could be fatal.  If the system does suspend+resume before the cpuhup
> hooks are registered, kvm_resume() would enable virtualization.  And then if
> cpuhp_setup_state() failed, virtualization would be left enabled.
> 
> So cpuhp_setup_state() *must* come before register_syscore_ops(), and
> register_syscore_ops() *must* come before the system_state check.

OK.  I guess I have to double check here to completely understand the 
races.  :-)

So I think we have consensus to go with the approach that shows in your 
second diff -- that is to always enable virtualization during module 
loading for all other ARCHs other than x86, for which we only always 
enables virtualization during module loading for TDX.

Then how about "do kvm_x86_virtualization_enable()  within 
late_hardware_setup() in kvm_x86_vendor_init()"  vs "do 
kvm_x86_virtualization_enable() in TDX-specific code after 
kvm_x86_vendor_init()"?

Which do you prefer?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ