lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 Aug 2022 11:16:02 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Yamahata, Isaku" <isaku.yamahata@...el.com>
CC:     "pbonzini@...hat.com" <pbonzini@...hat.com>,
        "Shahar, Sagi" <sagis@...gle.com>,
        "Aktas, Erdem" <erdemaktas@...gle.com>,
        "isaku.yamahata@...il.com" <isaku.yamahata@...il.com>,
        "Christopherson,, Sean" <seanjc@...gle.com>
Subject: Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on
 module initialization

First of all, I think the patch title can be improved.  "refactor CPU
compatibility check on module initialization" isn't the purpose of this patch. 
It is just a bonus.  The title should reflect the main purpose (or behaviour) of
this patch:

	KVM: Temporarily enable hardware on all cpus during module loading time

On Sun, 2022-08-07 at 15:00 -0700, isaku.yamahata@...el.com wrote:
> From: Isaku Yamahata <isaku.yamahata@...el.com>
> 
> TDX module requires its initialization.  It requires VMX to be enabled.
> Although there are several options of when to initialize it, the choice is
> the initialization time of the KVM kernel module.  There is no usable
> arch-specific hook for the TDX module to utilize during the KVM kernel module
> initialization.  The code doesn't enable/disable hardware (VMX in TDX case)
> during the kernel module initialization.  Add a hook for enabling hardware,
> arch-specific initialization, and disabling hardware during KVM kernel
> module initialization to make a room for TDX module initialization.  
> 

The hook is only for arch-specific initialization.  It doesn't do
hardware_enable_all() and hardware_disable_all().

> The
> current KVM enables hardware when the first VM is created and disables
> hardware when the last VM is destroyed.  When no VM is running, hardware is
> disabled.  To follow these semantics, the kernel module initialization needs
> to disable hardware. Opportunistically refactor the code to enable/disable
> hardware.
> 
> Add hadware_enable_all() and hardware_disable_all() to kvm_init() and
> introduce a new arch-specific callback function,
> kvm_arch_post_hardware_enable_setup, for arch to do arch-specific
> initialization that requires hardware_enable_all().  Opportunistically,
> move kvm_arch_check_processor_compat() to to hardware_enabled_nolock().
> TDX module initialization code will go into
> kvm_arch_post_hardware_enable_setup().

The remaining paragraphs are very hard to follow.  IMHO they are too details and
are not changelog material.  If you can obtain Reviewed-by from
reviewer/maintainer for specific arch, then it implies the code is fine to that
arch, and you don't need those details below in changelog.

So perhaps just say something like:

With doing hardware_enable_all() and hardware_disable_all() in kvm_init(), some
arch-specific code which must be done before them must also be moved before
kvm_init().  Only VMX and MIPS have such code:
- For VMX, per-cpu variable loaded_vmcss_on_cpu must be initialized before
hardware_disable_all().  Introduce vmx_early_init() to do it and call
vmx_eraly_init() before kvm_init() in vmx_init().
- For MIPS, move kvm_init() to the end of kvm_mips_init() (or whatever you
like...).


> 
> This patch reorders some function calls as below from (*) (**) (A) and (B)
> to (A) (B) and (*).  Here (A) and (B) depends on (*), but not (**).  By
> code inspection, only mips and VMX has the code of (*).  No other
> arch has empty (*).  So refactor mips and VMX and eliminate the
> necessity hook for (*) instead of adding an unused hook.
> 
> Before this patch:
> - Arch module initialization
>   - kvm_init()
>     - kvm_arch_init()
>     - kvm_arch_check_processor_compat() on each CPUs
>   - post-arch-specific initialization -- (*): (A) and (B) depends on this
>   - post-arch-specific initialization -- (**): no dependency to (A) and (B)
> 
> - When creating/deleting the first/last VM
>    - kvm_arch_hardware_enable() on each CPUs -- (A)
>    - kvm_arch_hardware_disable() on each CPUs -- (B)
> 
> After this patch:
> - Arch module initialization
>   - kvm_init()
>     - kvm_arch_init()
>     - arch-specific initialization -- (*)
>     - kvm_arch_check_processor_compat() on each CPUs
>     - kvm_arch_hardware_enable() on each CPUs -- (A)
>     - kvm_arch_hardware_disable() on each CPUs -- (B)
>   - post-arch-specific initialization  -- (**)
> 
> - When creating/deleting the first/last VM (no logic change)
>    - kvm_arch_hardware_enable() on each CPUs -- (A)
>    - kvm_arch_hardware_disable() on each CPUs -- (B)
> 
> Code inspection result:
> As long as I inspected, I found only mips and VMX have non-empty (*) or
> non-empty (A) or (B).
> x86: tested on a real machine
> mips: compile test only
> powerpc, s390, arm, riscv: code inspection only
> 
> - arch/mips/kvm/mips.c
>   module init function, kvm_mips_init(), does some initialization after
>   kvm_init().  Compile test only.
> 
> - arch/x86/kvm/x86.c
>   - uses vm_list which is statically initialized.
>   - static_call(kvm_x86_hardware_enable)();
>     - SVM: (*) and (**) are empty.
>     - VMX: initialize percpu variable loaded_vmcss_on_cpu that VMXON uses.
> 
> - arch/powerpc/kvm/powerpc.c
>   kvm_arch_hardware_enable/disable() are nop
> 
> - arch/s390/kvm/kvm-s390.c
>   kvm_arch_hardware_enable/disable() are nop
> 
> - arch/arm64/kvm/arm.c
>   module init function, arm_init(), calls only kvm_init().
>   (*) and (**) are empty
> 
> - arch/riscv/kvm/main.c
>   module init function, riscv_kvm_init(), calls only kvm_init().
>   (*) and (**) are empty
> 
> Co-developed-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@...el.com>
> ---
>  arch/mips/kvm/mips.c     | 16 +++++++++++-----
>  arch/x86/kvm/vmx/vmx.c   | 23 +++++++++++++++++++----
>  include/linux/kvm_host.h |  1 +
>  virt/kvm/kvm_main.c      | 31 ++++++++++++++++++++++++-------
>  4 files changed, 55 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 092d09fb6a7e..fd7339cff57c 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -1642,12 +1642,11 @@ static int __init kvm_mips_init(void)
>  		return -EOPNOTSUPP;
>  	}
>  
> +	/*
> +	 * kvm_init() calls kvm_arch_hardware_enable/disable().  The early
> +	 * initialization is needed before calling kvm_init().
> +	 */

Sorry I take this back.  It's upto MIPS reviewer/maintainer to decide whether we
need a comment here.

>  	ret = kvm_mips_entry_setup();
> -	if (ret)
> -		return ret;
> -
> -	ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> -
>  	if (ret)
>  		return ret;
>  
> @@ -1656,6 +1655,13 @@ static int __init kvm_mips_init(void)
>  
>  	register_die_notifier(&kvm_mips_csr_die_notifier);
>  
> +	ret = kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
> +
> +	if (ret) {
> +		unregister_die_notifier(&kvm_mips_csr_die_notifier);
> +		return ret;
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c5637a4c1c43..4d7e1073984d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8433,6 +8433,23 @@ static void vmx_exit(void)
>  }
>  module_exit(vmx_exit);
>  
> +/*
> + * Early initialization before kvm_init() so that vmx_hardware_enable/disable()
> + * can work.
> + */
> +static void __init vmx_init_early(void)
> +{
> +	int cpu;
> +
> +	/*
> +	 * vmx_hardware_disable() accesses loaded_vmcss_on_cpu list.
> +	 * Initialize the variable before kvm_init() that calls
> +	 * vmx_hardware_enable/disable().
> +	 */
> +	for_each_possible_cpu(cpu)
> +		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> +}
> +
>  static int __init vmx_init(void)
>  {
>  	int r, cpu;
> @@ -8470,6 +8487,7 @@ static int __init vmx_init(void)
>  	}
>  #endif
>  
> +	vmx_init_early();
>  	r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
>  		     __alignof__(struct vcpu_vmx), THIS_MODULE);
>  	if (r)
> @@ -8490,11 +8508,8 @@ static int __init vmx_init(void)
>  
>  	vmx_setup_fb_clear_ctrl();
>  
> -	for_each_possible_cpu(cpu) {
> -		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
> -
> +	for_each_possible_cpu(cpu)
>  		pi_init_cpu(cpu);
> -	}
>  
>  #ifdef CONFIG_KEXEC_CORE
>  	rcu_assign_pointer(crash_vmclear_loaded_vmcss,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9643b8eadefe..63d4e3ce3e53 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1437,6 +1437,7 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
>  int kvm_arch_hardware_enable(void);
>  void kvm_arch_hardware_disable(void);
>  int kvm_arch_hardware_setup(void *opaque);
> +int kvm_arch_post_hardware_enable_setup(void *opaque);
>  void kvm_arch_hardware_unsetup(void);
>  int kvm_arch_check_processor_compat(void);
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 0f5767e5ae45..a43fdbfaede2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4995,8 +4995,13 @@ static void hardware_enable_nolock(void *junk)
>  
>  	cpumask_set_cpu(cpu, cpus_hardware_enabled);
>  
> +	r = kvm_arch_check_processor_compat();
> +	if (r)
> +		goto out;
> +
>  	r = kvm_arch_hardware_enable();
>  
> +out:
>  	if (r) {
>  		cpumask_clear_cpu(cpu, cpus_hardware_enabled);
>  		atomic_inc(&hardware_enable_failed);
> @@ -5793,9 +5798,9 @@ void kvm_unregister_perf_callbacks(void)
>  }
>  #endif
>  
> -static void check_processor_compat(void *rtn)
> +__weak int kvm_arch_post_hardware_enable_setup(void *opaque)
>  {
> -	*(int *)rtn = kvm_arch_check_processor_compat();
> +	return 0;
>  }
>  
>  int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
> @@ -5828,11 +5833,23 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
>  	if (r < 0)
>  		goto out_free_1;
>  
> -	for_each_online_cpu(cpu) {
> -		smp_call_function_single(cpu, check_processor_compat, &r, 1);
> -		if (r < 0)
> -			goto out_free_2;
> -	}
> +	/* hardware_enable_nolock() checks CPU compatibility on each CPUs. */
> +	r = hardware_enable_all();
> +	if (r)
> +		goto out_free_2;
> +	/*
> +	 * Arch specific initialization that requires to enable virtualization
> +	 * feature.  e.g. TDX module initialization requires VMXON on all
> +	 * present CPUs.
> +	 */
> +	kvm_arch_post_hardware_enable_setup(opaque);
> +	/*
> +	 * Make hardware disabled after the KVM module initialization.  KVM
> +	 * enables hardware when the first KVM VM is created and disables
> +	 * hardware when the last KVM VM is destroyed.  When no KVM VM is
> +	 * running, hardware is disabled.  Keep that semantics.
> +	 */

Except the first sentence, the remaining sentences are more like changelog
material.  Perhaps just say something below to be more specific on the purpose:

	/*
	 * Disable hardware on all cpus so that out-of-tree drivers which
	 * also use hardware-assisted virtualization (such as virtualbox
	 * kernel module) can still be loaded when KVM is loaded.
	 */

> +	hardware_disable_all();
>  
>  	r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_STARTING, "kvm/cpu:starting",
>  				      kvm_starting_cpu, kvm_dying_cpu);


Anyway (although I still think it's better to introduce the __weak
kvm_arch_post_hardware_enable_setup() in later patch together with the VMX
version to support TDX), with above comments addressed:

Acked-by: Kai Huang <kai.huang@...el.com>

-- 
Thanks,
-Kai


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ