[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <14b0bc83-f645-408f-b8af-13f49fe6155d@amd.com>
Date: Wed, 4 Sep 2024 12:37:17 -0500
From: "Kalra, Ashish" <ashish.kalra@....com>
To: Paolo Bonzini <pbonzini@...hat.com>,
 Sean Christopherson <seanjc@...gle.com>
Cc: dave.hansen@...ux.intel.com, tglx@...utronix.de, mingo@...hat.com,
 bp@...en8.de, x86@...nel.org, hpa@...or.com, peterz@...radead.org,
 linux-kernel@...r.kernel.org, kvm@...r.kernel.org, thomas.lendacky@....com,
 michael.roth@....com, kexec@...ts.infradead.org, linux-coco@...ts.linux.dev
Subject: Re: [PATCH v2] x86/sev: Fix host kdump support for SNP
Hello Paolo,
On 9/4/2024 5:29 AM, Paolo Bonzini wrote:
> On 9/4/24 00:58, Kalra, Ashish wrote:
>> The issue here is that panic path will ensure that all (other) CPUs
>> have been shutdown via NMI by checking that they have executed the
>> NMI shutdown callback.
>>
>> But the above synchronization is specifically required for SNP case,
>> as we don't want to execute the SNP_DECOMMISSION command (to destroy
>> SNP guest context) while one or more CPUs are still in the NMI VMEXIT
>> path and still in the process of saving the vCPU state (and still
>> modifying SNP guest context?) during this VMEXIT path. Therefore, we
>> ensure that all the CPUs have saved the vCPU state and entered NMI
>> context before issuing SNP_DECOMMISSION. The point is that this is a
>> specific SNP requirement (and that's why this specific handling in
>> sev_emergency_disable()) and i don't know how we will be able to
>> enforce it in the generic panic path ?
>
> I think a simple way to do this is to _first_ kick out other
> CPUs through NMI, and then the one that is executing
> emergency_reboot_disable_virtualization().  This also makes
> emergency_reboot_disable_virtualization() and
> native_machine_crash_shutdown() more similar, in that
> the latter already stops other CPUs before disabling
> virtualization on the one that orchestrates the shutdown.
>
> Something like (incomplete, it has to also add the bool argument
> to cpu_emergency_virt_callback and the actual callbacks):
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 340af8155658..3df25fbe969d 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -111,7 +111,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>  
>      crash_smp_send_stop();
>  
> -    cpu_emergency_disable_virtualization();
> +    cpu_emergency_disable_virtualization(true);
>  
>      /*
>       * Disable Intel PT to stop its logging
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 0e0a4cf6b5eb..7a86ec786987 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -558,7 +558,7 @@ EXPORT_SYMBOL_GPL(cpu_emergency_unregister_virt_callback);
>   * reboot.  VMX blocks INIT if the CPU is post-VMXON, and SVM blocks INIT if
>   * GIF=0, i.e. if the crash occurred between CLGI and STGI.
>   */
> -void cpu_emergency_disable_virtualization(void)
> +void cpu_emergency_disable_virtualization(bool last)
>  {
>      cpu_emergency_virt_cb *callback;
>  
> @@ -572,7 +572,7 @@ void cpu_emergency_disable_virtualization(void)
>      rcu_read_lock();
>      callback = rcu_dereference(cpu_emergency_virt_callback);
>      if (callback)
> -        callback();
> +        callback(last);
>      rcu_read_unlock();
>  }
>  
> @@ -591,11 +591,11 @@ static void emergency_reboot_disable_virtualization(void)
>       * other CPUs may have virtualization enabled.
>       */
>      if (rcu_access_pointer(cpu_emergency_virt_callback)) {
> -        /* Safely force _this_ CPU out of VMX/SVM operation. */
> -        cpu_emergency_disable_virtualization();
> -
>          /* Disable VMX/SVM and halt on other CPUs. */
>          nmi_shootdown_cpus_on_restart();
> +
> +        /* Safely force _this_ CPU out of VMX/SVM operation. */
> +        cpu_emergency_disable_virtualization(true);
>      }
>  }
>  #else
> @@ -877,7 +877,7 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
>       * Prepare the CPU for reboot _after_ invoking the callback so that the
>       * callback can safely use virtualization instructions, e.g. VMCLEAR.
>       */
> -    cpu_emergency_disable_virtualization();
> +    cpu_emergency_disable_virtualization(false);
>  
>      atomic_dec(&waiting_for_crash_ipi);
>  
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 18266cc3d98c..9a863348d1a7 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -124,7 +124,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
>      if (raw_smp_processor_id() == atomic_read(&stopping_cpu))
>          return NMI_HANDLED;
>  
> -    cpu_emergency_disable_virtualization();
> +    cpu_emergency_disable_virtualization(false);
>      stop_this_cpu(NULL);
>  
>      return NMI_HANDLED;
> @@ -136,7 +136,7 @@ static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
>  DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
>  {
>      apic_eoi();
> -    cpu_emergency_disable_virtualization();
> +    cpu_emergency_disable_virtualization(false);
>      stop_this_cpu(NULL);
>  }
>  
>
> And then a second patch adds sev_emergency_disable() and only
> executes it if last == true.
>
This implementation will not work as we need to do wbinvd on all other CPUs after SNP_DECOMMISSION has been issued.
When the last CPU executes sev_emergency_disable() and issues SNP_DECOMMISSION, by that time all other CPUs have entered the NMI halt loop and then they won't be able to do a wbinvd and hence SNP_SHUTDOWN will eventually fail.
One way this can work is if all other CPUs can still execute sev_emergency_disable() and in case of last == false, do a spin busy till the last cpu kicks them out of the spin loop and then they do a wbinvd after exiting the spin busy, but then cpu_emergency_disable_virtualization() is/has to be called before atomic_dec(&waiting_for_crash_ipi) in crash_nmi_callback(), so this spin busy in other CPUs will force the last CPU to wait forever (or till it times out waiting for all other CPUs to execute the NMI callback) which makes all of this looks quite fragile.
Thanks, Ashish
Powered by blists - more mailing lists
 
