Message-ID: <20200417110007.uzfo6musx2x2suw7@debian>
Date:   Fri, 17 Apr 2020 12:00:07 +0100
From:   Wei Liu <wei.liu@...nel.org>
To:     Dexuan Cui <decui@...rosoft.com>
Cc:     bp@...en8.de, haiyangz@...rosoft.com, hpa@...or.com,
        kys@...rosoft.com, linux-hyperv@...r.kernel.org,
        linux-kernel@...r.kernel.org, mingo@...hat.com,
        sthemmin@...rosoft.com, tglx@...utronix.de, x86@...nel.org,
        mikelley@...rosoft.com, vkuznets@...hat.com, wei.liu@...nel.org
Subject: Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for
 hibernation

On Thu, Apr 16, 2020 at 11:29:59PM -0700, Dexuan Cui wrote:
> Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> resume path, the "new" kernel's VP assist page is not suspended (i.e.
> disabled), and later when we jump to the "old" kernel, the page is not
> properly re-enabled for CPU0 with the allocated page from the old kernel.
> 
> So far, the VP assist page is only used by hv_apic_eoi_write(). When the
> page is not properly re-enabled, hvp->apic_assist is always 0, so the
> HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
> performance, but Hyper-V can still correctly handle this.
> 
> The issue is: the hypervisor can corrupt the old kernel memory, and hence
> sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
> CPUs are being onlined in the resume path, the VM can hang or be killed
> due to virtual triple fault.
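
For readers following the quoted description, the EOI fast path it refers to can be modelled in user space roughly as below. This is only a sketch under stated assumptions: `struct vp_assist_model`, `eoi_write_model()` and the `eoi_msr_writes` counter are hypothetical stand-ins, not the kernel's definitions, and the real code clears `apic_assist` atomically rather than with a plain read and write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU VP assist page; only the field
 * named in the patch description is modelled. */
struct vp_assist_model {
	uint32_t apic_assist;
};

/* Counts simulated writes to HV_X64_MSR_EOI (the slow path). */
static unsigned int eoi_msr_writes;

/*
 * Model of the EOI decision. Returns true when the MSR write is skipped
 * because bit 0 of apic_assist was set. A NULL page, or a page whose
 * apic_assist always reads 0, always falls through to the MSR write.
 */
static bool eoi_write_model(struct vp_assist_model *hvp)
{
	if (hvp != NULL) {
		uint32_t old = hvp->apic_assist;

		hvp->apic_assist = 0;	/* consume the hint (not atomic here) */
		if (old & 0x1)
			return true;
	}
	eoi_msr_writes++;
	return false;
}
```

With `apic_assist` stuck at 0, every call takes the slow path, which matches the "not ideal with respect to performance, but still correct" behaviour described in the quoted text.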

I don't quite follow here.

The first sentence is rather alarming -- why would Hyper-V corrupt the
guest's memory (kernel or not)?

Secondly, the code below only handles CPU0. What happens to the non-boot
CPUs on the resume path?

Wei.

> 
> Fix the issue by calling hv_cpu_die()/hv_cpu_init() in the syscore ops.
> 
> Without the fix, hibernation can fail at a rate of 1/300 ~ 1/500.
> With the fix, hibernation can pass a long-haul test of 2000 rounds.
> 
> Fixes: 05bd330a7fd8 ("x86/hyperv: Suspend/resume the hypercall page for hibernation")
> Cc: stable@...r.kernel.org
> Signed-off-by: Dexuan Cui <decui@...rosoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index b0da5320bcff..4d3ce86331a3 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -72,7 +72,8 @@ static int hv_cpu_init(unsigned int cpu)
>  	struct page *pg;
>  
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	pg = alloc_page(GFP_KERNEL);
> +	/* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> +	pg = alloc_page(GFP_ATOMIC);
>  	if (unlikely(!pg))
>  		return -ENOMEM;
>  	*input_arg = page_address(pg);
> @@ -253,6 +254,7 @@ static int __init hv_pci_init(void)
>  static int hv_suspend(void)
>  {
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
> +	int ret;
>  
>  	/*
>  	 * Reset the hypercall page as it is going to be invalidated
> @@ -269,12 +271,17 @@ static int hv_suspend(void)
>  	hypercall_msr.enable = 0;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  
> -	return 0;
> +	ret = hv_cpu_die(0);
> +	return ret;
>  }
>  
>  static void hv_resume(void)
>  {
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
> +	int ret;
> +
> +	ret = hv_cpu_init(0);
> +	WARN_ON(ret);
>  
>  	/* Re-enable the hypercall page */
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -287,6 +294,7 @@ static void hv_resume(void)
>  	hv_hypercall_pg_saved = NULL;
>  }
>  
> +/* Note: when the ops are called, only CPU0 is online and IRQs are disabled. */
>  static struct syscore_ops hv_syscore_ops = {
>  	.suspend	= hv_suspend,
>  	.resume		= hv_resume,
> -- 
> 2.19.1
> 
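
One possible reading of the patch description, sketched as a user-space model: if the "new" kernel leaves CPU0's VP assist page enabled, the hypervisor keeps targeting that address after the jump to the "old" kernel, where the same guest address may hold unrelated data. The MSR layout used here (bit 0 as the enable flag, page address in the upper bits) and all `model_*` names are assumptions made for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for the model: bit 0 enables the page, the bits above
 * the 4 KiB page offset hold the page address. */
#define MODEL_ENABLE     0x1ULL
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_MASK  (~((1ULL << MODEL_PAGE_SHIFT) - 1))

static uint64_t vp_assist_msr;	/* simulated per-CPU MSR value for CPU0 */

/* Rough analogue of the step in hv_cpu_init() that enables the page. */
static void model_cpu_init(uint64_t page_gpa)
{
	vp_assist_msr = (page_gpa & MODEL_PAGE_MASK) | MODEL_ENABLE;
}

/* Rough analogue of the step in hv_cpu_die() that disables the page. */
static void model_cpu_die(void)
{
	vp_assist_msr = 0;
}

/* Address the hypervisor would write to, or 0 when the page is disabled. */
static uint64_t model_hv_target(void)
{
	if (!(vp_assist_msr & MODEL_ENABLE))
		return 0;
	return vp_assist_msr & MODEL_PAGE_MASK;
}
```

In this model, calling `model_cpu_die()` in the suspend hook and `model_cpu_init()` with the old kernel's page in the resume hook mirrors the ordering the patch adds to `hv_suspend()` and `hv_resume()`; skipping both leaves `model_hv_target()` pointing at the new kernel's page.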
