Message-ID: <20200420120822.4bncj2iwgqbpoxei@debian>
Date: Mon, 20 Apr 2020 13:08:22 +0100
From: Wei Liu <wei.liu@...nel.org>
To: Dexuan Cui <decui@...rosoft.com>
Cc: Wei Liu <wei.liu@...nel.org>, "bp@...en8.de" <bp@...en8.de>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"hpa@...or.com" <hpa@...or.com>, KY Srinivasan <kys@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"x86@...nel.org" <x86@...nel.org>,
Michael Kelley <mikelley@...rosoft.com>,
vkuznets <vkuznets@...hat.com>
Subject: Re: [PATCH] x86/hyperv: Suspend/resume the VP assist page for
hibernation
On Fri, Apr 17, 2020 at 11:47:41PM +0000, Dexuan Cui wrote:
> > From: Wei Liu <wei.liu@...nel.org>
> > Sent: Friday, April 17, 2020 4:00 AM
> > To: Dexuan Cui <decui@...rosoft.com>
> >
> > On Thu, Apr 16, 2020 at 11:29:59PM -0700, Dexuan Cui wrote:
> > > Unlike the other CPUs, CPU0 is never offlined during hibernation. So in the
> > > resume path, the "new" kernel's VP assist page is not suspended (i.e.
> > > disabled), and later when we jump to the "old" kernel, the page is not
> > > properly re-enabled for CPU0 with the allocated page from the old kernel.
> > >
> > > So far, the VP assist page is only used by hv_apic_eoi_write(). When the
> > > page is not properly re-enabled, hvp->apic_assist is always 0, so the
> > > HV_X64_MSR_EOI MSR is always written. This is not ideal with respect to
> > > performance, but Hyper-V can still correctly handle this.
> > >
> > > The issue is: the hypervisor can corrupt the old kernel memory, and hence
> > > sometimes cause unexpected behaviors, e.g. when the old kernel's non-boot
> > > CPUs are being onlined in the resume path, the VM can hang or be killed
> > > due to virtual triple fault.
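For reference, the EOI path in question looks roughly like this -- paraphrased
from hv_apic_eoi_write() in arch/x86/hyperv/hv_apic.c, so the exact details
may differ slightly:

    static void hv_apic_eoi_write(u32 reg, u32 val)
    {
            struct hv_vp_assist_page *hvp = hv_vp_assist_page[smp_processor_id()];

            /*
             * If the hypervisor has already performed the EOI via the
             * assist page, bit 0 of apic_assist is set and the (slower)
             * synthetic MSR write can be skipped.
             */
            if (hvp && (xchg(&hvp->apic_assist, 0) & 0x1))
                    return;

            /* Otherwise fall back to writing the HV_X64_MSR_EOI MSR. */
            wrmsr(HV_X64_MSR_EOI, val, 0);
    }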
> >
> > I don't quite follow here.
> >
> > The first sentence is rather alarming -- why would Hyper-V corrupt
> > guest's memory (kernel or not)?
>
> Without this patch, after the VM resumes from hibernation, the hypervisor
> still thinks the assist page of vCPU0 points to the physical page allocated by
> the "new" kernel (the "new" kernel started up freshly, loaded the saved state
> of the "old" kernel from disk into memory, and jumped to the "old" kernel),
> but the same physical page can be allocated to store something different in
> the "old" kernel (which is the currently running kernel, since the VM resumed).
>
> Conceptually, it looks like Hyper-V writes into the assist page from time to time,
> e.g. for the EOI optimization. This "corrupts" the page for the "old" kernel.
>
> I'm not absolutely sure if this explains the strange hang issue or triple fault
> I occasionally saw in my long-haul hibernation test, but with this patch,
> I never reproduce the strange hang/triple fault issue again, so I think this
> patch works.
OK. I wouldn't be surprised if the corruption ends up changing code in
the kernel, which further triggers triple faults etc.

I would suggest making this clear in the commit message so as not to give
the impression that Hyper-V has this weird behaviour of corrupting guest
memory for no reason.

We can replace the paragraph starting with "The issue is: ..." with:
---
Linux needs to update Hyper-V with the correct VP assist page address to
prevent Hyper-V from writing to a stale page, which causes guest memory
corruption. The memory corruption may have caused some of the hangs and
triple faults we saw while the non-boot CPUs were being resumed.
---

What do you think?
Wei.
>
> > Secondly, code below only specifies cpu0. What does it do with non-boot
> > cpus on the resume path?
> >
> > Wei.
>
> hyperv_init() registers hv_cpu_init()/hv_cpu_die() to the cpuhp framework:
>
>     cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>                               hv_cpu_init, hv_cpu_die);
>
> In the hibernation procedure, the non-boot CPUs are automatically disabled
> and reenabled, so hv_cpu_init()/hv_cpu_die() are automatically called for them,
> e.g. in the resume path, see:
> hibernation_restore()
>   resume_target_kernel()
>     hibernate_resume_nonboot_cpu_disable()
>       disable_nonboot_cpus()
>     syscore_suspend()
>       hv_cpu_die(0)  // Added by this patch
>     swsusp_arch_resume()
>       relocate_restore_code()
>       restore_image()
>         jump to the old kernel and we return from
>         the swsusp_arch_suspend() in create_image()
>     syscore_resume()
>       hv_cpu_init(0)  // Added by this patch.
>     suspend_enable_secondary_cpus()
>     dpm_resume_start()
>     ...
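So, as I understand it from the trace above, the shape of the fix is roughly
a pair of syscore hooks covering CPU0 (the names and exact placement here are
only illustrative; the actual patch may differ):

    static int hv_suspend(void)
    {
            /*
             * CPU0 never goes through the cpuhp callbacks during
             * hibernation, so disable its VP assist page explicitly.
             */
            return hv_cpu_die(0);
    }

    static void hv_resume(void)
    {
            /*
             * Re-enable the VP assist page for CPU0 with the page that
             * belongs to the currently running ("old") kernel.
             */
            hv_cpu_init(0);
    }

    static struct syscore_ops hv_syscore_ops = {
            .suspend = hv_suspend,
            .resume  = hv_resume,
    };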
> Thanks,
> -- Dexuan