[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.1907180058210.1778@nanos.tec.linutronix.de>
Date: Thu, 18 Jul 2019 01:03:37 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Dexuan Cui <decui@...rosoft.com>
cc: Haiyang Zhang <haiyangz@...rosoft.com>,
KY Srinivasan <kys@...rosoft.com>,
Stephen Hemminger <sthemmin@...rosoft.com>,
Sasha Levin <Alexander.Levin@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
Michael Kelley <mikelley@...rosoft.com>,
Long Li <longli@...rosoft.com>, vkuznets <vkuznets@...hat.com>,
Ingo Molnar <mingo@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
"x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"marcelo.cerri@...onical.com" <marcelo.cerri@...onical.com>,
"driverdev-devel@...uxdriverproject.org"
<driverdev-devel@...uxdriverproject.org>,
"olaf@...fle.de" <olaf@...fle.de>,
"apw@...onical.com" <apw@...onical.com>,
"jasowang@...hat.com" <jasowang@...hat.com>
Subject: Re: [PATCH] x86/hyper-v: Zero out the VP assist page to fix CPU
offlining
Dexuan,
On Thu, 4 Jul 2019, Dexuan Cui wrote:
> When a CPU is being offlined, the CPU usually still receives a few
> interrupts (e.g. reschedule IPIs), after hv_cpu_die() disables the
> HV_X64_MSR_VP_ASSIST_PAGE, so hv_apic_eoi_write() may not write the EOI
> MSR, if the apic_assist field's bit0 happens to be 1; as a result, Hyper-V
> may not be able to deliver all the interrupts to the CPU, and the CPU may
> not be stopped, and the kernel will hang soon.
>
> The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
> 5.2.1 "GPA Overlay Pages"), so with this fix we're sure the apic_assist
> field is still zero, after the VP ASSIST PAGE is disabled.
>
> Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist")
> Signed-off-by: Dexuan Cui <decui@...rosoft.com>
> ---
> arch/x86/hyperv/hv_init.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0e033ef11a9f..db51a301f759 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -60,8 +60,14 @@ static int hv_cpu_init(unsigned int cpu)
> if (!hv_vp_assist_page)
> return 0;
>
> + /*
> + * The ZERO flag is necessary, because in the case of CPU offlining
> + * the page can still be used by hv_apic_eoi_write() for a while,
> + * after the VP ASSIST PAGE is disabled in hv_cpu_die().
> + */
> if (!*hvp)
> - *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> + *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
> + PAGE_KERNEL);
This is the allocation when the CPU is brought online for the first
time. So what effect has zeroing at allocation time vs. offlining and
potentially receiving IPIs? That allocation is never freed.
Neither the comment nor the changelog make any sense to me.
Thanks,
tglx
Powered by blists - more mailing lists