Message-ID: <BYAPR21MB1688482206965765C716DC9ED73B9@BYAPR21MB1688.namprd21.prod.outlook.com>
Date: Fri, 4 Nov 2022 16:23:08 +0000
From: "Michael Kelley (LINUX)" <mikelley@...rosoft.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
CC: KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH] x86/hyperv: Restore VP assist page after cpu
offlining/onlining

From: Vitaly Kuznetsov <vkuznets@...hat.com> Sent: Thursday, November 3, 2022 12:06 PM
>
> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
> for them.

Did you consider having hv_cpu_die() free the VP assist page and
set hv_vp_assist_page[cpu] to NULL in the non-root case? That would
make the root and non-root cases more consistent, and it would make
hv_cpu_init() and hv_cpu_die() more symmetrical. The hv_cpu_die()
path frees up pretty much all the other per-CPU resources. I don't
know why it keeps the VP assist page for re-use if the CPU comes back
online later.
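Something like the following is what I have in mind -- just an untested
sketch of the non-root cleanup, with names as I remember them from
hv_cpu_die(), so take the details with a grain of salt:

	/* in hv_cpu_die(): free the non-root VP assist page on offline */
	if (hv_vp_assist_page && hv_vp_assist_page[cpu] && !hv_root_partition) {
		union hv_vp_assist_msr_contents msr = { 0 };

		/* clear .enable before releasing the backing memory */
		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
		vfree(hv_vp_assist_page[cpu]);
		hv_vp_assist_page[cpu] = NULL;
	}

With that, the existing 'if (!*hvp)' check in hv_cpu_init() would always
see NULL after an offline/online cycle, so the MSR write would happen
again without making it unconditional.
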
You added the original code for allocating the vp_assist_page in
commit a46d15cc1a, so maybe you remember if there are any
gotchas. :-)

Michael

> This causes VP assist page to remain unset after CPU
> offline/online cycle:
>
> $ rdmsr -p 24 0x40000073
> 10212f001
> $ echo 0 > /sys/devices/system/cpu/cpu24/online
> $ echo 1 > /sys/devices/system/cpu/cpu24/online
> $ rdmsr -p 24 0x40000073
> 0
>
> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
> hv_cpu_init(). Note: checking 'if (!*hvp)' for the root partition is
> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
> NULL (and it's also NULL initially).
>
> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
> present a (potential) issue for KVM. While Hyper-V uses
> the CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses CPUHP_AP_KVM_STARTING,
> which comes earlier in the CPU teardown sequence. It is theoretically
> possible that Enlightened VMCS is still in use. It is unclear if the
> issue is real and if using KVM with Hyper-V root partition is even
> possible.
>
> While at it, drop the unneeded smp_processor_id() call from hv_cpu_init().
>
> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist page MSR")
> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
> ---
> arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
> 1 file changed, 26 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index f49bc3ec76e6..a269049a43ce 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	union hv_vp_assist_msr_contents msr = { 0 };
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	int ret;
>
>  	ret = hv_common_cpu_init(cpu);
> @@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
>
> -	if (!*hvp) {
> -		if (hv_root_partition) {
> -			/*
> -			 * For root partition we get the hypervisor provided VP assist
> -			 * page, instead of allocating a new page.
> -			 */
> -			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> -			*hvp = memremap(msr.pfn <<
> -					HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
> -					PAGE_SIZE, MEMREMAP_WB);
> -		} else {
> -			/*
> -			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
> -			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
> -			 * out to make sure we always write the EOI MSR in
> -			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
> -			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
> -			 * case of CPU offlining and the VM will hang.
> -			 */
> +	if (hv_root_partition) {
> +		/*
> +		 * For root partition we get the hypervisor provided VP assist
> +		 * page, instead of allocating a new page.
> +		 */
> +		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> +		*hvp = memremap(msr.pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
> +				PAGE_SIZE, MEMREMAP_WB);
> +	} else {
> +		/*
> +		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
> +		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
> +		 * out to make sure we always write the EOI MSR in
> +		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
> +		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
> +		 * case of CPU offlining and the VM will hang.
> +		 */
> +		if (!*hvp)
>  			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
> -			if (*hvp)
> -				msr.pfn = vmalloc_to_pfn(*hvp);
> -		}
> -		WARN_ON(!(*hvp));
> -		if (*hvp) {
> -			msr.enable = 1;
> -			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
> -		}
> +		if (*hvp)
> +			msr.pfn = vmalloc_to_pfn(*hvp);
> +
> +	}
> +	if (!WARN_ON(!(*hvp))) {
> +		msr.enable = 1;
> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>  	}
>
>  	return hyperv_init_ghcb();
> --
> 2.38.1