lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878rkqr7ku.fsf@ovpn-192-136.brq.redhat.com>
Date:   Fri, 04 Nov 2022 17:46:25 +0100
From:   Vitaly Kuznetsov <vkuznets@...hat.com>
To:     "Michael Kelley (LINUX)" <mikelley@...rosoft.com>,
        "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>
Cc:     KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH] x86/hyperv: Restore VP assist page after cpu
 offlining/onlining

"Michael Kelley (LINUX)" <mikelley@...rosoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@...hat.com> Sent: Thursday, November 3, 2022 12:06 PM
>> 
>> Commit e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing
>> to VP assist page MSR") moved 'wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE)' under
>> 'if (*hvp)' condition. This works for root partition as hv_cpu_die()
>> does memunmap() and sets 'hv_vp_assist_page[cpu]' to NULL but breaks
>> non-root partitions as hv_cpu_die() doesn't free 'hv_vp_assist_page[cpu]'
>> for them. 
>
> Did you consider having hv_cpu_die() free the VP assist page and
> set hv_vp_assist_page[cpu] to NULL in the non-root case?

Oh yes, I did, and I even wrote a patch (attached) but it failed in
testing :-( My testing was to run CPU onlining/offlining loop and and
try using KVM on the same CPU at the same time. I was able to see issues
when KVM was not able to reach to Enlightened VMCS because VP assist
page was NULL.

>  That would
> make the root and non-root cases more consistent, and it would make
> hv_cpu_init() and hv_cpu_die() more symmetrical.   The hv_cpu_die()
> path frees up pretty much all the other per-CPU resources.  I don't
> know why it keeps the VP assist page for re-use if the CPU comes back
> online later.
>
> You added the original code for allocating the vp_assist_page in
> commit a46d15cc1a, so maybe you remember if there are any
> gotchas. :-)

The root cause of the problem I observed seems to be the order of CPU
hotplug. Hyper-V uses the last CPUHP_AP_ONLINE_DYN stage while KVM has
his own dedicated CPUHP_AP_KVM_STARTING one so we end up freeing VP
assist page before turning KVM off on the CPU. It may be sufficient to
introduce a new stage for Hyper-V and put it before KVM's. It's in my
backlog to explore this path but it may take me some time to get back to
it :-( 

>
> Michael
>
>> This causes VP assist page to remain unset after CPU
>> offline/online cycle:
>> 
>> $ rdmsr -p 24 0x40000073
>>   10212f001
>> $ echo 0 > /sys/devices/system/cpu/cpu24/online
>> $ echo 1 > /sys/devices/system/cpu/cpu24/online
>> $ rdmsr -p 24 0x40000073
>>   0
>> 
>> Fix the issue by always writing to HV_X64_MSR_VP_ASSIST_PAGE in
>> hv_cpu_init(). Note, checking 'if (!*hvp)', for root partition is
>> pointless as hv_cpu_die() always sets 'hv_vp_assist_page[cpu]' to
>> NULL (and it's also NULL initially).
>> 
>> Note: the fact that 'hv_vp_assist_page[cpu]' is reset to NULL may
>> present a (potential) issue for KVM. While Hyper-V uses
>> CPUHP_AP_ONLINE_DYN stage in CPU hotplug, KVM uses
>> CPUHP_AP_KVM_STARTING
>> which comes earlier in CPU teardown sequence. It is theoretically
>> possible that Enlightened VMCS is still in use. It is unclear if the
>> issue is real and if using KVM with Hyper-V root partition is even
>> possible.
>> 
>> While on it, drop the unneeded smp_processor_id() call from hv_cpu_init().
>> 
>> Fixes: e5d9b714fe40 ("x86/hyperv: fix root partition faults when writing to VP assist
>> page MSR")
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
>> ---
>>  arch/x86/hyperv/hv_init.c | 54 +++++++++++++++++++--------------------
>>  1 file changed, 26 insertions(+), 28 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index f49bc3ec76e6..a269049a43ce 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -77,7 +77,7 @@ static int hyperv_init_ghcb(void)
>>  static int hv_cpu_init(unsigned int cpu)
>>  {
>>  	union hv_vp_assist_msr_contents msr = { 0 };
>> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>>  	int ret;
>> 
>>  	ret = hv_common_cpu_init(cpu);
>> @@ -87,34 +87,32 @@ static int hv_cpu_init(unsigned int cpu)
>>  	if (!hv_vp_assist_page)
>>  		return 0;
>> 
>> -	if (!*hvp) {
>> -		if (hv_root_partition) {
>> -			/*
>> -			 * For root partition we get the hypervisor provided VP assist
>> -			 * page, instead of allocating a new page.
>> -			 */
>> -			rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>> -			*hvp = memremap(msr.pfn <<
>> -
>> 	HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
>> -					PAGE_SIZE, MEMREMAP_WB);
>> -		} else {
>> -			/*
>> -			 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
>> -			 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
>> -			 * out to make sure we always write the EOI MSR in
>> -			 * hv_apic_eoi_write() *after* the EOI optimization is disabled
>> -			 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
>> -			 * case of CPU offlining and the VM will hang.
>> -			 */
>> +	if (hv_root_partition) {
>> +		/*
>> +		 * For root partition we get the hypervisor provided VP assist
>> +		 * page, instead of allocating a new page.
>> +		 */
>> +		rdmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>> +		*hvp = memremap(msr.pfn <<
>> HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT,
>> +				PAGE_SIZE, MEMREMAP_WB);
>> +	} else {
>> +		/*
>> +		 * The VP assist page is an "overlay" page (see Hyper-V TLFS's
>> +		 * Section 5.2.1 "GPA Overlay Pages"). Here it must be zeroed
>> +		 * out to make sure we always write the EOI MSR in
>> +		 * hv_apic_eoi_write() *after* the EOI optimization is disabled
>> +		 * in hv_cpu_die(), otherwise a CPU may not be stopped in the
>> +		 * case of CPU offlining and the VM will hang.
>> +		 */
>> +		if (!*hvp)
>>  			*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
>> -			if (*hvp)
>> -				msr.pfn = vmalloc_to_pfn(*hvp);
>> -		}
>> -		WARN_ON(!(*hvp));
>> -		if (*hvp) {
>> -			msr.enable = 1;
>> -			wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>> -		}
>> +		if (*hvp)
>> +			msr.pfn = vmalloc_to_pfn(*hvp);
>> +
>> +	}
>> +	if (!WARN_ON(!(*hvp))) {
>> +		msr.enable = 1;
>> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, msr.as_uint64);
>>  	}
>> 
>>  	return hyperv_init_ghcb();
>> --
>> 2.38.1
>

-- 
Vitaly


View attachment "0001-x86-hyperv-Free-VP-assist-page-from-hv_cpu_die.patch" of type "text/x-patch" (3077 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ