[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <MWHPR21MB15935468547C25294A253E0AD7F39@MWHPR21MB1593.namprd21.prod.outlook.com>
Date: Fri, 6 Aug 2021 21:51:35 +0000
From: Michael Kelley <mikelley@...rosoft.com>
To: תומר אבוטבו <tomer432100@...il.com>,
David Mozes <david.mozes@...k.us>
CC: David Moses <mosesster@...il.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] x86/hyper-v: guard against cpu mask changes in
hyperv_flush_tlb_others()
From: תומר אבוטבול <tomer432100@...il.com> Sent: Friday, August 6, 2021 11:03 AM
> Attaching the patches Michael asked for debugging
> 1) Print the cpumask when < num_possible_cpus():
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e666f7eaf32d..620f656d6195 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -60,6 +60,7 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> struct hv_tlb_flush *flush;
> u64 status = U64_MAX;
> unsigned long flags;
> + unsigned int cpu_last;
>
> trace_hyperv_mmu_flush_tlb_others(cpus, info);
>
> @@ -68,6 +69,11 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>
> local_irq_save(flags);
>
> + cpu_last = cpumask_last(cpus);
> + if (cpu_last > num_possible_cpus()) {
I think this should be ">=" since cpus are numbered starting at zero.
In your VM with 64 CPUs, having CPU #64 in the list would be error.
> + pr_emerg("ERROR_HYPERV: cpu_last=%*pbl", cpumask_pr_args(cpus));
> + }
> +
> /*
> * Only check the mask _after_ interrupt has been disabled to avoid the
> * mask changing under our feet.
>
> 2) disable the Hyper-V specific flush routines:
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e666f7eaf32d..8e77cc84775a 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -235,6 +235,7 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>
> void hyperv_setup_mmu_ops(void)
> {
> + return;
> if (!(ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED))
> return;
Otherwise, this code looks good to me and matches what I had in mind.
Note that the function native_flush_tlb_others() is used when the Hyper-V specific
flush function is disabled per patch #2 above, or when hv_cpu_to_vp_index() returns
VP_INVALID. In a quick glance through the code, it appears that native_flush_tlb_others()
will work even if there's a non-existent CPU in the cpumask that is passed as an argument.
So perhaps an immediate workaround is Patch #2 above.
Perhaps hyperv_flush_tlb_others() should be made equally tolerant of a non-existent
CPU being in the list. But if you are willing, I'm still interested in the results of an
experiment with just Patch #1. I'm curious about what the CPU list looks like when
it has a non-existent CPU. Is it complete garbage, or is there just one non-existent
CPU?
The other curiosity is that I haven't seen this Linux panic reported by other users,
and I think it would have come to our attention if it were happening with any frequency.
You see the problem fairly regularly. So I'm wondering what the difference is.
Michael
Powered by blists - more mailing lists