[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1684422832-38476-1-git-send-email-mikelley@microsoft.com>
Date: Thu, 18 May 2023 08:13:52 -0700
From: Michael Kelley <mikelley@...rosoft.com>
To: kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
linux-kernel@...r.kernel.org, linux-hyperv@...r.kernel.org,
vkuznets@...hat.com, decui@...rosoft.com
Cc: mikelley@...rosoft.com, stable@...r.kernel.org
Subject: [PATCH v2 1/1] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
vmbus_wait_for_unload() may be called in the panic path after other
CPUs are stopped. vmbus_wait_for_unload() currently loops through
online CPUs looking for the UNLOAD response message. But the values of
CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used
to stop the other CPUs, and in one of the paths the stopped CPUs
are removed from cpu_online_mask. This removal happens in both
x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload()
only checks the panic'ing CPU, and misses the UNLOAD response message
except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload()
eventually times out, but only after waiting 100 seconds.
Fix this by looping through *present* CPUs in vmbus_wait_for_unload().
The cpu_present_mask is not modified by stopping the other CPUs in the
panic path, nor should it be.
Also, in a CoCo VM the synic_message_page is not allocated in
hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs()
and hv_synic_disable_regs() such that it is set only when the CPU is
online. If not all present CPUs are online when vmbus_wait_for_unload()
is called, the synic_message_page might be NULL. Add a check for this.
Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios")
Cc: stable@...r.kernel.org
Reported-by: John Starks <jostarks@...rosoft.com>
Signed-off-by: Michael Kelley <mikelley@...rosoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@...hat.com>
---
Changes in v2:
* Added a comment and updated commit messages to describe scenario
where synic_message_page might be NULL
* Added Cc: stable@...r.kernel.org
drivers/hv/channel_mgmt.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 007f26d..2f4d09c 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -829,11 +829,22 @@ static void vmbus_wait_for_unload(void)
if (completion_done(&vmbus_connection.unload_event))
goto completed;
- for_each_online_cpu(cpu) {
+ for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
+ /*
+ * In a CoCo VM the synic_message_page is not allocated
+ * in hv_synic_alloc(). Instead it is set/cleared in
+ * hv_synic_enable_regs() and hv_synic_disable_regs()
+ * such that it is set only when the CPU is online. If
+ * not all present CPUs are online, the message page
+ * might be NULL, so skip such CPUs.
+ */
page_addr = hv_cpu->synic_message_page;
+ if (!page_addr)
+ continue;
+
msg = (struct hv_message *)page_addr
+ VMBUS_MESSAGE_SINT;
@@ -867,11 +878,14 @@ static void vmbus_wait_for_unload(void)
* maybe-pending messages on all CPUs to be able to receive new
* messages after we reconnect.
*/
- for_each_online_cpu(cpu) {
+ for_each_present_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
= per_cpu_ptr(hv_context.cpu_context, cpu);
page_addr = hv_cpu->synic_message_page;
+ if (!page_addr)
+ continue;
+
msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
msg->header.message_type = HVMSG_NONE;
}
--
1.8.3.1
Powered by blists - more mailing lists