Message-ID: <87mvpq7gtg.fsf@vitty.brq.redhat.com>
Date: Tue, 22 Mar 2016 15:00:59 +0100
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: KY Srinivasan <kys@...rosoft.com>
Cc: "devel\@linuxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"Alex Ng \(LIS\)" <alexng@...rosoft.com>,
"Radim Krcmar" <rkrcmar@...hat.com>,
Cathy Avery <cavery@...hat.com>
Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios

KY Srinivasan <kys@...rosoft.com> writes:
>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
>> Sent: Monday, March 21, 2016 12:52 AM
>> To: KY Srinivasan <kys@...rosoft.com>
>> Cc: devel@...uxdriverproject.org; linux-kernel@...r.kernel.org; Haiyang
>> Zhang <haiyangz@...rosoft.com>; Alex Ng (LIS) <alexng@...rosoft.com>;
>> Radim Krcmar <rkrcmar@...hat.com>; Cathy Avery <cavery@...hat.com>
>> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>>
>> KY Srinivasan <kys@...rosoft.com> writes:
>>
>> >> -----Original Message-----
>> >> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
>> >> Sent: Friday, March 18, 2016 5:33 AM
>> >> To: devel@...uxdriverproject.org
>> >> Cc: linux-kernel@...r.kernel.org; KY Srinivasan <kys@...rosoft.com>;
>> >> Haiyang Zhang <haiyangz@...rosoft.com>; Alex Ng (LIS)
>> >> <alexng@...rosoft.com>; Radim Krcmar <rkrcmar@...hat.com>; Cathy
>> >> Avery <cavery@...hat.com>
>> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>> >>
>> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is
>> always
>> >> delivered to CPU0 regardless of what CPU we're sending
>> >> CHANNELMSG_UNLOAD
>> >> from. vmbus_wait_for_unload() doesn't account for the fact that in case
>> >> we're crashing on some other CPU and CPU0 is still alive and operational
>> >> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
>> >> vmbus_connection.unload_event, our wait on the current CPU will never
>> >> end.
>> >
>> > What was the host you were testing on?
>> >
>>
>> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
>> by forcing crash on a secondary CPU, e.g.:
>
> Prior to 2012R2, all messages would be delivered on CPU0 and this includes CHANNELMSG_UNLOAD_RESPONSE.
> For this reason we don't support kexec on pre-2012 R2 hosts. From 2012 R2 on, all vmbus
> messages (responses) will be delivered on the CPU that we initially set up - look at the code in
> vmbus_negotiate_version(). So on post 2012 R2 hosts, the response to CHANNELMSG_UNLOAD_RESPONSE
> will be delivered on the CPU where we initiate the contact with the
> host - CHANNELMSG_INITIATE_CONTACT message.

Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4. On
WS2012R2 what you're saying is true and all messages, including
CHANNELMSG_UNLOAD_RESPONSE, are delivered to the CPU we used for initial
contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a special
case: it is always delivered to CPU0, no matter which CPU we used for
initial contact. This may be a host bug. You can use the attached patch
to see the issue.

For now I suggest we check the message pages of all CPUs from
vmbus_wait_for_unload(). We can race with other CPUs again, but we don't
care as we're checking for completion_done() in the loop as well. I'll
try this approach.
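
To illustrate the idea, here is a rough userspace sketch, not the actual
patch and not kernel code. Everything in it is a hypothetical stand-in:
NR_CPUS, msg_page[], post_message(), unload_done and wait_for_unload()
only model the per-CPU message pages, the unload completion and the
polling loop. It assumes a crash path where dropping unrelated pending
messages is acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this sketch */

/* Stand-ins for the message types seen in a per-CPU message slot. */
enum msg_type {
    MSG_NONE = 0,
    MSG_UNLOAD_RESPONSE,
    MSG_OTHER,
};

/* One pending-message slot per CPU, modelling the per-CPU message page. */
enum msg_type msg_page[NR_CPUS];

/* Models the unload completion; another CPU's handler may set it first. */
bool unload_done;

/* Simulates the host delivering a message to a given CPU's slot. */
void post_message(size_t cpu, enum msg_type type)
{
    assert(cpu < NR_CPUS);
    msg_page[cpu] = type;
}

/*
 * Poll the slots of all CPUs until the unload response is seen on any of
 * them, or until the completion was already signalled elsewhere.
 * Returns true once unload is complete, false after max_polls passes.
 */
bool wait_for_unload(unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        /* Another CPU may have handled the response: that is fine. */
        if (unload_done)
            return true;

        for (size_t cpu = 0; cpu < NR_CPUS; cpu++) {
            enum msg_type type = msg_page[cpu];

            if (type == MSG_NONE)
                continue;

            /* Consume the slot; unrelated messages are simply dropped. */
            msg_page[cpu] = MSG_NONE;

            if (type == MSG_UNLOAD_RESPONSE) {
                unload_done = true;
                return true;
            }
        }
    }
    return false;
}
```

In this model it does not matter which CPU the response lands on:
calling post_message(0, MSG_UNLOAD_RESPONSE) and then wait_for_unload()
from the "crashing" context still terminates, and so does the case where
unload_done was already set by someone else.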
--
Vitaly
View attachment "0001-Drivers-hv-vmbus-handle-various-crash-scenarios.patch" of type "text/x-patch" (6177 bytes)