Message-ID: <SN2PR03MB2142A6D723E95FCE7193CFE4A0800@SN2PR03MB2142.namprd03.prod.outlook.com>
Date: Tue, 22 Mar 2016 14:18:05 +0000
From: KY Srinivasan <kys@...rosoft.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
CC: "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Haiyang Zhang <haiyangz@...rosoft.com>,
"Alex Ng (LIS)" <alexng@...rosoft.com>,
"Radim Krcmar" <rkrcmar@...hat.com>,
Cathy Avery <cavery@...hat.com>
Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan <kys@...rosoft.com>
> Cc: devel@...uxdriverproject.org; linux-kernel@...r.kernel.org; Haiyang
> Zhang <haiyangz@...rosoft.com>; Alex Ng (LIS) <alexng@...rosoft.com>;
> Radim Krcmar <rkrcmar@...hat.com>; Cathy Avery <cavery@...hat.com>
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan <kys@...rosoft.com> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan <kys@...rosoft.com>
> >> Cc: devel@...uxdriverproject.org; linux-kernel@...r.kernel.org; Haiyang
> >> Zhang <haiyangz@...rosoft.com>; Alex Ng (LIS) <alexng@...rosoft.com>;
> >> Radim Krcmar <rkrcmar@...hat.com>; Cathy Avery <cavery@...hat.com>
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan <kys@...rosoft.com> writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: devel@...uxdriverproject.org
> >> >> Cc: linux-kernel@...r.kernel.org; KY Srinivasan <kys@...rosoft.com>;
> >> >> Haiyang Zhang <haiyangz@...rosoft.com>; Alex Ng (LIS)
> >> >> <alexng@...rosoft.com>; Radim Krcmar <rkrcmar@...hat.com>;
> >> >> Cathy Avery <cavery@...hat.com>
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> >> delivered to CPU0 regardless of what CPU we're sending
> >> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> >> >> the fact that in case we're crashing on some other CPU and CPU0 is
> >> >> still alive and operational CHANNELMSG_UNLOAD_RESPONSE will be
> >> >> delivered there completing vmbus_connection.unload_event, our wait
> >> >> on the current CPU will never end.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> >> by forcing crash on a secondary CPU, e.g.:
> >
> > Prior to 2012 R2, all messages would be delivered on CPU0, and this
> > includes CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support
> > kexec on pre-2012 R2 hosts. From 2012 R2 on, all vmbus messages
> > (responses) will be delivered on the CPU that we initially set up -
> > look at the code in vmbus_negotiate_version(). So on post 2012 R2
> > hosts, CHANNELMSG_UNLOAD_RESPONSE will be delivered on the CPU where
> > we initiate the contact with the host - the
> > CHANNELMSG_INITIATE_CONTACT message.
>
> Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4.
> On WS2012R2 what you're saying is true and all messages, including
> CHANNELMSG_UNLOAD_RESPONSE, are delivered to the CPU we used for
> initial contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a
> special case and it is always delivered to CPU0, no matter which CPU we
> used for initial contact. This may be a host bug. You can use the
> attached patch to see the issue.
This looks like a host bug and I will try to get it addressed before WS2016
ships.
>
> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.
Thank you.
K. Y
>
> --
> Vitaly
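
The approach proposed above (scan the message pages of all CPUs from the wait loop, while still checking completion_done()) can be modeled in plain userspace C. This is only a rough sketch under stated assumptions, not the kernel patch: msg_page[], unload_done, NR_MODEL_CPUS and both wait functions are hypothetical stand-ins for the per-CPU message pages, vmbus_connection.unload_event and vmbus_wait_for_unload(), and the iteration bound exists only so the model terminates.

```c
#include <stdbool.h>

#define NR_MODEL_CPUS 4	/* assumption: small fixed CPU count for the model */

enum model_msg { MODEL_MSG_NONE = 0, MODEL_MSG_UNLOAD_RESPONSE };

/* Hypothetical stand-in for one slot of a per-CPU message page. */
struct model_msg_slot {
	enum model_msg type;
};

static struct model_msg_slot msg_page[NR_MODEL_CPUS];
static bool unload_done;	/* stand-in for completion_done(&unload_event) */

static void reset_model(void)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
		msg_page[cpu].type = MODEL_MSG_NONE;
	unload_done = false;
}

/*
 * Old behaviour: poll only the message page of the CPU we are running on.
 * Returns the iteration at which the response was seen, or -1 on timeout.
 */
static int wait_on_cpu(int this_cpu, int max_iters)
{
	for (int iter = 0; iter < max_iters; iter++) {
		if (unload_done)
			return iter;
		if (msg_page[this_cpu].type == MODEL_MSG_UNLOAD_RESPONSE) {
			msg_page[this_cpu].type = MODEL_MSG_NONE;
			unload_done = true;
			return iter;
		}
	}
	return -1;
}

/*
 * Proposed behaviour: poll every CPU's message page, and also stop as soon
 * as another CPU's handler has already completed the event.
 */
static int wait_all_cpus(int max_iters)
{
	for (int iter = 0; iter < max_iters; iter++) {
		if (unload_done)
			return iter;
		for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
			if (msg_page[cpu].type == MODEL_MSG_UNLOAD_RESPONSE) {
				msg_page[cpu].type = MODEL_MSG_NONE;
				unload_done = true;
				return iter;
			}
		}
	}
	return -1;
}
```

With the response placed in msg_page[0] (the WS2016TP4 behaviour described above), wait_on_cpu(2, ...) runs out of iterations, while wait_all_cpus() finds the message on its first pass. If a handler on another CPU sets unload_done first, the loop exits through the completion check instead, which is why the race is harmless in this model.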