Message-ID: <DM5PR03MB24908077881B2286EF00E21AA0AE0@DM5PR03MB2490.namprd03.prod.outlook.com>
Date: Mon, 31 Oct 2016 15:14:02 +0000
From: KY Srinivasan <kys@...rosoft.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>
CC: "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
"Van De Ven, Arjan" <arjan.van.de.ven@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Haiyang Zhang <haiyangz@...rosoft.com>
Subject: RE: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in
vmbus_post_msg()
> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
> Sent: Monday, October 31, 2016 3:05 AM
> To: KY Srinivasan <kys@...rosoft.com>
> Cc: devel@...uxdriverproject.org; Van De Ven, Arjan
> <arjan.van.de.ven@...el.com>; linux-kernel@...r.kernel.org; Haiyang Zhang
> <haiyangz@...rosoft.com>
> Subject: Re: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in
> vmbus_post_msg()
>
> KY Srinivasan <kys@...rosoft.com> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
> >> Sent: Wednesday, October 26, 2016 4:12 AM
> >> To: devel@...uxdriverproject.org
> >> Cc: linux-kernel@...r.kernel.org; KY Srinivasan <kys@...rosoft.com>;
> >> Haiyang Zhang <haiyangz@...rosoft.com>
> >> Subject: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in
> >> vmbus_post_msg()
> >>
> >> DoS protection conditions were altered in WS2016 and now it's easy to get
> >> -EAGAIN returned from vmbus_post_msg() (e.g. when we try changing the MTU
> >> on a netvsc device in a loop). None of the vmbus_post_msg() callers retry
> >> the operation, and we usually end up with a non-functional device or a crash.
> >>
> >> While the host's DoS protection conditions are unknown to me, my tests show
> >> that it can take up to 46 attempts to send a message after changing udelay()
> >> to mdelay() and capping msec at '256'. This means we can wait up to 10
> >> seconds before the message is sent, so we need to use msleep() instead.
> >> Almost all vmbus_post_msg() callers are ready to sleep, but there is one
> >> special case: vmbus_initiate_unload(), which can be called from
> >> interrupt/NMI context, where we can't sleep. I'm also not sure about the
> >> lonely vmbus_send_tl_connect_request(), which has no in-tree users, but its
> >> external users are most likely waiting for the host to reply, so sleeping
> >> there is also appropriate.
> >
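The "up to 10 seconds" figure in the description above can be checked with a small userspace calculation. This is only a sketch under one assumption that the thread does not state: the delay starts at 1 ms and doubles after every wait until it reaches the cap. The helper name total_wait_ms() is hypothetical and is not part of the patch.

```c
#include <assert.h>

/*
 * Total time spent waiting, in milliseconds, across `delays` waits.
 * ASSUMPTION (not stated in the thread): the wait starts at 1 ms and
 * doubles after each wait, clamped to cap_ms.
 * total_wait_ms() is a hypothetical helper, not a kernel function.
 */
unsigned long total_wait_ms(unsigned int delays, unsigned long cap_ms)
{
    unsigned long msec = 1;
    unsigned long total = 0;
    unsigned int i;

    assert(cap_ms >= 1);

    for (i = 0; i < delays; i++) {
        total += msec;
        if (msec < cap_ms) {
            msec *= 2;
            if (msec > cap_ms)
                msec = cap_ms;  /* never wait longer than the cap */
        }
    }
    return total;
}
```

Under that assumption, total_wait_ms(46, 256) is 9983 ms: the first nine waits (1 ms through 256 ms) add up to 511 ms, and the remaining 37 waits at 256 ms add 9472 ms. That is consistent with the roughly 10 seconds quoted above.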
> > Vitaly,
> >
> > One of the reasons why the delay was in microseconds was to make sure that
> > the boot time was not adversely affected by the delay we had in setting up
> > the channel. The change to a microsecond delay and other changes in this
> > code reduced the time it took to initialize netvsc from 200 milliseconds to
> > about 12 milliseconds. This is important for us as we look at achieving
> > sub-second boot times.
> > The situations you are trying to address are test cases where you are
> > hitting the host with requests that trigger the host's DoS prevention code.
> > Perhaps we could have a hybrid approach: we retain the microsecond wait
> > until we hit a threshold, and then we use millisecond delays. This way, the
> > normal boot path is still fast, while we can handle some of the other cases
> > where the host's DoS prevention code kicks in.
> >
>
> Ok,
>
> I actually tested boot time with my patch and didn't see a difference
> (so I guess our first attempt to send messages usually succeeds), but if
> we're concerned about sub-second boot time we'd rather keep the
> microsecond delay for the first several attempts. I'll do v2.
Thank you.
K. Y
>
> Thanks,
>
>
> --
> Vitaly
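
For illustration only, the hybrid approach discussed above might look roughly like the following userspace model. It is not the v2 patch. The threshold value, the can_sleep flag, and the names pick_wait() and next_delay() are all assumptions made for this sketch. Only udelay(), mdelay() and msleep() are real kernel helpers, and they appear here in comments only.

```c
#include <assert.h>

/* ASSUMPTION: hypothetical switch-over point; the thread names no value. */
#define USEC_THRESHOLD 1000UL
/* 256 ms, matching the cap mentioned in the patch description. */
#define USEC_CAP       256000UL

enum wait_kind {
    WAIT_BUSY_USEC,   /* in the kernel: udelay(usec) */
    WAIT_SLEEP_MSEC,  /* in the kernel: msleep(usec / 1000) */
    WAIT_BUSY_MSEC    /* in the kernel: mdelay(usec / 1000) */
};

/*
 * Pick how to wait before the next retry. Short delays stay as busy-waits
 * in microseconds so the normal boot path is not slowed down. Longer
 * delays sleep when the caller may sleep (can_sleep is a hypothetical
 * parameter) and busy-wait otherwise, e.g. in interrupt/NMI context.
 */
enum wait_kind pick_wait(unsigned long usec, int can_sleep)
{
    if (usec <= USEC_THRESHOLD)
        return WAIT_BUSY_USEC;
    return can_sleep ? WAIT_SLEEP_MSEC : WAIT_BUSY_MSEC;
}

/* Double the delay after each failed attempt, clamped to USEC_CAP. */
unsigned long next_delay(unsigned long usec)
{
    unsigned long next;

    assert(usec >= 1);

    if (usec >= USEC_CAP)
        return USEC_CAP;
    next = usec * 2;
    return next > USEC_CAP ? USEC_CAP : next;
}
```

A retry loop would call pick_wait() with the current delay after each failed attempt and then advance it with next_delay(). Starting from 1 us, the first attempts cost only a few microseconds each, which is what keeps the boot path fast. Only a sustained run of failures reaches the millisecond range.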