Message-ID: <8761boxjmr.fsf@vitty.brq.redhat.com>
Date: Fri, 30 Jan 2015 18:29:00 +0100
From: Vitaly Kuznetsov <vkuznets@...hat.com>
To: Dexuan Cui <decui@...rosoft.com>
Cc: "gregkh\@linuxfoundation.org" <gregkh@...uxfoundation.org>,
"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
"driverdev-devel\@linuxdriverproject.org"
<driverdev-devel@...uxdriverproject.org>,
"olaf\@aepfle.de" <olaf@...fle.de>,
"apw\@canonical.com" <apw@...onical.com>,
"jasowang\@redhat.com" <jasowang@...hat.com>,
KY Srinivasan <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>
Subject: Re: [PATCH 2/3] hv: vmbus_post_msg: retry the hypercall on HV_STATUS_INVALID_CONNECTION_ID
Dexuan Cui <decui@...rosoft.com> writes:
>> -----Original Message-----
>> From: Vitaly Kuznetsov [mailto:vkuznets@...hat.com]
>> Sent: Thursday, January 29, 2015 21:31 PM
>> To: Dexuan Cui
>> Cc: gregkh@...uxfoundation.org; linux-kernel@...r.kernel.org; driverdev-
>> devel@...uxdriverproject.org; olaf@...fle.de; apw@...onical.com;
>> jasowang@...hat.com; KY Srinivasan; Haiyang Zhang
>> Subject: Re: [PATCH 2/3] hv: vmbus_post_msg: retry the hypercall on
>> HV_STATUS_INVALID_CONNECTION_ID
>>
>> Dexuan Cui <decui@...rosoft.com> writes:
>>
>> > I got the hypercall error code on Hyper-V 2008 R2 when repeatedly running
>> > "rmmod hv_netvsc; modprobe hv_netvsc; rmmod hv_utils; modprobe hv_utils"
>> > in a Linux guest.
>> >
>> > Without the patch, the driver can occasionally fail to load.
>> >
>> > CC: "K. Y. Srinivasan" <kys@...rosoft.com>
>> > Signed-off-by: Dexuan Cui <decui@...rosoft.com>
>> > ---
>> > arch/x86/include/uapi/asm/hyperv.h | 1 +
>> > drivers/hv/connection.c | 9 +++++++++
>> > 2 files changed, 10 insertions(+)
>> >
>> > diff --git a/arch/x86/include/uapi/asm/hyperv.h
>> b/arch/x86/include/uapi/asm/hyperv.h
>> > index 90c458e..b9daffb 100644
>> > --- a/arch/x86/include/uapi/asm/hyperv.h
>> > +++ b/arch/x86/include/uapi/asm/hyperv.h
>> > @@ -225,6 +225,7 @@
>> > #define HV_STATUS_INVALID_HYPERCALL_CODE 2
>> > #define HV_STATUS_INVALID_HYPERCALL_INPUT 3
>> > #define HV_STATUS_INVALID_ALIGNMENT 4
>> > +#define HV_STATUS_INVALID_CONNECTION_ID 18
>> > #define HV_STATUS_INSUFFICIENT_BUFFERS 19
>>
>> The gap between 4 and 18 tells me there are other codes here ;-) Are they
>> all 'permanent failures'?
> It looks like we only need to care about these error codes here.
>
> BTW, you can get all the hypercall error codes in the top level functional spec:
> http://blogs.msdn.com/b/virtual_pc_guy/archive/2014/02/17/updated-hypervisor-top-level-functional-specification.aspx
> For this hypercall (0x005c), see "14.9.7 HvPostMessage".
Thanks, interesting!

Btw, HV_STATUS_INSUFFICIENT_MEMORY looks suspicious; it looks like we
can hit it as well...

I suggest we split all failures here into two classes:
1) permanent
2) worth retrying
and treat them accordingly (no big changes, just maybe group them
together within hv_post_message(), as it is the only place where these
codes are being used).
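
To illustrate the two-class split suggested above, here is a minimal
userspace sketch in C. It is not the driver code. The helpers
hv_status_worth_retrying(), hv_status_to_errno() and post_with_retry(),
the fake_host test double, and the -EINVAL mapping for permanent
failures are assumptions made for illustration. The status values 18
and 19, the 10 tries, the 1000 ms wait and the -EAGAIN/-ENOMEM mappings
are taken from this thread.

```c
#include <errno.h>
#include <stdbool.h>

#define HV_STATUS_SUCCESS               0
#define HV_STATUS_INVALID_CONNECTION_ID 18
#define HV_STATUS_INSUFFICIENT_BUFFERS  19

#define MAX_POST_TRIES 10	/* the thread mentions try "No. 10" as the last */
#define RETRY_DELAY_MS 1000	/* wait proposed in the patch */

/* Class 2: transient statuses that are worth retrying (hypothetical helper). */
static bool hv_status_worth_retrying(int status)
{
	switch (status) {
	case HV_STATUS_INVALID_CONNECTION_ID:
	case HV_STATUS_INSUFFICIENT_BUFFERS:
		return true;
	default:
		return false;
	}
}

/* Map a raw status to an errno so that the raw '18' never leaks out. */
static int hv_status_to_errno(int status)
{
	switch (status) {
	case HV_STATUS_SUCCESS:
		return 0;
	case HV_STATUS_INVALID_CONNECTION_ID:
		return -EAGAIN;
	case HV_STATUS_INSUFFICIENT_BUFFERS:
		return -ENOMEM;
	default:
		return -EINVAL;	/* class 1: permanent (assumed mapping) */
	}
}

typedef int (*post_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms);

/* Retry transient failures up to MAX_POST_TRIES times, then give up. */
static int post_with_retry(post_fn post, void *ctx, sleep_fn sleep_ms)
{
	int status = HV_STATUS_SUCCESS;

	for (int i = 0; i < MAX_POST_TRIES; i++) {
		status = post(ctx);
		if (!hv_status_worth_retrying(status))
			break;
		sleep_ms(RETRY_DELAY_MS);
	}
	return hv_status_to_errno(status);
}

/* Test double standing in for the hypervisor (illustration only). */
struct fake_host {
	int failures_left;	/* how many calls should still fail */
	int fail_status;	/* status returned while failing */
	int calls;		/* number of post attempts seen */
};

static int fake_post(void *ctx)
{
	struct fake_host *h = ctx;

	h->calls++;
	if (h->failures_left > 0) {
		h->failures_left--;
		return h->fail_status;
	}
	return HV_STATUS_SUCCESS;
}

/* No-op sleep so the sketch runs instantly outside the kernel. */
static void fake_sleep(unsigned int ms)
{
	(void)ms;
}
```

With a fake host that fails three times with
HV_STATUS_INVALID_CONNECTION_ID and then succeeds, post_with_retry()
makes four calls and returns 0. A permanent status stops after the
first call, and ten transient failures in a row end with an errno
rather than the raw status.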
>
>> >
>> > typedef struct _HV_REFERENCE_TSC_PAGE {
>> > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
>> > index c4acd1c..8bd05f3 100644
>> > --- a/drivers/hv/connection.c
>> > +++ b/drivers/hv/connection.c
>> > @@ -440,6 +440,15 @@ int vmbus_post_msg(void *buffer, size_t buflen)
>> > ret = hv_post_message(conn_id, 1, buffer, buflen);
>> >
>> > switch (ret) {
>> > + case HV_STATUS_INVALID_CONNECTION_ID:
>> > + /*
>> > + * We could get this if we send messages too
>> > + * frequently or the host is under low resource
>> > + * conditions: let's wait 1 more second before
>> > + * retrying the hypercall.
>> > + */
>> > + msleep(1000);
>> > + break;
>>
>> In case it is our last try (No. 10) we will return '18' from the
>> function. I suggest we set ret = -ENOMEM here as well.
> Thanks for the suggestion!
>
> I think it would be better to add this to the case
> HV_STATUS_INVALID_CONNECTION_ID:
> ret = -EAGAIN;
> ?
Yes, e.g. via a fallthrough.
>
>> > case HV_STATUS_INSUFFICIENT_BUFFERS:
>> > ret = -ENOMEM;
>>
>> Or should we treat these two equally? There is a smaller (100ms) sleep
>> between tries already, we can consider changing it instead.
>>
>> > case -ENOMEM:
>>
>> --
>> Vitaly
> In my experiments, in the HV_STATUS_INVALID_CONNECTION_ID case,
> waiting 100ms is sometimes not enough, so I'd like to wait longer.
> I agree with you that both cases can wait 1000ms. I'll update my patch.
>
> BTW, the "case -ENOMEM:" is not reachable (the hypervisor itself doesn't
> return -ENOMEM), I think. I can remove it.
hv_post_message() can return either -EMSGSIZE or the do_hypercall()
return value (which is truncated to u16 in hv_post_message()). So yes,
I agree, -ENOMEM is not possible.
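
A small userspace sketch of why the truncation rules out -ENOMEM,
assuming the status is obtained by keeping only the low 16 bits of the
hypercall result. status_from_hypercall_result() is a hypothetical
stand-in, not a function from the driver.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the truncation step: keep the low 16 bits
 * of the hypercall result and hand them back as an int.
 */
static int status_from_hypercall_result(uint64_t result)
{
	uint16_t status = (uint16_t)(result & 0xFFFF);

	/* A u16 converted to int is always in the range 0..65535. */
	return status;
}
```

Since the converted value can never be negative, comparing it against
any negative errno such as -ENOMEM can never match; only an explicit
return such as -EMSGSIZE produces a negative value.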
>
> Thanks,
> -- Dexuan
--
Vitaly