linux-kernel - Re: [PATCH v2] hv: retry infinitely on hypercall transient failures


Message-ID: <20170107074233.GB18087@kroah.com>
Date:   Sat, 7 Jan 2017 08:42:33 +0100
From:   Greg KH <greg@...ah.com>
To:     Long Li <longli@...rosoft.com>
Cc:     KY Srinivasan <kys@...rosoft.com>,
        Haiyang Zhang <haiyangz@...rosoft.com>,
        "devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures

On Sat, Jan 07, 2017 at 07:23:14AM +0000, Long Li wrote:
> > -----Original Message-----
> > From: Greg KH [mailto:greg@...ah.com]
> > Sent: Wednesday, January 04, 2017 11:48 PM
> > To: Long Li <longli@...rosoft.com>
> > Cc: KY Srinivasan <kys@...rosoft.com>; Haiyang Zhang
> > <haiyangz@...rosoft.com>; devel@...uxdriverproject.org; linux-
> > kernel@...r.kernel.org
> > Subject: Re: [PATCH v2] hv: retry infinitely on hypercall transient failures
> > 
> > On Wed, Jan 04, 2017 at 06:12:20PM -0800, Long Li wrote:
> > > From: Long Li <longli@...rosoft.com>
> > >
> > > The Hyper-V host guarantees that a hypercall will finish in a reasonable
> > > time. Retry infinitely on transient failures to avoid returning an error
> > > to the upper layer.
> > 
> > Again, never retry "forever", always have a way out, otherwise you will crash.
> > 
> > And again, why are you making this change?  What problem does it solve?
> 
> The problem it tries to solve is that in this code we are returning
> an error prematurely on transient failures. The hypercall is used mostly
> in channel establishment. If we return a transient failure, the VM may
> not boot, or may not be useful after boot because some devices are missing.
> 
> Another approach is to increase the number of retries. But we don't
> know how many retries are safe, and the Windows host side expects the
> guest to retry infinitely and not return an error on transient failures.

That implies a lot of trust in the host side, don't you think?

Worst case, make the delay a minute or so, but give the system a way out
in case there's a bug in the host.  As there will be bugs in the host,
just like there are bugs in the client :)
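
To illustrate the "retry, but with a way out" idea, here is a rough
userspace sketch, not the actual driver code.  Every name in it
(hv_status, post_with_deadline, flaky_host, stuck_host) and every
return value is made up for illustration; real kernel code would use
the driver's own status codes and kernel delay helpers rather than
clock_gettime()/nanosleep().

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical status codes, NOT the real Hyper-V hypercall values. */
enum hv_status { HV_OK = 0, HV_TRANSIENT = 1, HV_FATAL = 2 };

/* Hypothetical callback standing in for "issue the hypercall once". */
typedef enum hv_status (*hypercall_fn)(void *ctx);

static double now_seconds(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Retry on transient failures, but only until a deadline passes.
 * Returns 0 on success, -1 on a non-transient failure, -2 on timeout.
 * delay_ns must be below one second (nanosleep requirement).
 */
static int post_with_deadline(hypercall_fn call, void *ctx,
			      double timeout_s, long delay_ns)
{
	struct timespec delay = { 0, delay_ns };
	double deadline;

	assert(call != NULL);
	deadline = now_seconds() + timeout_s;

	for (;;) {
		enum hv_status st = call(ctx);

		if (st == HV_OK)
			return 0;
		if (st != HV_TRANSIENT)
			return -1;	/* retrying will not help */
		if (now_seconds() >= deadline)
			return -2;	/* the way out: host looks stuck */
		nanosleep(&delay, NULL);
	}
}

/* Demo host: fails transiently until the counter in ctx reaches zero. */
static enum hv_status flaky_host(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return HV_TRANSIENT;
	}
	return HV_OK;
}

/* Demo host: buggy, reports a transient failure forever. */
static enum hv_status stuck_host(void *ctx)
{
	(void)ctx;
	return HV_TRANSIENT;
}
```

With flaky_host the call succeeds once the transient failures run out;
with stuck_host it gives up after the timeout instead of spinning
forever.  A deadline rather than a retry count sidesteps the "how many
retries are safe" question: pick a generous time budget (a minute or
so) and bail out after that.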

thanks,

greg k-h