linux-kernel - Re: [PATCH 2/3]: Staging: hv: Use native wait primitives

Date:	Tue, 15 Feb 2011 08:29:33 -0800
From:	Greg KH <gregkh@...e.de>
To:	KY Srinivasan <kys@...rosoft.com>
Cc:	Jiri Slaby <jirislaby@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"virtualization@...ts.osdl.org" <virtualization@...ts.osdl.org>
Subject: Re: [PATCH 2/3]: Staging: hv: Use native wait primitives

On Tue, Feb 15, 2011 at 04:22:20PM +0000, KY Srinivasan wrote:
> 
> 
> > -----Original Message-----
> > From: Greg KH [mailto:gregkh@...e.de]
> > Sent: Tuesday, February 15, 2011 9:03 AM
> > To: KY Srinivasan
> > Cc: Jiri Slaby; linux-kernel@...r.kernel.org; devel@...uxdriverproject.org;
> > virtualization@...ts.osdl.org
> > Subject: Re: [PATCH 2/3]: Staging: hv: Use native wait primitives
> > 
> > On Tue, Feb 15, 2011 at 01:35:56PM +0000, KY Srinivasan wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Jiri Slaby [mailto:jirislaby@...il.com]
> > > > Sent: Tuesday, February 15, 2011 4:21 AM
> > > > To: KY Srinivasan
> > > > Cc: gregkh@...e.de; linux-kernel@...r.kernel.org;
> > > > devel@...uxdriverproject.org; virtualization@...ts.osdl.org
> > > > Subject: Re: [PATCH 2/3]: Staging: hv: Use native wait primitives
> > > >
> > > > On 02/11/2011 06:59 PM, K. Y. Srinivasan wrote:
> > > > > In preparation for getting rid of the osd layer; change
> > > > > the code to use native wait interfaces. As part of this,
> > > > > fixed the buggy implementation in the osd_wait_primitive
> > > > > where the condition was cleared potentially after the
> > > > > condition was signalled.
> > > > ...
> > > > > @@ -566,7 +567,11 @@ int vmbus_establish_gpadl(struct vmbus_channel *channel, void *kbuffer,
> > > > >
> > > > >  		}
> > > > >  	}
> > > > > -	osd_waitevent_wait(msginfo->waitevent);
> > > > > +	wait_event_timeout(msginfo->waitevent,
> > > > > +				msginfo->wait_condition,
> > > > > +				msecs_to_jiffies(1000));
> > > > > +	BUG_ON(msginfo->wait_condition == 0);
> > > >
> > > > The added BUG_ONs all over the code look scary. These shouldn't be
> > > > BUG_ONs at all. You should maybe warn and bail out, but not kill the
> > > > whole machine.
> > >
> > > This is Linux code running as a guest on a Windows host; and so the guest
> > > cannot tolerate a failure of the host. In the cases where I have chosen to
> > > BUG_ON, there is no reasonable recovery possible when the host is
> > > non-functional (as determined by a non-responsive host).
> > 
> > If you have a non-responsive host, wouldn't that imply that this guest
> > code wouldn't run at all?  :)
> 
> The fact that on a particular transaction the host has not responded within an
> expected time interval does not necessarily mean that the guest code would not
> be running. There may be issues on the host side, either transient or
> permanent, that cause problems like this. Keep in mind that Hyper-V is a type 1
> hypervisor that schedules all VMs, including the host, and so the guest would
> still get scheduled.
> 
> > 
> > Having BUG_ON() in drivers is not a good idea either way.  Please remove
> > these in future patches.
> 
> In situations where there is not a reasonable rollback strategy (for
> instance in one of the cases, we are granting access to the guest
> physical pages to the host) we really have only 2 options:
> 
> 1) Wait until the host responds. This wait could potentially be unbounded
> and in fact this was the way the code was to begin with. One of the reviewers
> had suggested that unbounded wait was to be corrected.
> 2) Wait for a specific period and if the host does not respond
> within a reasonable period, kill the guest since there is no recovery
> possible.

Killing the guest is a very serious thing, causing all sorts of possible
problems with it, right?

> I chose option 2, as part of addressing some of the prior review
> comments. If the consensus now is to go back to option 1, I am fine with that;

Unbounded waits aren't ok either, you need some sort of timeout.

But, as this is a bit preferable to dying, I suggest doing this, and
comment the heck out of it to explain all of this for anyone who reads
it.

thanks,

greg k-h
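
For readers following the thread, here is a minimal userspace sketch of the
"bounded wait, then warn and bail out" pattern discussed above. It is an
analogy only, not the hv driver code: it uses POSIX threads rather than the
kernel's wait_event_timeout(), and the names struct msg_info, msg_info_init(),
msg_info_complete() and msg_info_wait() are hypothetical. The point it tries to
show is that a timed-out wait can be reported to the caller as an error instead
of aborting the whole program.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the per-message state in the patch. */
struct msg_info {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;		/* plays the role of wait_condition */
};

static void msg_info_init(struct msg_info *m)
{
	pthread_mutex_init(&m->lock, NULL);
	pthread_cond_init(&m->cond, NULL);
	m->done = false;
}

/* Called by the responding side to signal completion. */
static void msg_info_complete(struct msg_info *m)
{
	pthread_mutex_lock(&m->lock);
	m->done = true;
	pthread_cond_signal(&m->cond);
	pthread_mutex_unlock(&m->lock);
}

/*
 * Wait at most timeout_ms for completion.
 * Returns 0 on completion, -ETIMEDOUT if the deadline passed first.
 * The caller decides how to recover; nothing is aborted here.
 */
static int msg_info_wait(struct msg_info *m, long timeout_ms)
{
	struct timespec deadline;
	bool done;
	int rc = 0;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->lock);
	/* Re-check the condition after every wakeup (spurious wakeups). */
	while (!m->done && rc == 0)
		rc = pthread_cond_timedwait(&m->cond, &m->lock, &deadline);
	done = m->done;
	pthread_mutex_unlock(&m->lock);

	if (!done) {
		/* Warn and bail out instead of killing the process. */
		fprintf(stderr, "no response within %ld ms\n", timeout_ms);
		return -ETIMEDOUT;
	}
	return 0;
}
```

Whether a caller can actually recover from -ETIMEDOUT is exactly the open
question in the thread (for example, when pages have already been granted to
the host), so this sketch only shows the mechanics of the bounded wait.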