Date:	Mon, 3 Jun 2013 23:47:22 +0000
From:	James Bottomley <jbottomley@...allels.com>
To:	KY Srinivasan <kys@...rosoft.com>
CC:	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"ohering@...e.com" <ohering@...e.com>,
	"hch@...radead.org" <hch@...radead.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a
 module parameter

On Mon, 2013-06-03 at 23:25 +0000, KY Srinivasan wrote:
> 
> > -----Original Message-----
> > From: James Bottomley [mailto:jbottomley@...allels.com]
> > Sent: Monday, June 03, 2013 7:03 PM
> > To: KY Srinivasan
> > Cc: gregkh@...uxfoundation.org; linux-kernel@...r.kernel.org;
> > devel@...uxdriverproject.org; ohering@...e.com; hch@...radead.org;
> > linux-scsi@...r.kernel.org
> > Subject: Re: [PATCH 1/5] Drivers: scsi: storvsc: Make the scsi timeout a module
> > parameter
> > 
> > On Mon, 2013-06-03 at 16:21 -0700, K. Y. Srinivasan wrote:
> > > The standard scsi timeout is not appropriate in some of the environments
> > > where Hyper-V is deployed. Set this timeout appropriately for all devices
> > > managed by this driver. Further make this a module parameter.
> > >
> > > Signed-off-by: K. Y. Srinivasan <kys@...rosoft.com>
> > > Reviewed-by: Haiyang Zhang <haiyangz@...rosoft.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c |    9 +++++++++
> > >  1 files changed, 9 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index 16a3a0c..8d29a95 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -221,6 +221,13 @@ static int storvsc_ringbuffer_size = (20 * PAGE_SIZE);
> > >  module_param(storvsc_ringbuffer_size, int, S_IRUGO);
> > >  MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer size (bytes)");
> > >
> > > +/*
> > > + * Timeout in seconds for all devices managed by this driver.
> > > + */
> > > +static int storvsc_timeout = 180;
> > > +module_param(storvsc_timeout, uint, (S_IRUGO | S_IWUSR));
> > > +MODULE_PARM_DESC(storvsc_timeout, "Device timeout (seconds)");
> > > +
> > >  #define STORVSC_MAX_IO_REQUESTS				128
> > >
> > >  /*
> > > @@ -1204,6 +1211,8 @@ static int storvsc_device_configure(struct scsi_device *sdevice)
> > >
> > >  	blk_queue_bounce_limit(sdevice->request_queue, BLK_BOUNCE_ANY);
> > >
> > > +	blk_queue_rq_timeout(sdevice->request_queue, (storvsc_timeout * HZ));
> > 
> > Why does this need to be a module parameter?  It's already a sysfs one
> > in the scsi_device class?  Three minutes is also a bit large.  The
> > default is 30s with huge cache arrays recommending upping this to
> > 60s ... you're three times this.
> 
> James,
> This number was arrived at based on some testing that was done on the
> cloud. On our cloud, we have 120-second timeouts that trigger broader
> VM-level recovery, and in cases where there are storage access issues
> (which is when we would hit this timeout), it is better to defer to the
> fabric-level recovery than to attempt SCSI-level recovery/retry. The
> default value chosen for devices managed by storvsc should be just fine,

So are you sure you want to set the command timeout to 3 minutes?  It's
an incredibly high value.  The actual complete timeout is this
value multiplied by the number of retries, which is 5 for disk devices,
so you'll be waiting up to 15 minutes before we signal a failure in some
circumstances.  It sounds like you want the actual path length of error
recovery to be on average 3 minutes.
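
To make the arithmetic concrete, here is a rough sketch in C.  It assumes
the simple model described above (worst case = per-command timeout times
retry count) and uses the retry count of 5 quoted for disk devices; the
function names are made up for illustration and are not kernel APIs.

```c
#include <stdio.h>

/* Retry count quoted in this mail for disk devices. */
#define DISK_RETRIES 5u

/*
 * Worst-case seconds before a failure is signalled, under the simple
 * model "per-command timeout multiplied by the number of retries".
 */
unsigned int worst_case_seconds(unsigned int timeout_s, unsigned int retries)
{
	return timeout_s * retries;
}

/*
 * Inverse view: the per-command timeout that keeps the whole recovery
 * path within budget_s seconds.  Returns 0 if retries is 0.
 */
unsigned int timeout_for_budget(unsigned int budget_s, unsigned int retries)
{
	if (retries == 0)
		return 0;
	return budget_s / retries;
}

/* Print the worst case for one candidate timeout value. */
void print_worst_case(unsigned int timeout_s)
{
	unsigned int total = worst_case_seconds(timeout_s, DISK_RETRIES);

	printf("timeout %us x %u retries = %us (%u min)\n",
	       timeout_s, DISK_RETRIES, total, total / 60);
}
```

With the proposed 180s this gives 180 * 5 = 900s, i.e. the 15 minutes
mentioned above.  Under the same model, a 3 minute budget for the whole
path corresponds to a command timeout of 180 / 5 = 36s.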

The value of the timeout should be a compromise between the longest time
you want the user to wait for a failure and the longest time a device
should take to respond.

> I made it a module parameter to have more flexibility.

It's *already* a sysfs parameter ... why do you want an additional
module parameter?  Multiple parameters for the same quantity, especially
ones which can't be altered at runtime like module parameters, end up
confusing users.
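
For reference, a minimal sketch of adjusting the existing per-device
value from userspace.  It assumes the disk shows up as /sys/block/<disk>
and that the scsi_device timeout attribute is reachable through the
device link below it; the helper names are hypothetical, and writing the
attribute needs root.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/block/<disk>/device/timeout" into buf.
 * Returns 0 on success, -1 if the path did not fit.
 */
int scsi_timeout_path(char *buf, size_t len, const char *disk)
{
	int n = snprintf(buf, len, "/sys/block/%s/device/timeout", disk);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a new command timeout (in seconds) for one disk.
 * Returns 0 on success, -1 on any failure (bad name, no permission,
 * no such device).
 */
int scsi_set_timeout(const char *disk, unsigned int seconds)
{
	char path[128];
	FILE *f;
	int ok;

	if (scsi_timeout_path(path, sizeof(path), disk) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", seconds) > 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling scsi_set_timeout("sda", 60) would then be the programmatic
equivalent of echoing 60 into that file for a disk named sda.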

James

