linux-kernel - Re: Drivers: scsi: FLUSH timeout

Message-ID: <CADOQvuMY6NMY4U9jpbR+oi5N6=bLGFez0d5_PEf-G85_VWMZuA@mail.gmail.com>
Date:	Fri, 4 Oct 2013 11:12:34 -0700
From:	Eric Seppanen <eric@...estorage.com>
To:	emilne@...hat.com
Cc:	"Nicholas A. Bellinger" <nab@...ux-iscsi.org>,
	KY Srinivasan <kys@...rosoft.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"devel@...uxdriverproject.org" <devel@...uxdriverproject.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>
Subject: Re: Drivers: scsi: FLUSH timeout

On Fri, Oct 4, 2013 at 5:18 AM, Ewan Milne <emilne@...hat.com> wrote:
> On Thu, 2013-10-03 at 13:48 -0700, Eric Seppanen wrote:
>> Do I/O timeouts and flush timeouts need to be independently adjusted?
>> If you're having trouble with slow operations, it seems likely to be
>> across the board.
>>
>> Flush timeout could be defined as 2x the read/write timeout.  Any
>> other command-specific timeouts could be scaled the same way.
>
> It seems to me that there isn't any reason to expect that the maximum
> amount of time a device might take to perform various operations are
> related by any coefficient.  And, an HBA (particularly iSCSI or FC)
> could very well have different device types connected at different
> target IDs.  So I think the flush timeout should be adjustable on
> a per-device basis.  It's probably related more to the cache size
> on the device than anything else...

There are two possible delays: how long the device itself might take,
and how long the storage fabric might take.

On a local device, only the first matters.  But there are environments
where the second dominates (e.g. a virtual machine, where the
hypervisor's storage uses multipath with a long failover delay).

If somebody wants to set flush timeouts > 60 seconds, I would like to
know if they're trying to address a slow device or a slow fabric.  If
it's the fabric, then it's kind of silly to make them set three
different timeouts to address the same problem.

An alternate way of handling long fabric delays would be to have a
fabric_timeout that gets added to all the other timeouts.  It could be a
scsi_host parameter, but that's probably overengineering the problem.
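The additive idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not kernel code. TimeoutPolicy and its fabric_timeout field are hypothetical names. The per-command values are also illustrative: 30 s read/write (sd's default), 60 s flush, and 30 s for other commands. The 120 s multipath failover allowance is made up.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutPolicy:
    """Hypothetical model: per-command device timeouts plus one shared fabric allowance.

    All values are in seconds. Neither this class nor `fabric_timeout`
    exists in the kernel; the numbers are illustrative only.
    """

    # Worst case the device itself may take, per command class (assumed values).
    device: dict[str, int] = field(
        default_factory=lambda: {"rw": 30, "flush": 60, "other": 30}
    )
    # Extra time the storage fabric may add (e.g. multipath failover); 0 for a local disk.
    fabric_timeout: int = 0

    def effective(self, cmd: str) -> int:
        # The same fabric allowance is added to every command class, so a slow
        # fabric is handled with one knob instead of one per timeout.
        return self.device[cmd] + self.fabric_timeout


local = TimeoutPolicy()                 # local disk: only the device delay matters
vm = TimeoutPolicy(fabric_timeout=120)  # hypothetical 120 s failover in the hypervisor

for cmd in ("rw", "flush", "other"):
    print(cmd, local.effective(cmd), vm.effective(cmd))
```

With a single host-wide term, an administrator chasing a slow fabric changes one value. The per-device numbers stay free to reflect the device's own behaviour, such as cache size.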

There are already VM vendors that tell customers to adjust the current
sysfs timeout, so the least amount of work would be to make all of the
other timeouts track that one in some way (additive or
multiplicative).
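Both ways of tracking the existing per-device sysfs timeout (/sys/block/<dev>/device/timeout) can be sketched as a small helper. This is a minimal illustration under stated assumptions. derived_timeouts() is a hypothetical helper. The defaults of 30 s read/write and 60 s flush are illustrative, and 180 s stands in for whatever value a vendor recommends.

```python
# Assumed defaults for illustration (seconds): sd's 30 s read/write timeout
# and a 60 s flush timeout, i.e. flush is 2x read/write.
DEFAULT_RW = 30
DEFAULT_FLUSH = 60


def derived_timeouts(sysfs_timeout: int, mode: str) -> dict[str, int]:
    """Hypothetical helper: derive the flush timeout from the one sysfs timeout."""
    if mode == "multiplicative":
        # Keep the default ratio between flush and read/write.
        flush = sysfs_timeout * DEFAULT_FLUSH // DEFAULT_RW
    elif mode == "additive":
        # Carry over whatever was added to read/write (e.g. for a slow fabric).
        flush = DEFAULT_FLUSH + (sysfs_timeout - DEFAULT_RW)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"rw": sysfs_timeout, "flush": flush}


# A (hypothetical) vendor recommendation to raise the sysfs timeout to 180 s:
print(derived_timeouts(180, "multiplicative"))  # {'rw': 180, 'flush': 360}
print(derived_timeouts(180, "additive"))        # {'rw': 180, 'flush': 210}
```

The multiplicative form fits a device whose operations all slow down proportionally. The additive form fits a fixed fabric delay such as a failover window. At the default sysfs value, both modes leave the flush timeout at 60 s.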