Open Source and information security mailing list archives
 
Message-ID: <4F214465.9010600@redhat.com>
Date:	Thu, 26 Jan 2012 07:17:41 -0500
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Dave Chinner <david@...morbit.com>
CC:	"Ted Ts'o" <tytso@....edu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
	linux-ext4@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	LKML <linux-kernel@...r.kernel.org>,
	Edward Shishkin <edward@...hat.com>
Subject: Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error

On 01/23/2012 07:36 PM, Dave Chinner wrote:
> On Mon, Jan 23, 2012 at 04:47:09PM -0500, Ted Ts'o wrote:
>>> The thing is, transient write errors tend to be isolated and go away
>>> when a retry occurs (think of IO timeouts when multipath failover
>>> occurs). When non-isolated IO or unrecoverable problems occur (e.g.
>>> no paths left to fail over onto), critical other metadata reads and
>>> writes will fail and shut down the filesystem, thereby terminating
>>> the "try forever" background writeback loop those delayed write
>>> buffers may be in. So the truth is that "trying forever" on write
>>> errors can handle a whole class of write IO errors very
>>> effectively....
>> So how does XFS decide whether a write should fail and shutdown the
>> file system, or just "try forever"?
> The IO dispatcher decides that. If the dispatcher has handed the IO
> off to the delayed write queue, then failed writes will be tried
> again. If the caller is catching the IO completion (e.g. sync
> writes) or attaching a completion callback (journal IO), then the
> completion context will handle the error appropriately. Journal IO
> errors tend to shutdown the filesystem on the first error, other
> contexts may handle the error, retry or shutdown the filesystem
> depending on their current state when the error occurs.
>
> Reads are even more complex, because the dispatch context can be
> within a transaction and the correct error handling is then
> dependent on the current state of the transaction....
>
> Cheers,
>
> Dave.
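A minimal sketch of the per-context policy Dave describes above, written in Python for brevity. Every name here (IoContext, Action, Filesystem, on_write_error) is hypothetical and does not correspond to actual XFS code. It also simplifies: synchronous writers just get the error handed back, and the state-dependent "retry or shutdown" choices Dave mentions for other contexts are not modelled.

```python
from enum import Enum, auto


# Hypothetical names for illustration only; not real XFS symbols.
class IoContext(Enum):
    DELAYED_WRITE = auto()  # buffer handed off to the background writeback queue
    SYNC_WRITE = auto()     # caller waits for the I/O completion itself
    JOURNAL = auto()        # log I/O with a completion callback attached


class Action(Enum):
    REQUEUE = auto()        # keep the buffer dirty and try again later
    RETURN_ERROR = auto()   # hand the error back to the waiting caller
    SHUTDOWN = auto()       # shut the filesystem down


class Filesystem:
    def __init__(self) -> None:
        self.shut_down = False

    def on_write_error(self, ctx: IoContext) -> Action:
        """Pick an error-handling action based on who dispatched the write."""
        # Once the filesystem is shut down nothing is retried any more,
        # which is what ends the "try forever" background writeback loop.
        if self.shut_down:
            return Action.RETURN_ERROR
        if ctx is IoContext.DELAYED_WRITE:
            return Action.REQUEUE
        if ctx is IoContext.SYNC_WRITE:
            return Action.RETURN_ERROR
        # Journal I/O: shut down on the first error.
        self.shut_down = True
        return Action.SHUTDOWN
```

Usage: a fresh Filesystem requeues a failed delayed write, shuts down on a journal write error, and from then on stops retrying delayed writes.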

I think that having retry logic at the file system layer is really putting the 
fix in the wrong place.

Specifically, if we have multipath configured under a file system, it is up to
the multipath logic to handle the failure (use another path, retry, etc.).
If we see a failed IO further up the stack, it is *really* dead at that point.

Transient errors on normal drives are also rarely worth retrying, since pretty
much all modern storage devices have firmware that will already have done
exhaustive retries on a failed write. It is definitely not worth retrying
forever for a normal device.

At one end of the spectrum, think of a box with dozens of storage devices
attached (either via SAN or local SATA devices). If we are doing large,
streaming writes, we could get a large amount of memory dirtied while writing.
If one of those devices dies and we keep its dirty memory pinned for an endless
retry loop, we have really crippled a box which still has multiple happy storage
devices and file systems....
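A toy model of the scenario above, under loudly stated assumptions: one global dirty-page limit shared by all devices, writers that keep dirtying a fixed number of pages per tick, and writeback that cleans a healthy device's pages in a single pass. The function name, parameters and numbers are all hypothetical; the kernel's real dirty-page throttling is considerably more involved. The point is only that pages pinned for a dead device eat into a pool the healthy devices also need.

```python
def simulate_writeback(num_devices, dirty_limit, ticks, dead_device,
                       retry_forever, pages_per_tick=4):
    """Return the number of pages successfully written per device.

    Hypothetical model: all devices share one dirty-page limit.
    """
    dirty = [0] * num_devices
    written = [0] * num_devices
    for _ in range(ticks):
        # Writers dirty pages for their device while the shared limit allows.
        for dev in range(num_devices):
            room = dirty_limit - sum(dirty)
            dirty[dev] += min(pages_per_tick, room)
        # Writeback pass.
        for dev in range(num_devices):
            if dev == dead_device:
                if not retry_forever:
                    # Fail the I/O, report the error, release the memory.
                    dirty[dev] = 0
                # Otherwise the pages stay dirty, pinned for the next retry.
            else:
                written[dev] += dirty[dev]
                dirty[dev] = 0
    return written


print(simulate_writeback(4, 32, 20, dead_device=0, retry_forever=True))
print(simulate_writeback(4, 32, 20, dead_device=0, retry_forever=False))
```

With these made-up parameters the first call prints [0, 28, 24, 20]: after eight ticks the dead device's pinned pages fill the whole pool and the three healthy devices stop making progress. The second prints [0, 80, 80, 80], since failing the dead device's I/O frees its memory every tick.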

Ric

