linux-ext4 - Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error

Message-ID: <20120123030422.GE15102@dastard>
Date:	Mon, 23 Jan 2012 14:04:22 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Jan Kara <jack@...e.cz>, linux-fsdevel@...r.kernel.org,
	linux-ext4@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	LKML <linux-kernel@...r.kernel.org>,
	Edward Shishkin <edward@...hat.com>
Subject: Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error

[ LCA delayed responding to this... ]

On Mon, Jan 16, 2012 at 04:59:41PM -0800, Linus Torvalds wrote:
> On Mon, Jan 16, 2012 at 4:36 PM, Dave Chinner <david@...morbit.com> wrote:
> >
> > Jan is right, Linus. His definition of what up-to-date means for
> > dirty buffers is correct, especially in the case of write errors.
> 
> It's not a dirty buffer any more.

Yes it is. The write has not completed, so by definition the buffer
is not clean.

> Go look. We've long since cleared the dirty bit.

Sure, but the buffer contents are dirty until the IO completes
successfully and what is on disk matches the contents of the buffer
in memory. It doesn't magically become clean when we clear the dirty
bit. We only clear the dirty bit before submitting the IO to stop
multiple callers from trying to submit it for write at the same
time. IOWs, the buffer dirty bit doesn't really track the dirty
state of the buffer correctly.

> So stop spouting garbage.
>
> My argument is simple: the contents ARE NOT CORRECT ENOUGH to be
> called "up-to-date and clean".

I didn't say it was clean - I said a buffer that failed a write is
not invalid but was still up-to-date and the error handling should
treat it that way. I thought it was obvious that this meant we have
to redirty the buffer at the same time we mark it with an IO error
so that its state is correct....
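
A rough sketch of the completion handling described here, again as a toy Python model rather than the actual kernel paths. The `Buffer` fields and the `end_write` function are hypothetical names for this illustration. On a failed write the handler records the error and redirties the buffer, and it deliberately leaves the up-to-date flag alone because the in-memory contents are still the newest copy.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    data: bytes
    uptodate: bool = True      # in-memory contents are valid
    dirty: bool = False        # flag bit, cleared when the write was submitted
    write_error: bool = False  # last write attempt failed


def end_write(buf, ok):
    """Toy write-completion handler (assumed policy, not a kernel API)."""
    if ok:
        buf.write_error = False
        return
    buf.write_error = True     # remember that the IO failed
    buf.dirty = True           # redirty: memory and disk still differ
    # buf.uptodate is intentionally untouched: the data in memory is valid.


buf = Buffer(data=b"new")      # dirty bit was already cleared at submit time
end_write(buf, ok=False)
state = (buf.uptodate, buf.dirty, buf.write_error)
```

Whether the filesystem then retries, gives up, or shuts down is a separate policy decision layered on top of this state.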

> And I outlined the two choices:
> 
>  - mark it dirty and continue trying to write it out forever
> 
>  - invalidate it.
> 
> Anything else is crazy talk.

I can only assume that you didn't read what I said about how
different filesystems can (and do) handle write errors differently.
Indeed, even within a filesystem there can be different error
handling methods for different types of write IO errors (e.g.
transient vs unrecoverable).  Hence there are any number of valid
error handling strategies that can be added to the above list. One
size does not fit all...

> And marking it dirty forever isn't really
> an option. So..

I guess you don't realise that Linux already has a filesystem that
uses this technique. It's called XFS.  ;)

FYI, XFS has used the redirtying method to retry failed delayed
write buffer IO since day zero (i.e. 1993). EFS (XFS's predecessor
on IRIX) was doing this long before XFS came along so this technique
for handling certain types of transient write IO errors has been
used in production filesystems for somewhere around 25 years.

The thing is, transient write errors tend to be isolated and go away
when a retry occurs (think of IO timeouts when multipath failover
occurs). When non-isolated or unrecoverable IO problems occur (e.g.
no paths left to fail over onto), other critical metadata reads and
writes will fail and shut down the filesystem, thereby terminating
the "try forever" background writeback loop those delayed write
buffers may be in. So the truth is that "trying forever" on write
errors can handle a whole class of write IO errors very
effectively....
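
The retry behaviour can be sketched as a small simulation. This is not how XFS is implemented: `Filesystem`, `writeback`, `make_flaky_device` and `make_dead_device` are hypothetical names, and `max_passes` is a safety bound that exists only so the toy always terminates. The point is that a transient failure clears after a few retries, while an unrecoverable one ends when the filesystem shuts down.

```python
class Filesystem:
    def __init__(self):
        self.shut_down = False


def writeback(fs, buffers, write_fn, max_passes=100):
    """Retry failed buffers until all are written or the filesystem shuts down.

    Returns (buffers still pending, number of passes made).
    """
    pending = list(buffers)
    passes = 0
    while pending and not fs.shut_down and passes < max_passes:
        passes += 1
        # Buffers whose write fails stay pending, i.e. they are "redirtied".
        pending = [b for b in pending if not write_fn(b)]
    return pending, passes


def make_flaky_device(failures_before_success):
    """Transient error: fails a fixed number of times, then works."""
    remaining = {"n": failures_before_success}

    def write(buf):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return False
        return True

    return write


def make_dead_device(fs, shutdown_after):
    """Unrecoverable error: every write fails, and after a few failures the
    filesystem shuts down (stand-in for a critical metadata IO failing)."""
    calls = {"n": 0}

    def write(buf):
        calls["n"] += 1
        if calls["n"] >= shutdown_after:
            fs.shut_down = True
        return False

    return write


fs_ok = Filesystem()
transient = writeback(fs_ok, ["buf-a"], make_flaky_device(2))

fs_bad = Filesystem()
permanent = writeback(fs_bad, ["buf-a"], make_dead_device(fs_bad, 3))
```

In the transient case the buffer is written on the third pass and nothing is left pending. In the permanent case the loop stops after the shutdown with the buffer still unwritten, so "forever" is bounded by the lifetime of the mounted filesystem.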

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com