linux-ext4 - Re: [PATCH v5 00/17] fs: introduce new writeback error reporting and convert ext2 and ext4 to use it

Message-ID: <1496268070.2984.15.camel@redhat.com>
Date:   Wed, 31 May 2017 18:01:10 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Al Viro <viro@...IV.linux.org.uk>, Jan Kara <jack@...e.cz>,
        tytso@....edu, axboe@...nel.dk, mawilcox@...rosoft.com,
        ross.zwisler@...ux.intel.com, corbet@....net,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-block@...r.kernel.org,
        linux-doc@...r.kernel.org
Subject: Re: [PATCH v5 00/17] fs: introduce new writeback error reporting
 and convert ext2 and ext4 to use it

On Wed, 2017-05-31 at 14:37 -0700, Andrew Morton wrote:
> On Wed, 31 May 2017 17:31:49 -0400 Jeff Layton <jlayton@...hat.com> wrote:
> 
> > On Wed, 2017-05-31 at 13:27 -0700, Andrew Morton wrote:
> > > On Wed, 31 May 2017 08:45:23 -0400 Jeff Layton <jlayton@...hat.com> wrote:
> > > 
> > > > This is v5 of the patchset to improve how we're tracking and reporting
> > > > errors that occur during pagecache writeback.
> > > 
> > > I'm curious to know how you've been testing this?  Is that testing
> > > strong enough for us to be confident that all nature of I/O errors
> > > will be reported to userspace?
> > > 
> > 
> > That's a tall order. This is a difficult thing to test as these sorts of
> > errors are pretty rare by nature.
> > 
> > I have an xfstest that I posted just after this set that demonstrates
> > that it works correctly, at least on ext2/3/4 when run by the ext4
> > driver (ext2 legacy driver reports too many errors currently). I had
> > btrfs and xfs working on that test too in an earlier incarnation of this
> > set, so I think we can fix this in them as well without too much
> > difficulty.
> > 
> > I'm happy to run other tests if someone wants to suggest them.
> > 
> > Now, all that said, I don't think this will make things any worse than
> > they are today as far as reporting errors properly to userland goes.
> > It's rather easy for an incidental synchronous writeback request from an
> > internal caller to clear the AS_* flags today. This will at least ensure
> > that we're reporting errors since a well-defined point in time when you
> > call fsync.
> 
> Were you using error injection of some form?  If so, how was that all
> set up?
> 

Yes, it uses dm-error for fault injection.

The test basically does:

1) set up a dm-error device in a working configuration

2) build a scratch filesystem on it, with the log on a different device
in some fashion so metadata writeback will still succeed.

3) open the same file several times

4) flip dm-error device to non-working mode

5) write to each fd

6) fsync each fd

...do you get back an error on each fsync?
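
For illustration only, here is a minimal userspace sketch of steps 3, 5 and 6. It is not the xfstest itself. The function name, the `break_device` hook and the path are hypothetical; the hook stands in for step 4 (flipping the dm-error device), which has to be done by whatever set the device up in the first place.

```python
import os


def write_and_fsync_each(path, nr_fds=3, payload=b"x" * 4096,
                         break_device=lambda: None):
    """Open `path` nr_fds times, write through each fd, then fsync each fd.

    Returns one entry per fd: 0 if fsync succeeded, otherwise the errno
    it failed with. `break_device` is a hypothetical hook called after
    the opens and before the writes (step 4 in the list above).
    """
    # Step 3: open the same file several times.
    fds = [os.open(path, os.O_RDWR | os.O_CREAT, 0o644) for _ in range(nr_fds)]
    results = []
    try:
        # Step 4: the caller switches the device to its failing mode here.
        break_device()

        # Step 5: write to each fd (buffered, so data lands in the pagecache).
        for fd in fds:
            os.write(fd, payload)

        # Step 6: fsync each fd and record what each one reports.
        for fd in fds:
            try:
                os.fsync(fd)
                results.append(0)
            except OSError as e:
                results.append(e.errno)
    finally:
        for fd in fds:
            os.close(fd)
    return results
```

On a healthy filesystem every entry should be 0. On a filesystem sitting on a device that fails writes, the question the test asks is whether every entry is an error rather than just the first one.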

It then does a bit more to make sure they're cleared afterward as you'd
expect. That works for most block device based filesystems. I also have
a second xfstest that opens a block device and does the same basic
thing. That also works correctly with this patch series.
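
The behaviour being checked (each open file sees the error once on fsync, and it is cleared for that file afterward) can be modelled with a small toy. This is only a model of the expected semantics under the assumption that each open file remembers the last error state it has reported; it is not the kernel code from this series, and the class and method names are made up for the example.

```python
import errno


class Mapping:
    """Toy stand-in for a file's pagecache: counts writeback errors."""

    def __init__(self):
        self.err_seq = 0    # bumped on every recorded writeback error
        self.last_err = 0   # errno of the most recent error

    def record_writeback_error(self, err=errno.EIO):
        self.err_seq += 1
        self.last_err = err


class OpenFile:
    """Toy stand-in for one open file description on that mapping."""

    def __init__(self, mapping):
        self.mapping = mapping
        # Sample the error state at open time, so only later errors count.
        self.seen_seq = mapping.err_seq

    def fsync(self):
        m = self.mapping
        if m.err_seq != self.seen_seq:
            # An error happened since this file last checked: report it
            # once and remember that it has been reported.
            self.seen_seq = m.err_seq
            return -m.last_err
        return 0
```

In this model, one file's fsync does not consume the error on behalf of the others, which is the property the multi-fd test is looking for.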

I still need to come up with a way to simulate errors on other
filesystems, though. We may need to plumb in some kernel-level fault
injection on some of them to do that correctly. Suggestions welcome there.

With this series though, the idea is to convert one filesystem at a
time, so I think that should help mitigate some of the risk.

-- 
Jeff Layton <jlayton@...hat.com>