linux-kernel - Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it

Message-ID: <1491306070.20445.2.camel@redhat.com>
Date:   Tue, 04 Apr 2017 07:41:10 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     NeilBrown <neilb@...e.com>, Matthew Wilcox <willy@...radead.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-ext4@...r.kernel.org, akpm@...ux-foundation.org,
        tytso@....edu, jack@...e.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking
 infrastructure and convert ext4 to use it

On Tue, 2017-04-04 at 13:03 +1000, NeilBrown wrote:
> On Mon, Apr 03 2017, Jeff Layton wrote:
> 
> > On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
> > > On Mon, Apr 03, 2017 at 01:47:37PM -0400, Jeff Layton wrote:
> > > > > I wonder whether it's even worth supporting both EIO and ENOSPC for a
> > > > > writeback problem.  If I understand correctly, at the time of write(),
> > > > > filesystems check to see if they have enough blocks to satisfy the
> > > > > request, so ENOSPC only comes up in the writeback context for thinly
> > > > > provisioned devices.
> > > > 
> > > > No, ENOSPC on writeback can certainly happen with network filesystems.
> > > > NFS and CIFS have no way to reserve space. You wouldn't want to have to
> > > > do an extra RPC on every buffered write. :)
> > > 
> > > Aaah, yes, network filesystems.  I would indeed not want to do an extra
> > > RPC on every write to a hole (it's a hole vs non-hole question, rather
> > > than a buffered/unbuffered question ... unless you're WAFLing and not
> > > reclaiming quickly enough, I suppose).
> > > 
> > > So, OK, that makes sense, we should keep allowing filesystems to report
> > > ENOSPC as a writeback error.  But I think much of the argument below
> > > still holds, and we should continue to have a prior EIO to be reported
> > > over a new ENOSPC (even if the program has already consumed the EIO).
> > > 
> > 
> > I'm fine with that (though I'd like Neil's thoughts before we decide
> > anything).
> 
> I'd like there be a well defined time when old errors were forgotten.
> It does make sense for EIO to persist even if ENOSPC or EDQUOT is
> received, but not forever.
> Clearing the remembered errors when put_write_access() causes
> i_writecount to reach zero is one option (as suggested), but I'm not
> sure I'm happy with it.
> 
> Local filesystems, or network filesystems which receive strong write
> delegations, should only ever return EIO to fsync.  We should
> concentrate on them first, I think.  As there is only one possible
> error, the seq counter is sufficient to "clear" it once it has been
> reported to fsync() (or write()?).
> 
> Other network filesystems could return a whole host of errors: ENOSPC
> EDQUOT ESTALE EPERM EFBIG ...
> Do we want to limit exactly which errors are allowed in generic code, or
> do we just support EIO generically and expect the filesystem to sort out
> the details for anything else?
> 
> One possible approach a filesystem could take is just to allow a single
> async writeback error.  After that error, all subsequent write()
> system calls become synchronous. As write() or fsync() is called on each
> file descriptor (which could possibly have sent the write which caused
> the error), an error is returned and that fact is counted.  Once we have
> returned as many errors as there are open file descriptors
> (i_writecount?), and have seen a successful write, the filesystem
> forgets all recorded errors and switches back to async writes (for that
> inode).   NFS does this switch-to-sync-on-error.  See nfs_need_check_write().
> 
> The "which could possibly have sent the write which caused the error" is
> an explicit reference to NFS.  NFS doesn't use the AS_EIO/AS_ENOSPC
> flags to return async errors.  It allocates an nfs_open_context for each
> user who opens a given inode, and stores an error in there.  Each dirty
> page is associated with one of these, so errors are sure to go to the
> correct user, though not necessarily the correct fd at present.
> 
> When we specify the new behaviour we should be careful to be as vague as
> possible while still saying what we need.  This allows filesystems some
> flexibility.
> 
>   If an error happens during writeback, the next write() or fsync() (or
>   ....) on the file descriptor to which data was written will return -1
>   with errno set to EIO or some other relevant error.  Other file
>   descriptors open on the same file may receive EIO or some other error
>   on a subsequent appropriate system call.
>   It should not be assumed that close() will return an error.  fsync()
>   must be called before close() if writeback errors are important to the
>   application.
> 
> 

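Neil's proposed wording puts the burden on the application: writeback errors are only promised at write() or fsync(), not at close(). A minimal userspace sketch of that rule follows. The helper name write_file_checked() is hypothetical; it uses only the standard POSIX calls open(), write(), fsync() and close().

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to path, then fsync()
 * before close() so that any writeback error (EIO, ENOSPC, ...) is
 * reported to this caller. Returns 0 on success or a positive errno.
 */
int write_file_checked(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return errno;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            int err = errno;
            close(fd);
            return err;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Writeback errors are only promised here, not at close(). */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return err;
    }

    /* close() may still fail; report it, but don't rely on it. */
    if (close(fd) < 0)
        return errno;
    return 0;
}
```

A caller treats any nonzero return as a failed save; which errno it actually sees (EIO, ENOSPC, ...) is the detail being argued over in this thread.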
A lot in here... :)

While I like the NFS method of switching to sync I/O on error (and
indeed, I'm copying that in the Ceph ENOSPC patches I have), I'm not
sure it would really help anything here. The main reason NFS does that
is to prevent you from dirtying tons of pages that can't be cleaned. 
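For reference, the one-error scheme Neil describes above can be modeled as a small per-inode state machine. This is a hypothetical userspace sketch, not the NFS code (nfs_need_check_write() is not modeled here); struct inode_model, MAX_FDS and the model_* functions are invented names, and nr_writers stands in for i_writecount.

```c
#include <stdbool.h>

#define MAX_FDS 8   /* hypothetical: fds are small indices 0..MAX_FDS-1 */

/* Hypothetical per-inode state for "one async error, then go synchronous". */
struct inode_model {
    int  pending_err;        /* the single recorded writeback error, or 0 */
    bool sync_mode;          /* true while an error is pending: writes go sync */
    int  nr_writers;         /* stand-in for i_writecount */
    int  nr_reported;        /* how many fds have been handed pending_err */
    bool reported[MAX_FDS];  /* per-fd: has this fd already seen the error? */
    bool seen_success;       /* a synchronous write succeeded since the error */
};

/* Async writeback failed: keep only the first error and switch to sync. */
void model_writeback_error(struct inode_model *in, int err)
{
    if (in->pending_err == 0) {
        in->pending_err = err;
        in->sync_mode = true;
    }
}

/* Forget the error once every writer was told and a write has succeeded. */
static void model_maybe_clear(struct inode_model *in)
{
    if (in->pending_err != 0 && in->nr_reported >= in->nr_writers &&
        in->seen_success) {
        in->pending_err = 0;
        in->sync_mode = false;
        in->nr_reported = 0;
        in->seen_success = false;
        for (int i = 0; i < MAX_FDS; i++)
            in->reported[i] = false;
    }
}

/* write() or fsync() on fd: return the pending error once per fd, else 0. */
int model_report(struct inode_model *in, int fd)
{
    if (fd < 0 || fd >= MAX_FDS || in->pending_err == 0 || in->reported[fd])
        return 0;
    int err = in->pending_err;
    in->reported[fd] = true;
    in->nr_reported++;
    model_maybe_clear(in);
    return err;
}

/* A synchronous write issued while in sync mode completed successfully. */
void model_sync_write_ok(struct inode_model *in)
{
    if (in->sync_mode) {
        in->seen_success = true;
        model_maybe_clear(in);
    }
}
```

Because only the first error is recorded, a later ENOSPC cannot mask an earlier EIO while the inode is in sync mode, and the error is only forgotten after every writer has been told.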

While that is a laudable goal, it's not really the problem I'm
interested in solving here. My goal is simply to ensure that you see a
writeback error on fsync if one occurred since the last fsync.
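That goal (an fsync() reports any writeback error recorded since that file descriptor last checked) can be met with a per-inode sequence counter that each open file samples. The sketch below is an illustrative userspace model under that assumption, not the API from this patch series; all struct and function names are hypothetical, and the choice that a file opened after an error does not see it is also an assumption.

```c
/* Hypothetical per-inode (address_space) writeback error record. */
struct mapping_model {
    int      wb_err;   /* most recent writeback error, 0 if none yet */
    unsigned wb_seq;   /* bumped every time a new error is recorded */
};

/* Hypothetical per-open-file state: the sequence value this fd last saw. */
struct file_model {
    struct mapping_model *mapping;
    unsigned              seen_seq;
};

/* Writeback failed: overwrite the stored error and advance the counter. */
void mapping_set_error_model(struct mapping_model *m, int err)
{
    m->wb_err = err;
    m->wb_seq++;
}

/* open(): sample the counter; earlier errors are not reported to this fd. */
void file_open_model(struct file_model *f, struct mapping_model *m)
{
    f->mapping = m;
    f->seen_seq = m->wb_seq;
}

/* fsync(): report an error iff one was recorded since this fd last checked. */
int file_fsync_model(struct file_model *f)
{
    struct mapping_model *m = f->mapping;
    if (f->seen_seq == m->wb_seq)
        return 0;
    f->seen_seq = m->wb_seq;
    return m->wb_err;
}
```

Every descriptor sees each new error exactly once, without the kernel tracking per-fd state beyond one integer. Because only the most recent error is kept, this model reports ENOSPC if it arrives after EIO, which is the precedence question Matthew raised above.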

I think it just comes down to the fact that I'm not convinced that it
really matters much _what_ error gets reported, as long as you get one.
As you've mentioned in earlier discussions, most programs just treat it
as a fatal error anyway. As long as that error is representative of
some error that occurred during writeback, do we really care what it
was?

Suppose we have a bunch of dirty pages on an inode, get an EIO error
and then ENOSPC on a different write (maybe issued in parallel). We
send the ENOSPC error back to the application on an fsync (since it
came in last). Application then cleans out some junk from the fs and
then reissues the writes. They fail again and then he gets EIO from the
fsync and aborts.
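The two policies at stake in this scenario can be written down as a rule for merging a newly reported writeback error into the one already recorded. This is a hypothetical C sketch (the enum and function names are invented): LAST_ERROR_WINS hands ENOSPC to the final fsync(), while EIO_STICKY keeps the earlier EIO, as Matthew suggested.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical policies for merging writeback errors on one inode. */
enum wb_policy { LAST_ERROR_WINS, EIO_STICKY };

/* Merge new_err into the currently recorded error cur (0 means none). */
int wb_merge(enum wb_policy policy, int cur, int new_err)
{
    if (policy == EIO_STICKY && cur == EIO)
        return EIO;            /* a prior EIO is never replaced */
    return new_err;
}

/* Replay a sequence of writeback errors; return what fsync() would report. */
int wb_replay(enum wb_policy policy, const int *errs, size_t n)
{
    int cur = 0;
    for (size_t i = 0; i < n; i++)
        cur = wb_merge(policy, cur, errs[i]);
    return cur;
}
```

With the sequence {EIO, ENOSPC} from the example, the two policies differ; with {ENOSPC, EIO} they agree, which is why the disagreement only matters in the ordering described here.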

Ok, so we might not have had to clean out the files and reissue the
writes there since we were going to give up anyway. Is it worth going
to extra lengths to avoid that there, given that we're in an error
condition anyway?

I'm just trying to understand why it matters at all what error you get
back when there are multiple problems. They all seem equally valid to me in
that situation.

-- 
Jeff Layton <jlayton@...hat.com>