linux-kernel - Re: POSIX violation by writeback error

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <0662a4c5d2e164d651a6a116d06da380f317100f.camel@redhat.com>
Date:   Tue, 25 Sep 2018 07:15:34 -0400
From:   Jeff Layton <jlayton@...hat.com>
To:     Alan Cox <gnomes@...rguk.ukuu.org.uk>
Cc:     焦晓冬 <milestonejxd@...il.com>,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Rogier Wolff <R.E.Wolff@...Wizard.nl>
Subject: Re: POSIX violation by writeback error

On Tue, 2018-09-25 at 00:30 +0100, Alan Cox wrote:
> > write()
> > kernel attempts to write back page and fails
> > page is marked clean and evicted from the cache
> > read()
> > 
> > Now your write is gone and there were no calls between the write and
> > read.
> > 
> > The question we still need to answer is this:
> > 
> > When we attempt to write back some data from the cache and that fails,
> > what should happen to the dirty pages?
> 
> Why do you care about the content of the pages at that point? The only
> options are to use the data (today's model), or to report that you are on
> fire.
> 

The data itself doesn't matter much. What does matter is consistent
behavior in the face of such an error. The issue (IMO) is that
currently, the result of a read that takes place after a write but
before an fsync is indeterminate.

If writeback succeeded (or hasn't been done yet) you'll get back the
data you wrote, but if there was a writeback error you may or may not.
The behavior in that case mostly depends on the whim of the filesystem
developer, and they all behave somewhat differently.
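
Here is a minimal Python sketch of the sequence from the application's side, assuming a POSIX system where os.pread is available. It does a write(), reads the data back before any fsync(), and then calls fsync(). The helper name write_then_read is made up for illustration. The snippet cannot inject a writeback failure. Its comments only mark where the indeterminate read would happen and where fsync() would normally report the error (for example EIO).

```python
import os
import tempfile


def write_then_read(path: str, data: bytes) -> tuple[bytes, int]:
    """Hypothetical helper: write, read back before fsync(), then fsync().

    Returns (bytes read back, 0 on fsync success or the errno it raised).
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # data lands in the page cache; writeback is later
        # If background writeback of these pages failed between the write()
        # above and this read, the result is the indeterminate case: depending
        # on the filesystem, you may or may not get back the data you wrote.
        readback = os.pread(fd, len(data), 0)
        try:
            os.fsync(fd)  # a prior writeback error would normally surface here
            err = 0
        except OSError as e:
            err = e.errno
        return readback, err
    finally:
        os.close(fd)
```

On a healthy local filesystem, write_then_read(path, b"hello") returns (b"hello", 0). Whether the read-back still matches after a failed writeback is exactly what currently varies between filesystems.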

> If you are going to error you don't need to use the data so you could in
> fact compress dramatically the amount of stuff you need to save
> somewhere. You need the page information so you can realize what page
> this is, but you can point the data into oblivion somewhere because you
> are no longer going to give it to anyone (assuming you can successfully
> force unmap it from everyone once it's not locked by a DMA or similar).
> 
> In the real world though it's fairly unusual to just lose a bit of I/O.
> Flash devices in particular have a nasty tendency to simply go *poof* and
> the first you know about an I/O error is the last data the drive ever
> gives you short of jtag. NFS is an exception and NFS soft timeouts are
> nasty.
> 

Linux has dozens of filesystems and they all behave differently in this
regard. A catastrophic failure (paradoxically) makes things simpler for
the fs developer, but even on local filesystems isolated errors can
occur. It's also not just NFS -- what mostly started me down this road
was working on ENOSPC handling for CephFS.

I think it'd be good to at least establish a "gold standard" for what
filesystems ought to do in this situation. We might not be able to
achieve that in all cases, but we could then document the exceptions.
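
The candidate behaviors discussed in this thread can be compared in a toy model. This is a hypothetical Python sketch and not kernel code. ToyPageCache and Policy, including its three members, are invented names. The model ignores locking, mmap, DMA and per-file error tracking. KEEP_DIRTY corresponds to "use the data", and DROP_CLEAN to the write/evict/read sequence quoted at the top. POISON corresponds to keeping only the page information and reporting an error on later reads.

```python
import errno
from enum import Enum


class Policy(Enum):
    KEEP_DIRTY = "keep the dirty data and keep serving it"
    DROP_CLEAN = "mark the page clean and evict it"
    POISON = "keep only page metadata; fail reads with EIO"


class ToyPageCache:
    """Hypothetical single-file page cache used only to compare policies."""

    def __init__(self, policy: Policy):
        self.policy = policy
        self.disk: dict[int, bytes] = {}   # stable storage: page index -> contents
        self.cache: dict[int, bytes] = {}  # cached page contents
        self.dirty: set[int] = set()
        self.poisoned: set[int] = set()    # pages whose data was discarded after an error

    def write(self, index: int, data: bytes) -> None:
        self.cache[index] = data
        self.dirty.add(index)
        self.poisoned.discard(index)  # a fresh write replaces the failed state

    def writeback(self, index: int, fail: bool) -> None:
        if index not in self.dirty:
            return
        if not fail:
            self.disk[index] = self.cache[index]
            self.dirty.discard(index)
            return
        # Writeback failed: what happens to the dirty page?
        if self.policy is Policy.KEEP_DIRTY:
            pass  # page stays dirty and cached; a later writeback may retry
        elif self.policy is Policy.DROP_CLEAN:
            self.dirty.discard(index)
            del self.cache[index]  # the write is silently gone
        else:  # Policy.POISON
            self.dirty.discard(index)
            del self.cache[index]     # the data itself is dropped...
            self.poisoned.add(index)  # ...but we remember that this page failed

    def read(self, index: int) -> bytes:
        if index in self.poisoned:
            raise OSError(errno.EIO, "writeback failed for this page")
        if index in self.cache:
            return self.cache[index]
        return self.disk.get(index, b"")
```

With b"old" on disk and b"new" written into the cache, a failed writeback followed by a read yields b"new", b"old", or EIO under the three policies respectively. The middle outcome is the case where the write vanishes without any error being reported.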
--
Jeff Layton <jlayton@...hat.com>