linux-ext4 - Re: fsync() errors is unsafe and risks data loss

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180413144807.GB24379@bombadil.infradead.org>
Date:   Fri, 13 Apr 2018 07:48:07 -0700
From:   Matthew Wilcox <willy@...radead.org>
To:     Andres Freund <andres@...razel.de>
Cc:     "Theodore Y. Ts'o" <tytso@....edu>, linux-ext4@...r.kernel.org,
        linux-fsdevel@...r.kernel.org,
        "Joshua D. Drake" <jd@...mandprompt.com>,
        Andreas Dilger <adilger@...ger.ca>
Subject: Re: fsync() errors is unsafe and risks data loss

On Tue, Apr 10, 2018 at 03:07:26PM -0700, Andres Freund wrote:
> I don't think that's the full issue. We can deal with the fact that an
> fsync failure is edge-triggered if there's a guarantee that every
> process doing so would get it.  The fact that one needs to have an FD
> open from before any failing writes occurred to get a failure, *THAT'S*
> the big issue.
> 
> Beyond postgres, it's a pretty common approach to do work on a lot of
> files without fsyncing, then iterate over the directory fsync
> everything, and *then* assume you're safe. But unless I severaly
> misunderstand something that'd only be safe if you kept an FD for every
> file open, which isn't realistic for pretty obvious reasons.

While accepting that under memory pressure we can still evict the error
indicators, we can do a better job than we do today.  The current design
of error reporting says that all errors which occurred before you opened
the file descriptor are of no interest to you.  I don't think that's
necessarily true, and it's actually a change of behaviour from before
the errseq work.

Consider Stupid Task A which calls open(), write(), close(), and Smart
Task B which calls open(), write(), fsync(), close() operating on the
same file.  If A goes entirely before B and encounters an error, before
errseq_t, B would see the error from A's write.

If A and B overlap, even a little bit, then B still gets to see A's
error today.  But if writeback happens for A's write before B opens the
file then B will never see the error.

B doesn't want to see historical errors that a previous invocation of
B has already handled, but we know whether *anyone* has seen the error
or not.  So here's a patch which restores the historical behaviour of
seeing old unhandled errors on a fresh file descriptor:

Signed-off-by: Matthew Wilcox <mawilcox@...rosoft.com>

diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..093f1fba4ee0 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;

-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;
 }
 EXPORT_SYMBOL(errseq_sample);