lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 13 Apr 2018 07:48:07 -0700
From:   Matthew Wilcox <>
To:     Andres Freund <>
Cc:     "Theodore Y. Ts'o" <>,,,
        "Joshua D. Drake" <>,
        Andreas Dilger <>
Subject: Re: fsync() errors is unsafe and risks data loss

On Tue, Apr 10, 2018 at 03:07:26PM -0700, Andres Freund wrote:
> I don't think that's the full issue. We can deal with the fact that an
> fsync failure is edge-triggered if there's a guarantee that every
> process doing so would get it.  The fact that one needs to have an FD
> open from before any failing writes occurred to get a failure, *THAT'S*
> the big issue.
> Beyond postgres, it's a pretty common approach to do work on a lot of
> files without fsyncing, then iterate over the directory fsync
> everything, and *then* assume you're safe. But unless I severaly
> misunderstand something that'd only be safe if you kept an FD for every
> file open, which isn't realistic for pretty obvious reasons.

While accepting that under memory pressure we can still evict the error
indicators, we can do a better job than we do today.  The current design
of error reporting says that all errors which occurred before you opened
the file descriptor are of no interest to you.  I don't think that's
necessarily true, and it's actually a change of behaviour from before
the errseq work.

Consider Stupid Task A which calls open(), write(), close(), and Smart
Task B which calls open(), write(), fsync(), close() operating on the
same file.  If A goes entirely before B and encounters an error, before
errseq_t, B would see the error from A's write.

If A and B overlap, even a little bit, then B still gets to see A's
error today.  But if writeback happens for A's write before B opens the
file then B will never see the error.

B doesn't want to see historical errors that a previous invocation of
B has already handled, but we know whether *anyone* has seen the error
or not.  So here's a patch which restores the historical behaviour of
seeing old unhandled errors on a fresh file descriptor:

Signed-off-by: Matthew Wilcox <>

diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..093f1fba4ee0 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;
-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;

Powered by blists - more mailing lists