Message-ID: <20180413144807.GB24379@bombadil.infradead.org>
Date: Fri, 13 Apr 2018 07:48:07 -0700
From: Matthew Wilcox <willy@...radead.org>
To: Andres Freund <andres@...razel.de>
Cc: "Theodore Y. Ts'o" <tytso@....edu>, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org,
"Joshua D. Drake" <jd@...mandprompt.com>,
Andreas Dilger <adilger@...ger.ca>
Subject: Re: fsync() errors is unsafe and risks data loss
On Tue, Apr 10, 2018 at 03:07:26PM -0700, Andres Freund wrote:
> I don't think that's the full issue. We can deal with the fact that an
> fsync failure is edge-triggered if there's a guarantee that every
> process doing so would get it. The fact that one needs to have an FD
> open from before any failing writes occurred to get a failure, *THAT'S*
> the big issue.
>
> Beyond postgres, it's a pretty common approach to do work on a lot of
> files without fsyncing, then iterate over the directory, fsync
> everything, and *then* assume you're safe. But unless I severely
> misunderstand something that'd only be safe if you kept an FD for every
> file open, which isn't realistic for pretty obvious reasons.

While accepting that under memory pressure we can still evict the error
indicators, we can do a better job than we do today. The current design
of error reporting says that all errors which occurred before you opened
the file descriptor are of no interest to you. I don't think that's
necessarily true, and it's actually a change of behaviour from before
the errseq work.

Consider Stupid Task A, which calls open(), write(), close(), and Smart
Task B, which calls open(), write(), fsync(), close(), both operating on
the same file. If A runs entirely before B and encounters an error, then
before the errseq_t work B would have seen the error from A's write.

If A and B overlap, even a little bit, then B still gets to see A's
error today. But if writeback happens for A's write before B opens the
file, then B will never see the error.
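
To make the two tasks concrete, here is a minimal user-space sketch in C.
The names task_a(), task_b() and write_once(), and the convention of
returning a negative errno, are mine and purely illustrative; only the
open/write/close and open/write/fsync/close sequences come from the
description above. Treating a short write as -EIO is a simplification.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: one write() call, short writes reported as -EIO. */
static int write_once(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	if (n < 0)
		return -errno;
	return (size_t)n == len ? 0 : -EIO;
}

/* Stupid Task A: never calls fsync(), so it cannot learn that writeback failed. */
static int task_a(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0644);
	int err;

	if (fd < 0)
		return -errno;
	err = write_once(fd, buf, len);
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}

/* Smart Task B: calls fsync() and checks its result before trusting the data. */
static int task_b(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0644);
	int err;

	if (fd < 0)
		return -errno;
	err = write_once(fd, buf, len);
	if (!err && fsync(fd) < 0)
		err = -errno;	/* a writeback error would be reported here */
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}
```

Whether task_b()'s fsync() reports a failure of A's earlier writeback is
exactly the question here: it depends on when B's open() sampled the
error state relative to that writeback.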

B doesn't want to see historical errors that a previous invocation of
B has already handled, but we know whether *anyone* has seen the error
or not. So here's a patch which restores the historical behaviour of
seeing old unhandled errors on a fresh file descriptor:

Signed-off-by: Matthew Wilcox <mawilcox@...rosoft.com>

diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..093f1fba4ee0 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -119,19 +119,11 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
-	errseq_t new = old;
-	/*
-	 * For the common case of no errors ever having been set, we can skip
-	 * marking the SEEN bit. Once an error has been set, the value will
-	 * never go back to zero.
-	 */
-	if (old != 0) {
-		new |= ERRSEQ_SEEN;
-		if (old != new)
-			cmpxchg(eseq, old, new);
-	}
-	return new;
+	/* If nobody has seen this error yet, then we can be the first. */
+	if (!(old & ERRSEQ_SEEN))
+		old = 0;
+	return old;
 }
 EXPORT_SYMBOL(errseq_sample);
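
The effect of the change can be sketched with a small single-threaded
user-space model. This is not the kernel code: the model_* names and
b_fsync_result() are hypothetical, there are no atomics, and the bit
layout (errno in the low 12 bits, SEEN flag above it, counter above that)
is my assumption about how lib/errseq.c packs the value. It replays the
"A's writeback fails before B opens the file" case with both versions of
the sample function.

```c
#include <errno.h>
#include <stdint.h>

typedef uint32_t errseq_t;

/* Assumed layout: low 12 bits errno, then the SEEN flag, then a counter. */
#define ERRSEQ_SHIFT	12
#define ERRNO_MASK	((1u << ERRSEQ_SHIFT) - 1)
#define ERRSEQ_SEEN	(1u << ERRSEQ_SHIFT)
#define ERRSEQ_CTR_INC	(1u << (ERRSEQ_SHIFT + 1))

/* Record a new error (err is a negative errno); bump the counter if SEEN. */
static void model_set(errseq_t *eseq, int err)
{
	errseq_t old = *eseq;
	errseq_t val = (old & ~(ERRNO_MASK | ERRSEQ_SEEN)) | (errseq_t)(-err);

	if (old & ERRSEQ_SEEN)
		val += ERRSEQ_CTR_INC;
	*eseq = val;
}

/* Behaviour removed by the patch: sampling marks the error as SEEN. */
static errseq_t model_sample_old(errseq_t *eseq)
{
	errseq_t val = *eseq;

	if (val != 0) {
		val |= ERRSEQ_SEEN;
		*eseq = val;
	}
	return val;
}

/* Behaviour added by the patch: an unseen error samples as 0. */
static errseq_t model_sample_patched(errseq_t *eseq)
{
	errseq_t old = *eseq;

	if (!(old & ERRSEQ_SEEN))
		old = 0;
	return old;
}

/* Report an error if the value changed since the caller's sample. */
static int model_check_and_advance(errseq_t *eseq, errseq_t *since)
{
	errseq_t old = *eseq;

	if (old == *since)
		return 0;
	*eseq = old | ERRSEQ_SEEN;
	*since = *eseq;
	return -(int)(*eseq & ERRNO_MASK);
}

/* A's writeback fails with EIO, then B opens the file and calls fsync(). */
static int b_fsync_result(int patched)
{
	errseq_t mapping_err = 0;
	errseq_t b_since;

	model_set(&mapping_err, -EIO);			/* A's write fails */
	b_since = patched ? model_sample_patched(&mapping_err)
			  : model_sample_old(&mapping_err);	/* B: open() */
	return model_check_and_advance(&mapping_err, &b_since);	/* B: fsync() */
}
```

In this model b_fsync_result(0) returns 0 (B never hears about A's
error), while b_fsync_result(1) returns -EIO. Once that error has been
reported it carries the SEEN flag, so a later opener samples the current
value and is not shown it again.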