Message-ID: <aFzFR6zD7X1_9bWj@dread.disaster.area>
Date: Thu, 26 Jun 2025 13:57:59 +1000
From: Dave Chinner <david@...morbit.com>
To: Yafang Shao <laoar.shao@...il.com>
Cc: Jeff Layton <jlayton@...nel.org>, Christoph Hellwig <hch@...radead.org>,
djwong@...nel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-xfs@...r.kernel.org,
yc1082463@...il.com
Subject: Re: [PATCH] xfs: report a writeback error on a read() call

On Thu, Jun 26, 2025 at 10:41:47AM +0800, Yafang Shao wrote:
> On Wed, Jun 25, 2025 at 10:06 PM Jeff Layton <jlayton@...nel.org> wrote:
> >
> > On Wed, 2025-06-25 at 04:56 -0700, Christoph Hellwig wrote:
> > > On Wed, Jun 25, 2025 at 07:49:31AM -0400, Jeff Layton wrote:
> > > > Another idea: add a new generic ioctl() that checks for writeback
> > > > errors without syncing anything. That would be fairly simple to do and
> > > > sounds like it would be useful, but I'd want to hear a better
> > > > description of the use-case before we did anything like that.
>
> As you mentioned earlier, calling fsync()/fdatasync() after every
> write() blocks the thread, degrading performance—especially on HDDs.
> However, this isn’t the main issue in practice.
> The real problem is that users typically don’t understand "writeback
> errors". If you warn them, "You should call fsync() because writeback
> errors might occur," their response will likely be: "What the hell is
> a writeback error?"
>
> For example, our users (a big data platform) demanded that we
> immediately shut down the filesystem upon writeback errors. These
> users are algorithm analysts who write Python/Java UDFs for custom
> logic—often involving temporary disk writes followed by reads to pass
> data downstream. Yet, most have no idea how these underlying processes
> work.

And that's exactly why XFS originally never threw away dirty data on
writeback errors. Because scientists and data analysts that wrote
programs to chew through large amounts of data didn't care about
persistence of their data mid-processing. They just wanted what they
wrote to be there the next time the processing pipeline read it.

> > > That's what I mean with my above proposal, except that I thought of an
> > > fcntl or syscall and not an ioctl.
> >
> > Yeah, a fcntl() would be reasonable, I think.
> >
> > For a syscall, I guess we could add an fsync2() which just adds a flags
> > field. Then add a FSYNC_JUSTCHECK flag that makes it just check for
> > errors and return.
> >
> > Personally, I like the fcntl() idea better for this, but maybe we have
> > other uses for a fsync2().
>
> What do you expect users to do with this new fcntl() or fsync2()? Call
> fsync2() after every write()? That would still require massive
> application refactoring.

<sigh>

We already have a user interface that provides exactly the desired
functionality.

    $ man sync_file_range
    ....
       Some details
           SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER
           will detect any I/O errors or ENOSPC conditions and will
           return these to the caller.
    ....

IOWs, checking for a past writeback IO error is as simple as:

    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE) < 0) {
        /* An unreported writeback error was pending on the file */
        wb_err = -errno;
        ......
    }

This does not cause new IO to be issued, it only blocks on writeback
that is currently in progress, and it has no data integrity
requirements at all. If the writeback has already been done, all it
will do is sweep residual errors out to userspace.....
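
A self-contained sketch of the check above, assuming Linux and glibc (sync_file_range() needs _GNU_SOURCE). The helper names check_writeback_error(), flush_range() and write_then_check() are hypothetical, not an existing API. flush_range() is included only as a contrast: it adds SYNC_FILE_RANGE_WRITE and SYNC_FILE_RANGE_WAIT_AFTER, so unlike the check-only call it does issue new I/O.

```c
#define _GNU_SOURCE /* sync_file_range() is Linux-specific */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: report a pending writeback error without issuing
 * new I/O. Offset 0 with nbytes 0 covers the whole file. WAIT_BEFORE on
 * its own only waits for writeback already in flight.
 * Returns 0, or a negative errno value such as -EIO or -ENOSPC.
 */
static int check_writeback_error(int fd)
{
    assert(fd >= 0);
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WAIT_BEFORE) < 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical contrast case: start writeback of a byte range and wait
 * for it, reporting I/O errors. This DOES issue new I/O. It is still not
 * a durability guarantee, since file metadata is not written out; use
 * fsync()/fdatasync() when data integrity is required.
 */
static int flush_range(int fd, off_t offset, off_t nbytes)
{
    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER;

    assert(fd >= 0);
    if (sync_file_range(fd, offset, nbytes, flags) < 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical pipeline stage: buffered write of an intermediate result,
 * then the cheap check before a downstream stage reads the data back.
 */
static int write_then_check(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, buf + done, len - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything; retry */
            return -errno;
        }
        done += (size_t)n;
    }
    return check_writeback_error(fd);
}
```

What a stage does on -EIO (abort, retry, or regenerate the intermediate file) is application policy; the sketch only surfaces the error.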
-Dave.
--
Dave Chinner
david@...morbit.com