Message-ID: <20250117184934.GI1611770@frogsfrogsfrogs>
Date: Fri, 17 Jan 2025 10:49:34 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: Dave Chinner <david@...morbit.com>,
John Garry <john.g.garry@...cle.com>, brauner@...nel.org,
cem@...nel.org, dchinner@...hat.com, ritesh.list@...il.com,
linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, martin.petersen@...cle.com
Subject: Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes

On Thu, Jan 16, 2025 at 07:52:25AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 14, 2025 at 03:57:26PM -0800, Darrick J. Wong wrote:
> > Ok, let's do that then. Just to be clear -- for any RWF_ATOMIC direct
> > write that's correctly aligned and targets a single mapping in the
> > correct state, we can build the untorn bio and submit it. For
> > everything else, prealloc some post EOF blocks, write them there, and
> > exchange-range them.
> >
> > Tricky questions: How do we avoid collisions between overlapping writes?
> > I guess we find a free file range at the top of the file that is long
> > enough to stage the write, and put it there? And purge it later?
> >
> > Also, does this imply that the maximum file size is less than the usual
> > 8EB?
>
> I think literally using the exchrange code for anything but an
> initial prototype is a bad idea for the above reasons. If we go
> beyond proving this is possible you'd want a version of exchrange
> where the exchange partners is not a file mapping, but a cow staging
> record.

The trouble is that the br_startoff attribute of cow staging mappings
isn't persisted on disk anywhere, which is why exchange-range can't
handle the cow fork.  You could open an O_TMPFILE and swap between the
two files, though that gets expensive per-io unless you're willing to
stash that temp file somewhere.

At this point I think we should slap the usual EXPERIMENTAL warning on
atomic writes through xfs and let John land the simplest multi-fsblock
untorn write support, which only handles the corner case where all the
stars are <cough> aligned; and then make an exchange-range prototype
and/or all the other forcealign stuff.

(Lifting in smaller pieces sounds a lot better than having John carry
around an increasingly large patchset...)
--D
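
For readers following along, here is a rough sketch of what "all the
stars are aligned" means for the shape of an RWF_ATOMIC write, as I
understand the documented rules: the length is a power of two, it falls
within the atomic write unit limits reported by statx, and the file
offset is naturally aligned to the length.  The function names are
hypothetical and this is not kernel code.  It also deliberately omits
the "single mapping in the correct state" condition from the quoted
text, because that depends on filesystem state that a standalone
example cannot see.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x is a nonzero power of two. */
static bool is_power_of_two(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical check of the user-visible shape of an untorn write.
 * unit_min and unit_max stand in for the atomic write unit limits
 * that the filesystem advertises; both are assumed to be powers of two.
 *
 * This does NOT check that the range maps to a single extent in the
 * right state; the kernel has to verify that separately.
 */
static bool untorn_write_shape_ok(uint64_t pos, uint64_t len,
				  uint64_t unit_min, uint64_t unit_max)
{
	/* The length must be a power of two... */
	if (!is_power_of_two(len))
		return false;

	/* ...within the advertised limits... */
	if (len < unit_min || len > unit_max)
		return false;

	/* ...and the offset must be naturally aligned to the length. */
	return (pos & (len - 1)) == 0;
}
```

With limits of 4096 and 65536, a 16k write at offset 16k passes, while
the same write at offset 4k fails the natural alignment test and would
have to take whatever fallback path ends up being implemented.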