Message-ID: <20250117184934.GI1611770@frogsfrogsfrogs>
Date: Fri, 17 Jan 2025 10:49:34 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: Dave Chinner <david@...morbit.com>,
	John Garry <john.g.garry@...cle.com>, brauner@...nel.org,
	cem@...nel.org, dchinner@...hat.com, ritesh.list@...il.com,
	linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, martin.petersen@...cle.com
Subject: Re: [PATCH 1/4] iomap: Lift blocksize restriction on atomic writes

On Thu, Jan 16, 2025 at 07:52:25AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 14, 2025 at 03:57:26PM -0800, Darrick J. Wong wrote:
> > Ok, let's do that then.  Just to be clear -- for any RWF_ATOMIC direct
> > write that's correctly aligned and targets a single mapping in the
> > correct state, we can build the untorn bio and submit it.  For
> > everything else, prealloc some post EOF blocks, write them there, and
> > exchange-range them.
> > 
> > Tricky questions: How do we avoid collisions between overlapping writes?
> > I guess we find a free file range at the top of the file that is long
> > enough to stage the write, and put it there?  And purge it later?
> > 
> > Also, does this imply that the maximum file size is less than the usual
> > 8EB?
> 
> I think literally using the exchrange code for anything but an
> initial prototype is a bad idea for the above reasons.  If we go
> beyond proving this is possible you'd want a version of exchrange
> where the exchange partners is not a file mapping, but a cow staging
> record.

The trouble is that the br_startoff attribute of cow staging mappings
isn't persisted on disk anywhere, which is why exchange-range can't
handle the cow fork.  You could open an O_TMPFILE and swap between the
two files, though that gets expensive per-io unless you're willing to
stash that temp file somewhere.

At this point I think we should slap the usual EXPERIMENTAL warning on
atomic writes through xfs and let John land the simplest multi-fsblock
untorn write support, which only handles the corner case where all the
stars are <cough> aligned; and then make an exchange-range prototype
and/or all the other forcealign stuff.
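
To make the "stars aligned" case concrete, here is a rough userspace
model of the dispatch decision described earlier in the thread.  It is a
sketch under stated assumptions, not kernel code: struct mapping,
choose_path() and the "written" flag are hypothetical names invented for
illustration, and the power-of-two / natural-alignment test is only an
assumed stand-in for "correctly aligned".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a single file mapping; not a kernel structure. */
struct mapping {
	uint64_t offset;	/* file offset where the mapping starts, in bytes */
	uint64_t length;	/* length of the mapping, in bytes */
	bool	 written;	/* assumed stand-in for "in the correct state" */
};

enum atomic_path {
	PATH_UNTORN_BIO,	/* build one untorn bio and submit it */
	PATH_FALLBACK,		/* stage elsewhere and exchange, or reject */
};

static bool is_power_of_two(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Decide how a write of len bytes at pos would be handled, given the
 * one mapping that contains pos.  Assumption: "correctly aligned" means
 * len is a power of two and pos is a multiple of len.
 */
static enum atomic_path choose_path(uint64_t pos, uint64_t len,
				    const struct mapping *m)
{
	uint64_t off;

	assert(m != NULL);

	if (!is_power_of_two(len) || pos % len != 0)
		return PATH_FALLBACK;
	if (!m->written)
		return PATH_FALLBACK;
	if (pos < m->offset)
		return PATH_FALLBACK;

	/* The whole write must sit inside this single mapping. */
	off = pos - m->offset;
	if (off >= m->length || len > m->length - off)
		return PATH_FALLBACK;

	return PATH_UNTORN_BIO;
}
```

With a written 64k mapping starting at offset 0, a 16k write at 16k takes
the untorn path, while a 16k write at 56k (crossing the end of the
mapping) or at 4k (misaligned) falls through to whatever the fallback
ends up being.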

(Lifting in smaller pieces sounds a lot better than having John carry
around an increasingly large patchset...)

--D
