Message-ID: <aRJaLn72i4yh1mkp@dread.disaster.area>
Date: Tue, 11 Nov 2025 08:33:34 +1100
From: Dave Chinner <david@...morbit.com>
To: Christoph Hellwig <hch@....de>
Cc: Florian Weimer <fw@...eb.enyo.de>, Florian Weimer <fweimer@...hat.com>,
	Matthew Wilcox <willy@...radead.org>,
	Hans Holmberg <hans.holmberg@....com>, linux-xfs@...r.kernel.org,
	Carlos Maiolino <cem@...nel.org>,
	"Darrick J . Wong" <djwong@...nel.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	libc-alpha@...rceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes

On Mon, Nov 10, 2025 at 10:37:01AM +0100, Christoph Hellwig wrote:
> On Mon, Nov 10, 2025 at 09:15:50AM +1100, Dave Chinner wrote:
> > On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
> > > * Christoph Hellwig:
> > > 
> > > > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
> > > >> It's been a few years, I think, and maybe we should drop the allocation
> > > >> logic from posix_fallocate in glibc?  Assuming that it's implemented
> > > >> everywhere it makes sense?
> > > >
> > > > I really think it should go away.  If it turns out we find cases where
> > > > it was useful we can try to implement a zeroing fallocate in the kernel
> > > > for the file system where people want it.
> > 
> > This is what the shiny new FALLOC_FL_WRITE_ZEROES command is supposed
> > to provide. We don't have widespread support in filesystems for it
> > yet, though.
> 
> Not really.  FALLOC_FL_WRITE_ZEROES does hardware-offloaded zeroing.

That is not required functionality - it is an implementation
optimisation.

WRITE_ZEROES requires that the subsequent write must not need to
perform filesystem metadata updates to guarantee data integrity.
How the filesystem implements that is up to the filesystem....

> I.e., it does the same thing as the just-write-zeroes approach of the
> current glibc fallback and is just as bad for the same reasons.

No, it is not like the current glibc posix_fallocate() fallback.
That is a compatibility slow-path, not an IO path performance
optimisation.

i.e. WRITE_ZEROES is for applications that overwrite in place and
are very sensitive to IO latency.  The zeroing is done
in a context that is not performance sensitive, and it results in
much lower long tail latencies in the performance sensitive IO
paths.

WRITE_ZEROES is a more efficient way of running
FALLOC_FL_ALLOCATE_RANGE and then writing zeroes to convert the range
from unwritten to written extents because it allows the kernel to
use hardware offloads if they are available.
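
For reference, a minimal userspace sketch of the two-step sequence
that WRITE_ZEROES replaces. The helper name alloc_then_zero() and the
chunk size are made up for illustration, and the trailing fsync() is
an assumption about when the caller wants the conversion to be
stable:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* fallocate() is a Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate [off, off + len) and then write
 * zeroes over it so the range ends up as written extents.
 * Returns 0 on success or an errno value on failure.
 * Note that this zeroes the whole range, including existing data.
 */
static int alloc_then_zero(int fd, off_t off, off_t len)
{
	static const char zeroes[65536];

	/* Mode 0 is FALLOC_FL_ALLOCATE_RANGE. */
	if (fallocate(fd, 0, off, len) < 0)
		return errno;

	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(zeroes) ?
				(size_t)len : sizeof(zeroes);
		ssize_t n = pwrite(fd, zeroes, chunk, off);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return errno;
		}
		if (n == 0)
			return EIO;	/* no progress, give up */
		off += n;
		len -= n;
	}

	/* Assumption: the caller wants the zeroes on stable storage. */
	if (fsync(fd) < 0)
		return errno;
	return 0;
}
```

All of the zeroing here goes through the normal write path, which is
the cost a hardware offload can avoid.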

Applications that need pure overwrite behaviour are not going to be
using COW files or storage that requires always-COW IO paths in the
filesystems (e.g. on zoned storage hardware).

Hence we just don't care that:

> It
> also is something that doesn't make any sense to support in a write
> out of place file system.

... COW files cannot support WRITE_ZEROES functionality because
optimisations for overwrite-in-place aren't valid for COW-based
IO...

> > Failing to check the return value of a library call that documents
> > EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
> > SIGBUS on the mmap access, because it failed to verify that the file
> > extension operation actually worked.
> > 
> > I mean, if this was "ftruncate(1); mmap(); *p =1" and ftruncate()
> > failed and so SIGBUS was delivered, there would be no doubt that
> > this is an application bug. Why should we treat errors returned
> > by fallocate() and/or posix_fallocate() any differently here?
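
A minimal sketch of the check being argued for above, assuming a
hypothetical helper extend_and_map() that refuses to map the file
unless the extension actually succeeded:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: extend fd to at least len bytes, then map it
 * shared. Returns the mapping, or NULL with errno set if either
 * step failed.
 */
static void *extend_and_map(int fd, size_t len)
{
	void *p;
	/* posix_fallocate() returns the error number; it does not set errno. */
	int err = posix_fallocate(fd, 0, (off_t)len);

	if (err != 0) {
		errno = err;	/* e.g. EOPNOTSUPP or ENOSPC */
		return NULL;	/* do not map a file that was never extended */
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

A caller that gets NULL back handles the error instead of touching
the mapping and taking a SIGBUS.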
> 
> I think what Florian wants (although I might be misunderstanding him)
> is an interface that will increase the file size up to the passed in
> size, but never reduce it and lose data.

Ah, that's not a "zeroing fallocate()" as was suggested. These are
the existing FALLOC_FL_ALLOCATE_RANGE file extension semantics.

AFAICT, this is exactly what the proposed patch implements - it
short circuits the bit we can't guarantee (ENOSPC prevention via
preallocation) but retains all the other aspects (non-destructive
truncate up) when it returns success.

I don't see how a glibc posix_fallocate() fallback that does a
non-destructive truncate up through some new interface is any better
than just having the filesystem implement ALLOCATE_RANGE without the
ENOSPC guarantees in the first place?

> > > If we can get an fallocate mode that we can use as a fallback to
> > > increase the file size with a zero flag argument, we can definitely
> > 
> > The fallocate() API already supports that, in two different ways:
> > FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROES.
> 
> They are both quite different as they both zero the entire passed in
> range, even if it already contains data, which is completely different
> from the posix_fallocate or fallocate FALLOC_FL_ALLOCATE_RANGE semantics
> that leave any existing data intact.

Yes. However:

	fallocate(fd, FALLOC_FL_WRITE_ZEROES, old_eof, new_eof - old_eof);

is exactly the "zeroing truncate up" operation that was being
suggested. It will not overwrite any existing data, except if the
application is racing other file extension operations with this one.
In which case, the application is buggy, not the fallocate() code.
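
A slightly fuller sketch of that call, with old_eof taken from
fstat(). The helper name zeroing_truncate_up() is made up, and the
fallback #define is an assumption for building against older headers
that do not yet provide the flag:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* fallocate() is a Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef FALLOC_FL_WRITE_ZEROES
/* Assumed value, taken from recent <linux/falloc.h>. */
#define FALLOC_FL_WRITE_ZEROES	0x80
#endif

/*
 * Hypothetical helper: grow fd to new_eof with written zeroes, never
 * shrinking it. Returns 0 on success or an errno value on failure
 * (EOPNOTSUPP where the kernel or filesystem lacks WRITE_ZEROES).
 */
static int zeroing_truncate_up(int fd, off_t new_eof)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return errno;

	/* Only the range beyond the current EOF is touched. */
	if (new_eof <= st.st_size)
		return 0;

	if (fallocate(fd, FALLOC_FL_WRITE_ZEROES, st.st_size,
		      new_eof - st.st_size) < 0)
		return errno;
	return 0;
}
```

The fstat() and the fallocate() are not atomic with respect to each
other, which is the race with other file extension operations
mentioned above.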

-Dave.
-- 
Dave Chinner
david@...morbit.com
