lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRJK5LqJnrT5KAyH@dread.disaster.area>
Date: Tue, 11 Nov 2025 07:28:20 +1100
From: Dave Chinner <david@...morbit.com>
To: Florian Weimer <fweimer@...hat.com>
Cc: Christoph Hellwig <hch@....de>, Matthew Wilcox <willy@...radead.org>,
	Hans Holmberg <hans.holmberg@....com>, linux-xfs@...r.kernel.org,
	Carlos Maiolino <cem@...nel.org>,
	"Darrick J . Wong" <djwong@...nel.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	libc-alpha@...rceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes

On Mon, Nov 10, 2025 at 06:27:41AM +0100, Florian Weimer wrote:
> * Dave Chinner:
> 
> > On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
> >> * Christoph Hellwig:
> >> 
> >> > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
> >> >> It's been a few years, I think, and maybe we should drop the allocation
> >> >> logic from posix_fallocate in glibc?  Assuming that it's implemented
> >> >> everywhere it makes sense?
> >> >
> >> > I really think it should go away.  If it turns out we find cases where
> >> > it was useful we can try to implement a zeroing fallocate in the kernel
> >> > for the file system where people want it.
> >
> > This is what the shiny new FALLOC_FL_WRITE_ZEROS command is supposed
> > to provide. We don't have widepsread support in filesystems for it
> > yet, though.
> >
> >> > gfs2 for example currently
> >> > has such an implementation, and we could have somewhat generic library
> >> > version of it.
> >
> > Yup, seems like a iomap iter loop would be pretty trivial to
> > abstract from that...
> >
> >> Sorry, I remember now where this got stuck the last time.
> >> 
> >> This program:
> >> 
> >> #include <fcntl.h>
> >> #include <stddef.h>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include <sys/mman.h>
> >> 
> >> int
> >> main(void)
> >> {
> >>   FILE *fp = tmpfile();
> >>   if (fp == NULL)
> >>     abort();
> >>   int fd = fileno(fp);
> >>   posix_fallocate(fd, 0, 1);
> >>   char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >>   *p = 1;
> >> }
> >> 
> >> should not crash even if the file system does not support fallocate.
> >
> > I think that's buggy application code.
> >
> > Failing to check the return value of a library call that documents
> > EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
> > SIGBUS on the mmap access, because it failed to verify that the file
> > extension operation actually worked.
> 
> Sorry, I made the example confusing.
> 
> How would the application deal with failure due to lack of fallocate
> support?  It would have to do a pwrite, like posix_fallocate does to
> today, or maybe ftruncate.  This is way I think removing the fallback
> from posix_fallocate completely is mostly pointless.
> 
> >> I hope we can agree on that.  I expect avoiding SIGBUS errors because
> >> of insufficient file size is a common use case for posix_fallocate.
> >> This use is not really an optimization, it's required to get mmap
> >> working properly.
> >> 
> >> If we can get an fallocate mode that we can use as a fallback to
> >> increase the file size with a zero flag argument, we can definitely
> >
> > The fallocate() API already support that, in two different ways:
> > FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROS.
> 
> Neither is appropriate for posix_fallocate because they are as
> destructive as the existing fallback.

You suggested we should consider "implement a zeroing fallocate",
and I've simply pointed out that it already exists. That is simply:

	fallocate(WRITE_ZEROES, old_eof, new_eof - old_eof)

You didn't say that you wanted something that isn't potentially
destructive when a buggy allocation allows multiple file extension
operations to be performed concurrently. 

> > You aren't going to get support for such new commands on existing
> > kernels, so userspace is still going to have to code the ftruncate()
> > fallback itself for the desired behaviour to be provided
> > consistently to applications.
> >
> > As such, I don't see any reason for the fallocate() syscall
> > providing some whacky "ftruncate() in all but name" mode.
> 
> Please reconsider.  If we start fixing this, we'll eventually be in a
> position where the glibc fallback code never runs.

Providing non-destructive, "truncate up only" file extension
semantics through fallocate() is exactly what
FALLOC_FL_ALLOCATE_RANGE provides.

Oh, wait, we started down this path because the "fake" success patch
didn't implement the correct ALLOCATE_RANGE semantics. i.e. the
proposed patch is buggy because it doesn't implement the externally
visible file size change semantics of a successful operation.

IOWs, there is no need for a new API here - just for filesystems to
correctly implement the file extension semantics of
FALLOC_FL_ALLOCATE_RANGE if they are going to return success without
having performed physical allocation.

IOWs, I have no problems with COW filesystems not doing
preallocation, but if they are going to return success they still
need to perform all the non-allocation parts of fallocate()
operations correctly.

Again, I don't see a need for a new API here to provide
non-destructive "truncate up only" semantics as we already have
those semantics built into the ALLOCATE_RANGE operation...

-Dave.
-- 
Dave Chinner
david@...morbit.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ