[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRJK5LqJnrT5KAyH@dread.disaster.area>
Date: Tue, 11 Nov 2025 07:28:20 +1100
From: Dave Chinner <david@...morbit.com>
To: Florian Weimer <fweimer@...hat.com>
Cc: Christoph Hellwig <hch@....de>, Matthew Wilcox <willy@...radead.org>,
Hans Holmberg <hans.holmberg@....com>, linux-xfs@...r.kernel.org,
Carlos Maiolino <cem@...nel.org>,
"Darrick J . Wong" <djwong@...nel.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
libc-alpha@...rceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes
On Mon, Nov 10, 2025 at 06:27:41AM +0100, Florian Weimer wrote:
> * Dave Chinner:
>
> > On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
> >> * Christoph Hellwig:
> >>
> >> > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
> >> >> It's been a few years, I think, and maybe we should drop the allocation
> >> >> logic from posix_fallocate in glibc? Assuming that it's implemented
> >> >> everywhere it makes sense?
> >> >
> >> > I really think it should go away. If it turns out we find cases where
> >> > it was useful we can try to implement a zeroing fallocate in the kernel
> >> > for the file system where people want it.
> >
> > This is what the shiny new FALLOC_FL_WRITE_ZEROS command is supposed
> > to provide. We don't have widepsread support in filesystems for it
> > yet, though.
> >
> >> > gfs2 for example currently
> >> > has such an implementation, and we could have somewhat generic library
> >> > version of it.
> >
> > Yup, seems like a iomap iter loop would be pretty trivial to
> > abstract from that...
> >
> >> Sorry, I remember now where this got stuck the last time.
> >>
> >> This program:
> >>
> >> #include <fcntl.h>
> >> #include <stddef.h>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include <sys/mman.h>
> >>
> >> int
> >> main(void)
> >> {
> >> FILE *fp = tmpfile();
> >> if (fp == NULL)
> >> abort();
> >> int fd = fileno(fp);
> >> posix_fallocate(fd, 0, 1);
> >> char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> >> *p = 1;
> >> }
> >>
> >> should not crash even if the file system does not support fallocate.
> >
> > I think that's buggy application code.
> >
> > Failing to check the return value of a library call that documents
> > EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
> > SIGBUS on the mmap access, because it failed to verify that the file
> > extension operation actually worked.
>
> Sorry, I made the example confusing.
>
> How would the application deal with failure due to lack of fallocate
> support? It would have to do a pwrite, like posix_fallocate does to
> today, or maybe ftruncate. This is way I think removing the fallback
> from posix_fallocate completely is mostly pointless.
>
> >> I hope we can agree on that. I expect avoiding SIGBUS errors because
> >> of insufficient file size is a common use case for posix_fallocate.
> >> This use is not really an optimization, it's required to get mmap
> >> working properly.
> >>
> >> If we can get an fallocate mode that we can use as a fallback to
> >> increase the file size with a zero flag argument, we can definitely
> >
> > The fallocate() API already support that, in two different ways:
> > FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROS.
>
> Neither is appropriate for posix_fallocate because they are as
> destructive as the existing fallback.
You suggested we should consider "implement a zeroing fallocate",
and I've simply pointed out that it already exists. That is simply:
fallocate(WRITE_ZEROES, old_eof, new_eof - old_eof)
You didn't say that you wanted something that isn't potentially
destructive when a buggy allocation allows multiple file extension
operations to be performed concurrently.
> > You aren't going to get support for such new commands on existing
> > kernels, so userspace is still going to have to code the ftruncate()
> > fallback itself for the desired behaviour to be provided
> > consistently to applications.
> >
> > As such, I don't see any reason for the fallocate() syscall
> > providing some whacky "ftruncate() in all but name" mode.
>
> Please reconsider. If we start fixing this, we'll eventually be in a
> position where the glibc fallback code never runs.
Providing non-destructive, "truncate up only" file extension
semantics through fallocate() is exactly what
FALLOC_FL_ALLOCATE_RANGE provides.
Oh, wait, we started down this path because the "fake" success patch
didn't implement the correct ALLOCATE_RANGE semantics. i.e. the
proposed patch is buggy because it doesn't implement the externally
visible file size change semantics of a successful operation.
IOWs, there is no need for a new API here - just for filesystems to
correctly implement the file extension semantics of
FALLOC_FL_ALLOCATE_RANGE if they are going to return success without
having performed physical allocation.
IOWs, I have no problems with COW filesystems not doing
preallocation, but if they are going to return success they still
need to perform all the non-allocation parts of fallocate()
operations correctly.
Again, I don't see a need for a new API here to provide
non-destructive "truncate up only" semantics as we already have
those semantics built into the ALLOCATE_RANGE operation...
-Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists