Message-ID: <lhuseem1mpe.fsf@oldenburg.str.redhat.com>
Date: Mon, 10 Nov 2025 06:27:41 +0100
From: Florian Weimer <fweimer@...hat.com>
To: Dave Chinner <david@...morbit.com>
Cc: Christoph Hellwig <hch@....de>, Matthew Wilcox <willy@...radead.org>,
Hans Holmberg <hans.holmberg@....com>, linux-xfs@...r.kernel.org,
Carlos Maiolino <cem@...nel.org>, "Darrick J . Wong"
<djwong@...nel.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, libc-alpha@...rceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes
* Dave Chinner:
> On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
>> * Christoph Hellwig:
>>
>> > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
>> >> It's been a few years, I think, and maybe we should drop the allocation
>> >> logic from posix_fallocate in glibc? Assuming that it's implemented
>> >> everywhere it makes sense?
>> >
>> > I really think it should go away. If it turns out we find cases where
>> > it was useful we can try to implement a zeroing fallocate in the kernel
>> > for the file system where people want it.
>
> This is what the shiny new FALLOC_FL_WRITE_ZEROS command is supposed
> to provide. We don't have widespread support in filesystems for it
> yet, though.
>
>> > gfs2 for example currently
>> > has such an implementation, and we could have somewhat generic library
>> > version of it.
>
> Yup, seems like an iomap iter loop would be pretty trivial to
> abstract from that...
>
>> Sorry, I remember now where this got stuck the last time.
>>
>> This program:
>>
>> #include <fcntl.h>
>> #include <stddef.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <sys/mman.h>
>>
>> int
>> main(void)
>> {
>>   FILE *fp = tmpfile();
>>   if (fp == NULL)
>>     abort();
>>   int fd = fileno(fp);
>>   posix_fallocate(fd, 0, 1);
>>   char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>>   *p = 1;
>> }
>>
>> should not crash even if the file system does not support fallocate.
>
> I think that's buggy application code.
>
> Failing to check the return value of a library call that documents
> EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
> SIGBUS on the mmap access, because it failed to verify that the file
> extension operation actually worked.

Sorry, I made the example confusing.

How would the application deal with failure due to lack of fallocate
support? It would have to do a pwrite, like posix_fallocate does
today, or maybe ftruncate. This is why I think removing the fallback
from posix_fallocate completely is mostly pointless.

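To make that concrete, here is a rough sketch of what every caller
would probably end up writing if posix_fallocate could return
EOPNOTSUPP without falling back. The helper name ensure_size is made
up for illustration and is not a glibc interface. The sketch only
extends the file size; it reserves no blocks, and the size check
races with concurrent size changes, which is the part a kernel-side
mode could avoid.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of glibc: make sure the file behind
   FD is at least OFFSET + LEN bytes long.  Returns 0 on success or an
   errno value on failure, like posix_fallocate does.  */
static int
ensure_size(int fd, off_t offset, off_t len)
{
  int ret = posix_fallocate(fd, offset, len);
  if (ret != EOPNOTSUPP)
    return ret;   /* Success, or an error we cannot work around.  */

  /* Assumed case: no fallocate support and no library fallback.
     Only grow the file; never shrink it.  This check is racy if
     another process changes the size at the same time.  */
  struct stat st;
  if (fstat(fd, &st) != 0)
    return errno;
  if (st.st_size >= offset + len)
    return 0;
  if (ftruncate(fd, offset + len) != 0)
    return errno;
  return 0;
}
```

With something like this, the earlier test program would call
ensure_size(fd, 0, 1) and check the result before the mmap access.
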
>> I hope we can agree on that. I expect avoiding SIGBUS errors because
>> of insufficient file size is a common use case for posix_fallocate.
>> This use is not really an optimization, it's required to get mmap
>> working properly.
>>
>> If we can get an fallocate mode that we can use as a fallback to
>> increase the file size with a zero flag argument, we can definitely
>
> The fallocate() API already supports that, in two different ways:
> FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROS.

Neither is appropriate for posix_fallocate because they are as
destructive as the existing fallback.

> But, again, not all filesystems support these, so userspace has to
> be prepared to receive -EOPNOTSUPP from these calls. Hence userspace
> has to do the right thing for posix_fallocate() if you want to
> ensure that it always extends the file size even when fallocate()
> calls fail...

Sure, but eventually, we may get into a better situation.

>> use that in posix_fallocate (replacing the fallback path on kernels
>> that support it). All local file systems should be able to implement
>> that (but perhaps not efficiently). Basically, what we need here is a
>> non-destructive ftruncate.
>
> You aren't going to get support for such new commands on existing
> kernels, so userspace is still going to have to code the ftruncate()
> fallback itself for the desired behaviour to be provided
> consistently to applications.
>
> As such, I don't see any reason for the fallocate() syscall
> providing some whacky "ftruncate() in all but name" mode.

Please reconsider. If we start fixing this, we'll eventually be in a
position where the glibc fallback code never runs.

Thanks,
Florian