Message-ID: <aRESlvWf9VquNzx3@dread.disaster.area>
Date: Mon, 10 Nov 2025 09:15:50 +1100
From: Dave Chinner <david@...morbit.com>
To: Florian Weimer <fw@...eb.enyo.de>
Cc: Christoph Hellwig <hch@....de>, Florian Weimer <fweimer@...hat.com>,
	Matthew Wilcox <willy@...radead.org>,
	Hans Holmberg <hans.holmberg@....com>, linux-xfs@...r.kernel.org,
	Carlos Maiolino <cem@...nel.org>,
	"Darrick J . Wong" <djwong@...nel.org>,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	libc-alpha@...rceware.org
Subject: Re: [RFC] xfs: fake fallocate success for always CoW inodes

On Sat, Nov 08, 2025 at 01:30:18PM +0100, Florian Weimer wrote:
> * Christoph Hellwig:
> 
> > On Thu, Nov 06, 2025 at 05:31:28PM +0100, Florian Weimer wrote:
> >> It's been a few years, I think, and maybe we should drop the allocation
> >> logic from posix_fallocate in glibc?  Assuming that it's implemented
> >> everywhere it makes sense?
> >
> > I really think it should go away.  If it turns out we find cases where
> > it was useful we can try to implement a zeroing fallocate in the kernel
> > for the file system where people want it.

This is what the shiny new FALLOC_FL_WRITE_ZEROS command is supposed
to provide. We don't have widespread support in filesystems for it
yet, though.

> > gfs2 for example currently
> > has such an implementation, and we could have somewhat generic library
> > version of it.

Yup, seems like an iomap iter loop would be pretty trivial to
abstract from that...

> Sorry, I remember now where this got stuck the last time.
> 
> This program:
> 
> #include <fcntl.h>
> #include <stddef.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> 
> int
> main(void)
> {
>   FILE *fp = tmpfile();
>   if (fp == NULL)
>     abort();
>   int fd = fileno(fp);
>   posix_fallocate(fd, 0, 1);
>   char *p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>   *p = 1;
> }
> 
> should not crash even if the file system does not support fallocate.

I think that's buggy application code.

Failing to check the return value of a library call that documents
EOPNOTSUPP as a valid error is a bug. IOWs, the above code *should*
SIGBUS on the mmap access, because it failed to verify that the file
extension operation actually worked.
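
For illustration, here is a minimal sketch of the same program with
the return value checked. The helper names extend_and_store() and
run_checked() are made up for this example. Note that
posix_fallocate() returns the error number directly rather than
setting errno.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper: extend fd to at least len bytes, map it, and
   store to the first byte.  Returns 0 on success or an errno value. */
int
extend_and_store(int fd, off_t len)
{
  assert(len > 0);

  /* posix_fallocate() returns the error number; it does not set errno. */
  int err = posix_fallocate(fd, 0, len);
  if (err != 0)
    return err;  /* e.g. EOPNOTSUPP: do not touch the mapping */

  char *p = mmap(NULL, (size_t) len, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0);
  if (p == MAP_FAILED)
    return errno;

  *p = 1;  /* only reached once the extension is known to have worked */
  munmap(p, (size_t) len);
  return 0;
}

/* Hypothetical driver mirroring the quoted program. */
int
run_checked(void)
{
  FILE *fp = tmpfile();
  if (fp == NULL)
    return errno;
  int ret = extend_and_store(fileno(fp), 1);
  fclose(fp);
  return ret;
}
```

With this shape a failed extension becomes an error return for the
caller to handle instead of a SIGBUS on the store.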

I mean, if this was "ftruncate(1); mmap(); *p = 1" and ftruncate()
failed and so SIGBUS was delivered, there would be no doubt that
this is an application bug. Why should we treat errors returned
by fallocate() and/or posix_fallocate() any differently here?

> I hope we can agree on that.  I expect avoiding SIGBUS errors because
> of insufficient file size is a common use case for posix_fallocate.
> This use is not really an optimization, it's required to get mmap
> working properly.
> 
> If we can get an fallocate mode that we can use as a fallback to
> increase the file size with a zero flag argument, we can definitely

The fallocate() API already supports that, in two different ways:
FALLOC_FL_ZERO_RANGE and FALLOC_FL_WRITE_ZEROS.

But, again, not all filesystems support these, so userspace has to
be prepared to receive -EOPNOTSUPP from these calls. Hence userspace
has to do the right thing for posix_fallocate() if you want to
ensure that it always extends the file size even when fallocate()
calls fail...

> use that in posix_fallocate (replacing the fallback path on kernels
> that support it).  All local file systems should be able to implement
> that (but perhaps not efficiently).  Basically, what we need here is a
> non-destructive ftruncate.

You aren't going to get support for such new commands on existing
kernels, so userspace is still going to have to code the ftruncate()
fallback itself for the desired behaviour to be provided
consistently to applications.
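
A rough sketch of what such a userspace fallback could look like,
assuming Linux and glibc's fallocate() wrapper. The helper name
ensure_file_size() is hypothetical and this is not glibc's actual
posix_fallocate() code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: make sure the file behind fd is at least size
   bytes long.  Returns 0 on success or an errno value. */
int
ensure_file_size(int fd, off_t size)
{
  assert(size > 0);

  /* Preferred path: mode 0 allocates blocks and extends the file
     if needed, without touching existing data. */
  if (fallocate(fd, 0, 0, size) == 0)
    return 0;
  if (errno != EOPNOTSUPP)
    return errno;

  /* Fallback for filesystems without fallocate() support: grow the
     file with ftruncate(), but never shrink it. */
  struct stat st;
  if (fstat(fd, &st) != 0)
    return errno;
  if (st.st_size >= size)
    return 0;
  if (ftruncate(fd, size) != 0)
    return errno;
  return 0;
}
```

The fstat()/ftruncate() pair in this sketch is not atomic, so a
concurrent writer that extends the file in between could have its
data cut off, and the fallback reserves no blocks, so a later store
through a mapping can still fail if the filesystem runs out of space.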

As such, I don't see any reason for the fallocate() syscall
to provide some wacky "ftruncate() in all but name" mode.

-Dave.
-- 
Dave Chinner
david@...morbit.com
