linux-kernel - Re: [PATCH] fuse: clarify extending writes handling

Message-ID: <20250820162724.GL7942@frogsfrogsfrogs>
Date: Wed, 20 Aug 2025 09:27:24 -0700
From: "Darrick J. Wong" <djwong@...nel.org>
To: Miklos Szeredi <miklos@...redi.hu>
Cc: Chunsheng Luo <luochunsheng@...c.edu>, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fuse: clarify extending writes handling

On Wed, Aug 20, 2025 at 08:52:35AM +0200, Miklos Szeredi wrote:
> On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@...nel.org> wrote:
> 
> > I don't understand the current behavior at all -- why do the callers of
> > fuse_writeback_range pass an @end parameter when it ignores @end in
> > favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> > fallocate and copy_file_range both take i_rwsem, so what could they be
> > racing with?  Or am I missing something here?
> 
> commit 59bda8ecee2f ("fuse: flush extending writes")
> 
> The issue AFAICS is that if writes beyond the range end are not
> flushed, then EOF on backing file could be below range end (if pending
> writes create a hole), hence copy_file_range() will stop copying at
> the start of that hole.
> 
> So this patch is incorrect, since not flushing copy_file_range input
> file could result in a short copy.
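
The scenario described above can be sketched as a toy model in plain
userspace C (not kernel code).  struct dirty_extent, model_flush() and
model_copy_count() are hypothetical names invented for illustration; the
only behavior assumed is that a copy stops at the source file's EOF as the
server sees it, and that a flush writes only the dirty bytes inside the
flushed range.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Toy model: one dirty page-cache extent covering bytes [start, end). */
struct dirty_extent {
	long long start;
	long long end;	/* exclusive */
};

/*
 * Model a flush of the inclusive byte range [lstart, lend].  The part of
 * each dirty extent that falls inside the range is written to the server,
 * and the server-side EOF grows to cover whatever was written.
 * Returns the new server-side EOF.
 */
long long model_flush(const struct dirty_extent *d, size_t n,
		      long long server_eof,
		      long long lstart, long long lend)
{
	assert(lstart >= 0 && lstart <= lend);

	for (size_t i = 0; i < n; i++) {
		long long wend;

		/* Skip extents that do not intersect the flushed range. */
		if (d[i].start > lend || d[i].end <= lstart)
			continue;

		/* Clamp the written end to the range; avoids lend + 1 overflow. */
		wend = (d[i].end - 1 <= lend) ? d[i].end : lend + 1;
		if (wend > server_eof)
			server_eof = wend;
	}
	return server_eof;
}

/* Bytes a copy of [pos, pos + len) can return when the source ends at eof. */
long long model_copy_count(long long eof, long long pos, long long len)
{
	if (eof <= pos)
		return 0;
	return (eof - pos < len) ? eof - pos : len;
}
```

With the server-side EOF at 4096 and a single dirty extent at
[1048576, 1052672), model_flush(..., 0, 65535) leaves EOF at 4096, so a
65536-byte copy from offset 0 comes back with only 4096 bytes.  Flushing
with lend = LLONG_MAX raises EOF to 1052672 and the same copy returns the
full 65536.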

<nod> As far as Mr. Luo's patch is concerned, I agree that a strict "no
behavior changes" patch should have changed the inode_in writeback_range
call to:

	err = fuse_writeback_range(inode_in, pos_in, LLONG_MAX);

Though if all callsites are going to pass LLONG_MAX in as @end, then
why not eliminate the parameter entirely?

What I'm (still) wondering is why it was necessary to flush the source
and destination ranges between (pos + len - 1) and LLONG_MAX.  But let's
see, what did 59bda8ecee2f have to say?

| fuse: flush extending writes
|
| Callers of fuse_writeback_range() assume that the file is ready for
| modification by the server in the supplied byte range after the call
| returns.

Ok, so far so good.

| If there's a write that extends the file beyond the end of the supplied
| range, then the file needs to be extended to at least the end of the range,
| but currently that's not done.
|
| There are at least two cases where this can cause problems:
|
|  - copy_file_range() will return short count if the file is not extended
|    up to end of the source range.

That suggests to me

filemap_write_and_wait_range(inode_in->i_mapping, pos_in, pos_in + len - 1)

but I don't see why we need to flush more bytes than that?  The server's
CFR implementation has all the bytes it needs to read the source data.

Hum.  But what if CFR is actually reflink?  I guess you'd want to
buffer-copy the unaligned head and tail regions, and reflink the
allocation units in the middle, but I still don't see why the fuse
server needs more of the source file than (pos, pos + len - 1)?

|  - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
|    hence the region may not be fully allocated.

Hrm, ZERO | KEEP_SIZE is supposed to allow preallocation of blocks
beyond EOF, or at least that's what XFS does:

$ truncate -s 10m /mnt/test
$ xfs_io -c 'fzero -k 100m 64k' /mnt/test
$ filefrag -v /mnt/test
Filesystem type is: 58465342
File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:    25600..   25615:         24..        39:     16:      25600: last,unwritten,eof
/mnt/test: 1 extent found

as does ext4:

$ truncate -s 10m /mnt/test
$ xfs_io -c 'fzero -k 100m 64k' /mnt/test
$ filefrag -v /mnt/test
Filesystem type is: ef53
File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:    25600..   25615:      33808..     33823:     16:      25600: last,unwritten,eof
/mnt/test: 1 extent found

(Notice that the 10M file has one extent starting at 100M)
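
The block numbers in the filefrag output can be checked against the byte
offsets given to xfs_io with a small helper.  This is only a sketch:
block_to_bytes() is a hypothetical name, and the 4096-byte block size is
taken from the "blocks of 4096 bytes" line in the output above.

```c
#include <assert.h>

/* Convert a filefrag block number (or block count) to bytes. */
long long block_to_bytes(long long block, long long block_size)
{
	assert(block >= 0 && block_size > 0);
	return block * block_size;
}
```

block_to_bytes(25600, 4096) is 104857600, i.e. 100M, which matches the
fzero offset; the 16-block extent length is 65536 bytes (64k), and the
2560-block file size is 10485760 bytes (10M).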

I can see why you'd want to flush the target range in case the fuse
server has a better trick up its sleeve to zero the already-written
region that isn't the punch-and-realloc behavior that xfs and ext4 have.
But here too I don't see why the fuse server would need more than the
target region.

Though I think for both cases we end up flushing more than the target
region, because the page cache rounds start down and end up to PAGE_SIZE
boundaries.
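
A rough sketch of that rounding, as standalone C rather than kernel code.
round_to_pages() is a hypothetical helper, and the page size is passed in
as a parameter because it is an assumption here (4096 in the usage below);
larger folios could widen the range further.

```c
#include <assert.h>

/*
 * Widen the inclusive byte range [*lstart, *lend] to whole pages:
 * the start is rounded down to a page boundary and the end is rounded
 * up to the last byte of its page.  page_size must be a power of two.
 */
void round_to_pages(long long page_size, long long *lstart, long long *lend)
{
	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
	assert(*lstart >= 0 && *lstart <= *lend);

	*lstart &= ~(page_size - 1);
	*lend |= page_size - 1;
}
```

For example, with a 4096-byte page, a request covering bytes [5000, 7999]
is widened to [4096, 8191], so the flush touches the whole page even
though the caller asked for 3000 bytes.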

| Fix by flushing writes from the start of the range up to the end of the
| file.  This could be optimized if the writes are non-extending, etc, but
| it's probably not worth the trouble.

<shrug> Was there a bug report associated with this commit?  I couldn't
find any hits on the subject line in lore.  Was this simply a big
hammer that solved whatever corruption problems were occurring?  Or
something found in code inspection?

<confused>

--D

> Thanks,
> Miklos
>