Message-ID: <8618918.T7Z3S40VBb@weasel>
Date: Sat, 20 Dec 2025 15:55:09 +0100
From: Christian Schoenebeck <linux_oss@...debyte.com>
To: Christian Brauner <brauner@...nel.org>,
David Howells <dhowells@...hat.com>,
Dominique Martinet <asmadeus@...ewreck.org>
Cc: Eric Van Hensbergen <ericvh@...nel.org>,
Latchesar Ionkov <lucho@...kov.net>, Chris Arges <carges@...udflare.com>,
Matthew Wilcox <willy@...radead.org>, Steve French <sfrench@...ba.org>,
v9fs@...ts.linux.dev, netfs@...ts.linux.dev, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] netfs: Fix early read unlock of page with EOF in middle
On Saturday, 20 December 2025 13:31:40 CET David Howells wrote:
> The read result collection for buffered reads seems to run ahead of the
> completion of subrequests under some circumstances, as can be seen in the
> following log snippet:
>
> 9p_client_res: client 18446612686390831168 response P9_TREAD tag 0 err 0
> ...
> netfs_sreq: R=00001b55[1] DOWN TERM f=192 s=0 5fb2/5fb2 s=5 e=0
> ...
> netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2
> netfs_folio: i=157f3 ix=00004-00004 read-done
> netfs_folio: i=157f3 ix=00004-00004 read-unlock
> netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2
> netfs_folio: i=157f3 ix=00005-00005 read-done
> netfs_folio: i=157f3 ix=00005-00005 read-unlock
> ...
> netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
> netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c
> netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
> netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8
> ...
> netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0
> netfs_sreq: R=00001b55[2] ZERO TERM f=102 s=5fb2 4e/4e s=5 e=0
>
> The 'cto=5fb2' indicates the collected file pos we've collected results to
> so far - but we still have 0x4e more bytes to go - so we shouldn't have
> collected folio ix=00005 yet. The 'ZERO' subreq that clears the tail
> happens after we unlock the folio, allowing the application to see the
> uncleared tail through mmap.
>
> The problem is that netfs_read_unlock_folios() will unlock a folio in which
> the amount of read results collected hits EOF position - but the ZERO
> subreq lies beyond that and so happens after.
>
> Fix this by changing the end check to always be the end of the folio and
> never the end of the file.
>
> In the future, I should look at clearing to the end of the folio here rather
> than adding a ZERO subreq to do this. On the other hand, the ZERO subreq
> can run in parallel with an async READ subreq. Further, the ZERO subreq
> may still be necessary to, say, handle extents in a ceph file that don't
> have any backing store and are thus implicitly all zeros.
>
> This can be reproduced by creating a file, the size of which doesn't align
> to a page boundary, e.g. 24498 (0x5fb2) bytes and then doing something
> like:
>
> xfs_io -c "mmap -r 0 0x6000" -c "madvise -d 0 0x6000" \
> -c "mread -v 0 0x6000" /xfstest.test/x
>
> The last 0x4e bytes should all be 00, but if the tail hasn't been cleared
> yet, you may see rubbish there. This can be reproduced with kafs by
> modifying the kernel to disable the call to netfs_read_subreq_progress()
> and to stop afs_issue_read() from doing the async call for NETFS_READAHEAD.
> Reproduction can be made easier by inserting an mdelay(100) in
> netfs_issue_read() for the ZERO-subreq case.
>
> AFS and CIFS are normally unlikely to show this as they dispatch READ ops
> asynchronously, which allows the ZERO-subreq to finish first. 9P's READ op
> is completely synchronous, so the ZERO-subreq will always happen after. It
> isn't seen all the time, though, because the collection may be done in a
> worker thread.
>
> Reported-by: Christian Schoenebeck <linux_oss@...debyte.com>
> Link: https://lore.kernel.org/r/8622834.T7Z3S40VBb@weasel/
> Signed-off-by: David Howells <dhowells@...hat.com>
> Suggested-by: Dominique Martinet <asmadeus@...ewreck.org>
> cc: Dominique Martinet <asmadeus@...ewreck.org>
> cc: Christian Schoenebeck <linux_oss@...debyte.com>
> cc: v9fs@...ts.linux.dev
> cc: netfs@...ts.linux.dev
> cc: linux-fsdevel@...r.kernel.org
> ---

I bisected this mmap() data corruption to e2d46f2ec332 ("netfs: Change the
read result collector to only use one work item"), so maybe add a Fixes:
tag for that commit, as Dominique suggested?

With the patch applied, the issue no longer reproduces here. Give me a few
hours for more thorough testing, though, since it only triggers randomly.

> fs/netfs/read_collect.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
> index a95e7aadafd0..7a0ffa675fb1 100644
> --- a/fs/netfs/read_collect.c
> +++ b/fs/netfs/read_collect.c
> @@ -137,7 +137,7 @@ static void netfs_read_unlock_folios(struct netfs_io_request *rreq,
> rreq->front_folio_order = order;
> fsize = PAGE_SIZE << order;
> fpos = folio_pos(folio);
> - fend = umin(fpos + fsize, rreq->i_size);
> + fend = fpos + fsize;
>
> trace_netfs_collect_folio(rreq, folio, fend, collected_to);
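
To make the effect of the one-line change concrete, here is a small
userspace model of the end check. It is a sketch, not the kernel code:
folio_would_unlock() and its clamp_to_eof flag are hypothetical, PAGE_SIZE is
assumed to be 4 KiB, and I am assuming a folio gets unlocked once
collected_to reaches fend.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 0x1000ULL	/* assumption: 4 KiB pages */

/*
 * Simplified stand-in for the end check in netfs_read_unlock_folios().
 *
 * clamp_to_eof = true  models the old code: fend = umin(fpos + fsize, i_size)
 * clamp_to_eof = false models the patch:    fend = fpos + fsize
 *
 * Returns true if the folio starting at fpos would be unlocked once
 * results have been collected up to collected_to.
 */
static bool folio_would_unlock(unsigned long long fpos, unsigned int order,
			       unsigned long long i_size,
			       unsigned long long collected_to,
			       bool clamp_to_eof)
{
	assert(order < 32);	/* keep the shift well defined */

	unsigned long long fsize = PAGE_SIZE << order;
	unsigned long long fend = fpos + fsize;

	if (clamp_to_eof && i_size < fend)
		fend = i_size;

	return collected_to >= fend;
}
```

With the values from the trace (fpos=0x5000, order 0, i_size=0x5fb2,
collected_to=0x5fb2), the old check returns true, so folio ix=00005 is
unlocked before the ZERO subreq has cleared the last 0x4e bytes. The patched
check returns false until collected_to reaches 0x6000.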

What about the write_collect.c side, is it safe as is?

/Christian