[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6c74ad63-3afc-4549-9ac6-494b9a63e839@maowtm.org>
Date: Wed, 22 Oct 2025 00:18:48 +0100
From: Tingmao Wang <m@...wtm.org>
To: Dominique Martinet <asmadeus@...ewreck.org>
Cc: Song Liu <song@...nel.org>, Andrii Nakryiko <andrii.nakryiko@...il.com>,
Eric Van Hensbergen <ericvh@...nel.org>, Latchesar Ionkov
<lucho@...kov.net>, Christian Schoenebeck <linux_oss@...debyte.com>,
Alexei Starovoitov <ast@...nel.org>, linux-kernel@...r.kernel.org,
v9fs@...ts.linux.dev, bpf@...r.kernel.org
Subject: Re: [PATCH] fs/9p: don't use cached metadata in revalidate for
cache=mmap
I do apologize for missing this initially... (I wonder if it would have
been caught by xfstests in cache=mmap mode)
On 10/21/25 23:09, Dominique Martinet via B4 Relay wrote:
> From: Dominique Martinet <asmadeus@...ewreck.org>
>
> cache=mmap is a mode that doesn't cache metadata, but still has
> writeback cache.
> In commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate
> for uncached mode too") we considered metadata cache to be enough to
> not look at the server, but in writeback cache too looking at the server
> size would make the vfs consider the file has been truncated before the
> data has been flushed out, making the following repro fail (nothing is
> ever read back, the resulting file ends up with no data written)
>
> ---
> I was suspecting cache=mmap when I saw vmtest used that, and it's
> confirmed with Song's reproducer...
> This makes the repro work, would one of you be able to confirm this?
> Once confirmed I'll send this to Linus directly
>
> Thanks again and sorry for lack of cache=mmap tests :/
>
> Signed-off-by: Dominique Martinet <asmadeus@...ewreck.org>
> ---
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> char buf[4096];
>
> int main(int argc, char *argv[])
> {
> int ret, i;
> int fdw, fdr;
>
> if (argc < 2)
> return 1;
>
> fdw = openat(AT_FDCWD, argv[1], O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600);
> if (fdw < 0) {
> fprintf(stderr, "cannot open fdw\n");
> return 1;
> }
> write(fdw, buf, sizeof(buf));
>
> fdr = openat(AT_FDCWD, argv[1], O_RDONLY|O_CLOEXEC);
>
> if (fdr < 0) {
> fprintf(stderr, "cannot open fdr\n");
> close(fdw);
> return 1;
> }
>
> for (i = 0; i < 10; i++) {
> ret = read(fdr, buf, sizeof(buf));
> fprintf(stderr, "i: %d, read returns %d\n", i, ret);
> }
>
> close(fdr);
> close(fdw);
> return 0;
> }
> ---
>
> Reported-by: Song Liu <song@...nel.org>
> Link: https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com
> Reported-by: Andrii Nakryiko <andrii.nakryiko@...il.com>
> Link: https://lore.kernel.org/bpf/CAEf4BzZbCE4tLoDZyUf_aASpgAGFj75QMfSXX4a4dLYixnOiLg@mail.gmail.com/
> Fixes: 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for uncached mode too")
> Signed-off-by: Dominique Martinet <asmadeus@...ewreck.org>
> ---
> fs/9p/vfs_dentry.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
> index f3248a3e54023489054337bf98e87a1d7b1fbafc..2f3654bf6275ffa802dc25b9a84bd61a62d5723c 100644
> --- a/fs/9p/vfs_dentry.c
> +++ b/fs/9p/vfs_dentry.c
> @@ -78,7 +78,11 @@ static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
> v9inode = V9FS_I(inode);
> struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
>
> - cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
> + /* We also don't want to refresh attr here in writeback cache mode,
> + * otherwise a write that hasn't been propagated to server would
> + * incorrectly get the old size back and truncate the file before
> + * the write happens */
> + cached = v9ses->cache & (CACHE_META | CACHE_WRITEBACK | CACHE_LOOSE);
This makes sense, but I was also wondering about this bit in
v9fs_refresh_inode / ..._dotl:
flags = (v9ses->cache & CACHE_LOOSE) ?
V9FS_STAT2INODE_KEEP_ISIZE : 0;
I do wonder what else can cause us to end up mistakenly updating the
i_size, since the condition below would also pass if
v9inode->cache_validity & V9FS_INO_INVALID_ATTR is true, even in cached
mode:
>
> if (!cached || v9inode->cache_validity & V9FS_INO_INVALID_ATTR) {
> int retval;
One instances where we set this invalid flag is in v9fs_vfs_rename. With
the program pasted below, which adds an additional renaming of the file
written before reading, this seems to still cause truncation, even with
this fix:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
char buf[4096];
int main(int argc, char *argv[])
{
int ret, i;
int fdw, fdr;
struct stat statbuf;
if (argc < 3)
return 1;
fdw = openat(AT_FDCWD, argv[1], O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC,
0600);
if (fdw < 0) {
fprintf(stderr, "cannot open fdw\n");
return 1;
}
ret = write(fdw, buf, sizeof(buf));
if (ret < 0) {
fprintf(stderr, "write failed\n");
return 1;
}
ret = renameat(AT_FDCWD, argv[1], AT_FDCWD, argv[2]);
if (ret < 0) {
fprintf(stderr, "renameat failed\n");
return 1;
}
fdr = openat(AT_FDCWD, argv[2], O_RDONLY | O_CLOEXEC);
if (fdr < 0) {
fprintf(stderr, "cannot open fdr\n");
return 1;
}
ret = read(fdr, buf, sizeof(buf));
fprintf(stderr, "read returns %d\n", ret);
ret = fstat(fdr, &statbuf);
if (ret < 0) {
fprintf(stderr, "fstat failed\n");
return 1;
}
fprintf(stderr, "fstat: size = %lld\n", statbuf.st_size);
close(fdr);
unlink(argv[1]);
unlink(argv[2]);
return 0;
}
Output:
root@...18-rc2:/# ./reproducer /tmp/9p/a /tmp/9p/b
read returns 0
fstat: size = 0
root@...18-rc2:/# ./reproducer.orig /tmp/9p/a /tmp/9p/b # the original reproducer does work
i: 0, read returns 4096
i: 1, read returns 0
...
Before the "Refresh metadata in d_revalidate for uncached mode too" patch
series, it happened that in cache=mmap, v9fs_lookup_revalidate would not
be called at all, since it's not used for "uncached" mode, where uncached
is defined as !(CACHE_META|CACHE_LOOSE) in v9fs_mount. Therefore this is
probably still a regression introduced by that series :(
The below change fixes both reproducer:
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 69f378a83775..316fc41513d7 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1352,7 +1352,7 @@ int v9fs_refresh_inode(struct p9_fid *fid, struct inode *inode)
* We don't want to refresh inode->i_size,
* because we may have cached data
*/
- flags = (v9ses->cache & CACHE_LOOSE) ?
+ flags = (v9ses->cache & (CACHE_LOOSE | CACHE_WRITEBACK)) ?
V9FS_STAT2INODE_KEEP_ISIZE : 0;
v9fs_stat2inode(st, inode, inode->i_sb, flags);
out:
@@ -1399,4 +1399,3 @@ static const struct inode_operations v9fs_symlink_inode_operations = {
.getattr = v9fs_vfs_getattr,
.setattr = v9fs_vfs_setattr,
};
-
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 0b404e8484d2..500d8e922f07 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -910,7 +910,7 @@ int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode)
* We don't want to refresh inode->i_size,
* because we may have cached data
*/
- flags = (v9ses->cache & CACHE_LOOSE) ?
+ flags = (v9ses->cache & (CACHE_LOOSE | CACHE_WRITEBACK)) ?
V9FS_STAT2INODE_KEEP_ISIZE : 0;
v9fs_stat2inode_dotl(st, inode, flags);
out:
Should we change this too? (Well, in fact, the change above alone makes
both issue disappear, but I'm not 100% sure about the implication of
updating metadata aside from i_size for inodes with cached data)
Kind regards,
Tingmao
Powered by blists - more mailing lists