lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6c74ad63-3afc-4549-9ac6-494b9a63e839@maowtm.org>
Date: Wed, 22 Oct 2025 00:18:48 +0100
From: Tingmao Wang <m@...wtm.org>
To: Dominique Martinet <asmadeus@...ewreck.org>
Cc: Song Liu <song@...nel.org>, Andrii Nakryiko <andrii.nakryiko@...il.com>,
 Eric Van Hensbergen <ericvh@...nel.org>, Latchesar Ionkov
 <lucho@...kov.net>, Christian Schoenebeck <linux_oss@...debyte.com>,
 Alexei Starovoitov <ast@...nel.org>, linux-kernel@...r.kernel.org,
 v9fs@...ts.linux.dev, bpf@...r.kernel.org
Subject: Re: [PATCH] fs/9p: don't use cached metadata in revalidate for
 cache=mmap

I do apologize for missing this initially... (I wonder if it would have
been caught by xfstests in cache=mmap mode)

On 10/21/25 23:09, Dominique Martinet via B4 Relay wrote:
> From: Dominique Martinet <asmadeus@...ewreck.org>
> 
> cache=mmap is a mode that doesn't cache metadata, but still has
> writeback cache.
> In commit 290434474c33 ("fs/9p: Refresh metadata in d_revalidate
> for uncached mode too") we considered metadata cache to be enough to
> not look at the server, but in writeback cache too looking at the server
> size would make the vfs consider the file has been truncated before the
> data has been flushed out, making the following repro fail (nothing is
> ever read back, the resulting file ends up with no data written)
> 
> ---
> I was suspecting cache=mmap when I saw vmtest used that, and it's
> confirmed with Song's reproducer...
> This makes the repro work, would one of you be able to confirm this?
> Once confirmed I'll send this to Linus directly
> 
> Thanks again and sorry for lack of cache=mmap tests :/
> 
> Signed-off-by: Dominique Martinet <asmadeus@...ewreck.org>
> ---
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> 
> char buf[4096];
> 
> int main(int argc, char *argv[])
> {
>         int ret, i;
>         int fdw, fdr;
> 
>         if (argc < 2)
>                 return 1;
> 
>         fdw = openat(AT_FDCWD, argv[1], O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0600);
>         if (fdw < 0) {
>                 fprintf(stderr, "cannot open fdw\n");
>                 return 1;
>         }
>         write(fdw, buf, sizeof(buf));
> 
>         fdr = openat(AT_FDCWD, argv[1], O_RDONLY|O_CLOEXEC);
> 
>         if (fdr < 0) {
>                 fprintf(stderr, "cannot open fdr\n");
>                 close(fdw);
>                 return 1;
>         }
> 
>         for (i = 0; i < 10; i++) {
>                 ret = read(fdr, buf, sizeof(buf));
>                 fprintf(stderr, "i: %d, read returns %d\n", i, ret);
>         }
> 
>         close(fdr);
>         close(fdw);
>         return 0;
> }
> ---
> 
> Reported-by: Song Liu <song@...nel.org>
> Link: https://lkml.kernel.org/r/CAHzjS_u_SYdt5=2gYO_dxzMKXzGMt-TfdE_ueowg-Hq5tRCAiw@mail.gmail.com
> Reported-by: Andrii Nakryiko <andrii.nakryiko@...il.com>
> Link: https://lore.kernel.org/bpf/CAEf4BzZbCE4tLoDZyUf_aASpgAGFj75QMfSXX4a4dLYixnOiLg@mail.gmail.com/
> Fixes: 290434474c33 ("fs/9p: Refresh metadata in d_revalidate for uncached mode too")
> Signed-off-by: Dominique Martinet <asmadeus@...ewreck.org>
> ---
>  fs/9p/vfs_dentry.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/9p/vfs_dentry.c b/fs/9p/vfs_dentry.c
> index f3248a3e54023489054337bf98e87a1d7b1fbafc..2f3654bf6275ffa802dc25b9a84bd61a62d5723c 100644
> --- a/fs/9p/vfs_dentry.c
> +++ b/fs/9p/vfs_dentry.c
> @@ -78,7 +78,11 @@ static int __v9fs_lookup_revalidate(struct dentry *dentry, unsigned int flags)
>  	v9inode = V9FS_I(inode);
>  	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
>  
> -	cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
> +	/* We also don't want to refresh attr here in writeback cache mode,
> +	 * otherwise a write that hasn't been propagated to server would
> +	 * incorrectly get the old size back and truncate the file before
> +	 * the write happens */
> +	cached = v9ses->cache & (CACHE_META | CACHE_WRITEBACK | CACHE_LOOSE);

This makes sense, but I was also wondering about this bit in
v9fs_refresh_inode / ..._dotl:

	flags = (v9ses->cache & CACHE_LOOSE) ?
		V9FS_STAT2INODE_KEEP_ISIZE : 0;

I do wonder what else can cause us to end up mistakenly updating the
i_size, since the condition below would also pass if
v9inode->cache_validity & V9FS_INO_INVALID_ATTR is true, even in cached
mode:

>  
>  	if (!cached || v9inode->cache_validity & V9FS_INO_INVALID_ATTR) {
>  		int retval;

One instances where we set this invalid flag is in v9fs_vfs_rename.  With
the program pasted below, which adds an additional renaming of the file
written before reading, this seems to still cause truncation, even with
this fix:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

char buf[4096];

int main(int argc, char *argv[])
{
	int ret, i;
	int fdw, fdr;
	struct stat statbuf;

	if (argc < 3)
		return 1;

	fdw = openat(AT_FDCWD, argv[1], O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC,
		     0600);
	if (fdw < 0) {
		fprintf(stderr, "cannot open fdw\n");
		return 1;
	}
	ret = write(fdw, buf, sizeof(buf));
	if (ret < 0) {
		fprintf(stderr, "write failed\n");
		return 1;
	}

	ret = renameat(AT_FDCWD, argv[1], AT_FDCWD, argv[2]);
	if (ret < 0) {
		fprintf(stderr, "renameat failed\n");
		return 1;
	}

	fdr = openat(AT_FDCWD, argv[2], O_RDONLY | O_CLOEXEC);

	if (fdr < 0) {
		fprintf(stderr, "cannot open fdr\n");
		return 1;
	}

	ret = read(fdr, buf, sizeof(buf));
	fprintf(stderr, "read returns %d\n", ret);

	ret = fstat(fdr, &statbuf);
	if (ret < 0) {
		fprintf(stderr, "fstat failed\n");
		return 1;
	}
	fprintf(stderr, "fstat: size = %lld\n", statbuf.st_size);

	close(fdr);
	unlink(argv[1]);
	unlink(argv[2]);
	return 0;
}

Output:

    root@...18-rc2:/# ./reproducer /tmp/9p/a /tmp/9p/b
    read returns 0
    fstat: size = 0

    root@...18-rc2:/# ./reproducer.orig /tmp/9p/a /tmp/9p/b # the original reproducer does work
    i: 0, read returns 4096
    i: 1, read returns 0
    ...

Before the "Refresh metadata in d_revalidate for uncached mode too" patch
series, it happened that in cache=mmap, v9fs_lookup_revalidate would not
be called at all, since it's not used for "uncached" mode, where uncached
is defined as !(CACHE_META|CACHE_LOOSE) in v9fs_mount.  Therefore this is
probably still a regression introduced by that series :(

The below change fixes both reproducer:

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 69f378a83775..316fc41513d7 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -1352,7 +1352,7 @@ int v9fs_refresh_inode(struct p9_fid *fid, struct inode *inode)
 	 * We don't want to refresh inode->i_size,
 	 * because we may have cached data
 	 */
-	flags = (v9ses->cache & CACHE_LOOSE) ?
+	flags = (v9ses->cache & (CACHE_LOOSE | CACHE_WRITEBACK)) ?
 		V9FS_STAT2INODE_KEEP_ISIZE : 0;
 	v9fs_stat2inode(st, inode, inode->i_sb, flags);
 out:
@@ -1399,4 +1399,3 @@ static const struct inode_operations v9fs_symlink_inode_operations = {
 	.getattr = v9fs_vfs_getattr,
 	.setattr = v9fs_vfs_setattr,
 };
-
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 0b404e8484d2..500d8e922f07 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -910,7 +910,7 @@ int v9fs_refresh_inode_dotl(struct p9_fid *fid, struct inode *inode)
 	 * We don't want to refresh inode->i_size,
 	 * because we may have cached data
 	 */
-	flags = (v9ses->cache & CACHE_LOOSE) ?
+	flags = (v9ses->cache & (CACHE_LOOSE | CACHE_WRITEBACK)) ?
 		V9FS_STAT2INODE_KEEP_ISIZE : 0;
 	v9fs_stat2inode_dotl(st, inode, flags);
 out:

Should we change this too?  (Well, in fact, the change above alone makes
both issue disappear, but I'm not 100% sure about the implication of
updating metadata aside from i_size for inodes with cached data)


Kind regards,
Tingmao

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ