lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1866935.Y7JIjT2MHT@silver>
Date:   Fri, 01 Apr 2022 16:19:20 +0200
From:   Christian Schoenebeck <linux_oss@...debyte.com>
To:     asmadeus@...ewreck.org
Cc:     David Kahurani <k.kahurani@...il.com>, davem@...emloft.net,
        ericvh@...il.com, kuba@...nel.org, linux-kernel@...r.kernel.org,
        lucho@...kov.net, netdev@...r.kernel.org,
        v9fs-developer@...ts.sourceforge.net,
        David Howells <dhowells@...hat.com>, Greg Kurz <groug@...d.org>
Subject: Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected)

On Mittwoch, 30. März 2022 23:47:41 CEST asmadeus@...ewreck.org wrote:
> Thanks Christian!
> 
> Christian Schoenebeck wrote on Wed, Mar 30, 2022 at 02:21:16PM +0200:
[...]
> > > Christian Schoenebeck wrote on Sat, Mar 26, 2022 at 01:36:31PM +0100:
> > > hm, fscache code shouldn't be used for cache=mmap, I'm surprised you can
> > > hit this...
> > 
> > I assume that you mean that 9p driver does not explicitly ask for fs-cache
> > being used for mmap. I see that 9p uses the kernel's generalized mmap
> > implementation:
> > 
> > https://github.com/torvalds/linux/blob/d888c83fcec75194a8a48ccd283953bdba7
> > b2550/fs/9p/vfs_file.c#L481
> > 
> > I haven't dived further into this, but the kernel has to use some kind of
> > filesystem cache anyway to provide the mmap functionality, so I guess it
> > makes sense that I got those warning messages from the FS-Cache
> > subsystem?
> It uses the generic mmap which has vfs caching, but definitely not
> fs-cache.
> fs-cache adds more hooks for cachefilesd (writing file contents to disk
> for bigger cache) and things like that cache=loose/mmap shouldn't be
> caring about. cache=loose actually just disables some key parts so I'm
> not surprised it shares bugs with the new code, but cache=mmap is really
> independant and I need to trace where these come from...

From looking at the sources, the call stack for emitting "FS-Cache: Duplicate
cookie detected" error messages with 9p "cache=mmap" option seems to be:

1. v9fs_vfs_lookup [fs/9p/vfs_inode.c, 788]:

	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);

2. v9fs_get_new_inode_from_fid [fs/9p/v9fs.h, 228]:

	return v9fs_inode_from_fid_dotl(v9ses, fid, sb, 1);

3. v9fs_inode_from_fid_dotl [fs/9p/vfs_inode_dotl.c, 157]:

	inode = v9fs_qid_iget_dotl(sb, &st->qid, fid, st, new);

4. v9fs_qid_iget_dotl [fs/9p/vfs_inode_dotl.c, 133]:

	v9fs_cache_inode_get_cookie(inode);
	^--- Called independent of function argument "new"'s value here
   https://github.com/torvalds/linux/blob/e8b767f5e04097aaedcd6e06e2270f9fe5282696/fs/9p/vfs_inode_dotl.c#L133

5. v9fs_cache_inode_get_cookie [fs/9p/cache.c, 68]:

	v9inode->fscache =
		fscache_acquire_cookie(v9fs_session_cache(v9ses),
				       0,
				       &path, sizeof(path),
				       &version, sizeof(version),
				       i_size_read(&v9inode->vfs_inode));

6. fscache_acquire_cookie [include/linux/fscache.h, 251]:

	return __fscache_acquire_cookie(volume, advice,
					index_key, index_key_len,
					aux_data, aux_data_len,
					object_size);

7. __fscache_acquire_cookie [fs/fscache/cookie.c, 472]:

	if (!fscache_hash_cookie(cookie)) {
		fscache_see_cookie(cookie, fscache_cookie_discard);
		fscache_free_cookie(cookie);
		return NULL;
	}

8. fscache_hash_cookie [fs/fscache/cookie.c, 430]:

	pr_err("Duplicate cookie detected\n");

> > With QEMU >= 5.2 you should see the following QEMU warning with your
> > reproducer:
> > 
> > "
> > qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS
> > export, which might lead to file ID collisions and severe misbehaviours on
> > guest! You should either use a separate export for each device shared from
> > host or use virtfs option 'multidevs=remap'!
> > "
> 
> oh, I wasn't aware of the new option. Good job there!
> 
> It's the easiest way to reproduce but there are also harder to fix
> collisions, file systems only guarantee unicity for (fsid,inode
> number,version) which is usually bigger than 128 bits (although version
> is often 0), but version isn't exposed to userspace easily...
> What we'd want for unicity is handle from e.g. name_to_handle_at but
> that'd add overhead, wouldn't fit in qid path and not all fs are capable
> of providing one... The 9p protocol just doesn't want bigger handles
> than qid path.

No bigger qid.path on 9p protocol level in future? Why?

> And, err, looking at the qemu code
> 
>   qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8);
> 
> so the qid is treated as "data version",
> but on kernel side we've treated it as inode version (i_version, see
> include/linux/iversion.h)
> 
> (v9fs_test_inode_dotl checks the version is the same when comparing two
> inodes) so it will incorrectly identify two identical inodes as
> different.
> That will cause problems...
> Since you'll be faster than me could you try keeping it at 0 there?

I tried with your suggested change on QEMU side:

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index a6d6b3f835..5d9be87758 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -981,7 +981,7 @@ static int stat_to_qid(V9fsPDU *pdu, const struct stat *stbuf, V9fsQID *qidp)
         memcpy(&qidp->path, &stbuf->st_ino, size);
     }
 
-    qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8);
+    qidp->version = 0;
     qidp->type = 0;
     if (S_ISDIR(stbuf->st_mode)) {
         qidp->type |= P9_QID_TYPE_DIR;

Unfortunately it did not make any difference for these 2 Linux kernel fs-cache
issues at least; still same errors, and same suboptimal performance.

Best regards,
Christian Schoenebeck


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ