Message-ID: <YkTP/Talsy3KQBbf@codewreck.org>
Date:   Thu, 31 Mar 2022 06:47:41 +0900
From:   asmadeus@...ewreck.org
To:     Christian Schoenebeck <linux_oss@...debyte.com>
Cc:     David Kahurani <k.kahurani@...il.com>, davem@...emloft.net,
        ericvh@...il.com, kuba@...nel.org, linux-kernel@...r.kernel.org,
        lucho@...kov.net, netdev@...r.kernel.org,
        v9fs-developer@...ts.sourceforge.net,
        David Howells <dhowells@...hat.com>, Greg Kurz <groug@...d.org>
Subject: Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie
 detected)

Thanks Christian!

Christian Schoenebeck wrote on Wed, Mar 30, 2022 at 02:21:16PM +0200:
> Case  Linux kernel version           .config  msize    cache  duration  host cpu  errors/warnings
> 
> A)    5.17.0+[2] + msize patches[1]  debug    4186112  mmap   20m 40s   ~80%      none
> B)    5.17.0+[2] + msize patches[1]  debug    4186112  loose  31m 28s   ~35%      several errors (compilation completed)
> C)    5.17.0+[2] + msize patches[1]  debug    507904   mmap   20m 25s   ~84%      none
> D)    5.17.0+[2] + msize patches[1]  debug    507904   loose  31m 2s    ~33%      several errors (compilation completed)
> E)    5.17.0+[2]                     debug    512000   mmap   23m 45s   ~75%      none
> F)    5.17.0+[2]                     debug    512000   loose  32m 6s    ~31%      several errors (compilation completed)
> G)    5.17.0+[2]                     release  512000   mmap   23m 18s   ~76%      none
> H)    5.17.0+[2]                     release  512000   loose  32m 33s   ~31%      several errors (compilation completed)
> I)    5.17.0+[2] + msize patches[1]  release  4186112  mmap   20m 30s   ~83%      none
> J)    5.17.0+[2] + msize patches[1]  release  4186112  loose  31m 21s   ~31%      several errors (compilation completed)
> K)    5.10.84                        release  512000   mmap   39m 20s   ~80%      none
> L)    5.10.84                        release  512000   loose  13m 40s   ~55%      none

ow.

> Disclaimer: I have not looked into the fs-cache sources yet, so I am not
> sure, but my first impression is that something probably got broken by the
> recent fs-cache changes (see the errors column, especially in comparison to
> case L, which did not generate any errors). Also note the huge build
> duration differences, especially in comparison to case L: fs-cache
> (cache=loose) got significantly slower, while cache=mmap OTOH became
> significantly faster?

Yes, that's a big regression; I didn't run any performance benchmarks with
the new patches as I didn't think it'd matter, but I obviously should
have.

There is one thing I must check: I know new kernels will be writing in
4k chunks, and that is going to be very slow until the netfs write
helpers are finished, but I thought the old code did the same.
If the old code issued bigger writes, that performance will probably come
back.
Otherwise there's some other problem, such as not reusing cached content
we should be reusing.
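
A quick way I plan to check this from userspace (a sketch; the mount point
path is made up): issue one large write on a 9p mount, force writeback, and
watch whether it gets split into 4k TWRITEs on the wire, e.g. with the 9p
trace events enabled.

  #include <fcntl.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
          /* one 1 MiB write; if the client chops the resulting writeback
           * into 4k TWRITEs, that's the granularity problem above */
          size_t len = 1 << 20;
          char *buf = malloc(len);
          int fd = open("/mnt/9p/bigwrite.tmp",
                        O_WRONLY | O_CREAT | O_TRUNC, 0644);

          if (fd < 0 || !buf)
                  return 1;
          memset(buf, 'x', len);
          if (write(fd, buf, len) != (ssize_t)len)
                  return 1;
          fsync(fd); /* push the dirty pages out so TWRITEs hit the wire */
          close(fd);
          free(buf);
          return 0;
  }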


> About the errors: I actually already see errors with cache=loose and recent
> kernel version just when booting the guest OS. For these tests I chose some
> sources which allowed me to complete the build to capture some benchmark as
> well, I got some "soft" errors with those, but the build completed at least.
> I had other sources OTOH which did not complete though and aborted with
> certain invalid file descriptor errors, which I obviously could not use for
> those benchmarks here.

That's less surprising; the change was really huge. I'm annoyed because
I did test part of a parallel Linux kernel compilation with
cache=fscache without noticing a problem :/

I'll try to reproduce this weekend-ish.

> > Christian Schoenebeck wrote on Sat, Mar 26, 2022 at 01:36:31PM +0100:
> > hm, fscache code shouldn't be used for cache=mmap, I'm surprised you can
> > hit this...
> 
> I assume you mean that the 9p driver does not explicitly ask for fs-cache
> to be used for mmap. I see that 9p uses the kernel's generalized mmap
> implementation:
> 
> https://github.com/torvalds/linux/blob/d888c83fcec75194a8a48ccd283953bdba7b2550/fs/9p/vfs_file.c#L481
> 
> I haven't dived further into this, but the kernel has to use some kind of
> filesystem cache anyway to provide the mmap functionality, so I guess it
> makes sense that I got those warning messages from the FS-Cache subsystem?

It uses the generic mmap, which has VFS caching, but definitely not
fs-cache.
fs-cache adds more hooks for cachefilesd (writing file contents to disk
for a bigger cache) and things like that, which cache=loose/mmap shouldn't
care about. cache=loose actually just disables some key parts, so I'm not
surprised it shares bugs with the new code, but cache=mmap is really
independent and I need to trace where these warnings come from...
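
For reference, the cache modes are fully distinct on the kernel side; from
memory of fs/9p/v9fs.h around this version (worth double-checking):

  enum p9_cache_modes {
          CACHE_NONE,
          CACHE_MMAP,
          CACHE_LOOSE,
          CACHE_FSCACHE,
          nr__p9_cache_modes
  };

so cache=mmap should never take the CACHE_FSCACHE-only paths at all.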

> With QEMU >= 5.2 you should see the following QEMU warning with your reproducer:
> 
> "
> qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS
> export, which might lead to file ID collisions and severe misbehaviours on
> guest! You should either use a separate export for each device shared from
> host or use virtfs option 'multidevs=remap'!
> "

oh, I wasn't aware of the new option. Good job there!
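
For the archives, enabling the remapping looks something like this (the
path and mount_tag here are made up):

  qemu-system-x86_64 ... \
      -virtfs local,path=/srv/share,mount_tag=host0,security_model=mapped,multidevs=remap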

It's the easiest way to reproduce, but there are also harder-to-fix
collisions: filesystems only guarantee uniqueness for the (fsid, inode
number, version) tuple, which is usually bigger than 128 bits (although
version is often 0), but version isn't easily exposed to userspace...
What we'd want for uniqueness is a handle from e.g. name_to_handle_at, but
that'd add overhead, wouldn't fit in qid path, and not all filesystems are
capable of providing one... The 9p protocol just doesn't allow handles
bigger than qid path.
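
To illustrate the size problem, a minimal userspace sketch (nothing
9p-specific; pass it any path):

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
          int mount_id;
          struct file_handle *fh;

          if (argc < 2)
                  return 1;
          fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);
          if (!fh)
                  return 1;
          fh->handle_bytes = MAX_HANDLE_SZ;
          if (name_to_handle_at(AT_FDCWD, argv[1], fh, &mount_id, 0) < 0) {
                  perror("name_to_handle_at");
                  return 1;
          }
          /* qid path is a single 64-bit field; handles are usually larger */
          printf("handle: %u bytes\n", fh->handle_bytes);
          free(fh);
          return 0;
  }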



And, err, looking at the qemu code:

  qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8);

so the qid version is treated as a "data version",
but on the kernel side we've treated it as an inode version (i_version, see
include/linux/iversion.h)
(v9fs_test_inode_dotl checks that the version is the same when comparing two
inodes), so the kernel will incorrectly identify two identical inodes as
different.
That will cause problems...
Since you'll be faster than me, could you try keeping it at 0 there?
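
i.e. in QEMU, where the line quoted above sits (in stat_to_qid() if I
remember the function name right), something like:

  /* experiment: stop pretending mtime/size is an inode version */
  qidp->version = 0;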

I see fscache also uses the qid version as 'auxiliary data', but I'm
not sure what this is used for -- if it's a data-version-like thing then
it will also, at the very least, invalidate the cache content all the time.


Note there is also a data_version field in the protocol, in the response
to getattr, which the protocol side of 9p in Linux diligently fills into
st_data_version, but as far as I can see we never use it.
This is probably what 9p meant to fill, and what fscache should rely on to
detect file changes, if that helps.
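
Rough sketch of what I mean on the kernel side, against the 5.17-era
fscache API (untested; 'volume', 'st' and 'path' stand in for whatever
the real code has at hand):

  /* use getattr's data version as the fscache coherency datum instead
   * of qid.version, so cache invalidation tracks actual data changes */
  __le64 dver = cpu_to_le64(st->st_data_version);

  cookie = fscache_acquire_cookie(volume, 0,
                                  &path, sizeof(path), /* index key */
                                  &dver, sizeof(dver), /* coherency data */
                                  i_size_read(inode));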


I'm sorry I didn't see this sooner...

> > If you have some kind of reproducer of invalid file descriptor or similar
> > errors I'd be happy to dig a bit more; I don't particularly like every
> > aspect of our cache model, but it's not good if it corrupts things.
> 
> Maybe you can reproduce this with the root fs setup [4] described above? As I
> said, I immediately get errors when the guest OS is booting, so I don't have
> to run something fancy to get errors with cache=loose & a recent kernel.

Yes, this is much worse than I had assumed when you first brought it up;
I'll definitely set some time aside to investigate.

-- 
Dominique
