Message-ID: <2551609.RCmPuZc3Qn@silver>
Date: Thu, 14 Apr 2022 14:44:53 +0200
From: Christian Schoenebeck <linux_oss@...debyte.com>
To: asmadeus@...ewreck.org
Cc: David Kahurani <k.kahurani@...il.com>, davem@...emloft.net,
ericvh@...il.com, kuba@...nel.org, linux-kernel@...r.kernel.org,
lucho@...kov.net, netdev@...r.kernel.org,
v9fs-developer@...ts.sourceforge.net,
David Howells <dhowells@...hat.com>, Greg Kurz <groug@...d.org>
Subject: Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected)
On Wednesday, 13 April 2022 00:38:21 CEST asmadeus@...ewreck.org wrote:
> Christian Schoenebeck wrote on Mon, Apr 11, 2022 at 03:41:45PM +0200:
> > I am getting more convinced that it's a bug on the Linux kernel side. When the
> > guest is booted I always immediately get a read("/var/log/wtmp") = EBADF error
> > on the guest. And the 9p command sequence sent to the QEMU 9p server was:
>
> Yes, I'm not pointing fingers, just trying to understand :)
Don't worry, that was not my impression, nor was it my intention. I'm just
trying to interpret what I'm seeing here.
> > ...
> > v9fs_clunk tag 0 id 120 fid 568
> > v9fs_walk tag 0 id 110 fid 1 newfid 568 nwnames 1
> > v9fs_rerror tag 0 id 110 err 2
> > v9fs_walk tag 0 id 110 fid 26 newfid 568 nwnames 1
> > v9fs_rerror tag 0 id 110 err 2
> > v9fs_readlink tag 0 id 22 fid 474
> > v9fs_readlink_return tag 0 id 22 name /run
> > v9fs_readlink tag 0 id 22 fid 474
> > v9fs_readlink_return tag 0 id 22 name /run
> > v9fs_readlink tag 0 id 22 fid 474
> > v9fs_readlink_return tag 0 id 22 name /run
> > v9fs_readlink tag 0 id 22 fid 474
> > v9fs_readlink_return tag 0 id 22 name /run
> > v9fs_walk tag 0 id 110 fid 633 newfid 568 nwnames 1
> > v9fs_rerror tag 0 id 110 err 2
> > v9fs_walk tag 0 id 110 fid 875 newfid 568 nwnames 0
> > v9fs_walk_return tag 0 id 110 nwnames 0 qids (nil)
> > v9fs_open tag 0 id 12 fid 568 mode 32769
> > v9fs_open_return tag 0 id 12 qid={type 0 version 0 path 820297} iounit 507904
> > v9fs_walk tag 0 id 110 fid 875 newfid 900 nwnames 0
> > v9fs_walk_return tag 0 id 110 nwnames 0 qids (nil)
> > v9fs_open tag 0 id 12 fid 900 mode 2
> > v9fs_open_return tag 0 id 12 qid={type 0 version 0 path 820297} iounit 507904
> > v9fs_lock tag 0 id 52 fid 568 type 1 start 0 length 0
> > v9fs_lock_return tag 0 id 52 status 0
> > v9fs_xattrwalk tag 0 id 30 fid 568 newfid 901 name security.capability
> > v9fs_rerror tag 0 id 30 err 95
> > v9fs_read tag 0 id 116 fid 568 off 192512 max_count 256
> >
> > So the guest opens /var/log/wtmp with fid=568 mode=32769, which is write-only
> > mode, and then it tries to read that fid 568, which eventually causes the
> > read() call on the host to fail with EBADF. Which makes sense, as the file was
> > opened in write-only mode, hence read() is not possible with that file
> > descriptor.
>
> Oh! That's something we can work on. The vfs code has different caches
> for read-only and read-write fids, perhaps the new netfs code just used
> the wrong one somewhere. I'll have a look.
>
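Right. Just for completeness, the host-side failure is simply what you get
when read()ing a descriptor that was only opened for writing; a trivial way to
see the same EBADF outside of 9p would be something like (hypothetical test
file, not the actual code path):

  $ exec 3>/tmp/wtmp-test    # open fd 3 write-only, roughly like the Topen mode 32769 above
  $ cat <&3
  cat: -: Bad file descriptor
  $ exec 3>&-

So the question is really why the client ends up issuing the Tread on the
write-only fid 568 instead of e.g. fid 900, which it opened right after on the
same file with mode 2 (O_RDWR).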
> > The other things I noticed when looking at the 9p command sequence above:
> > there is a Twalk producing fid 568 before, and that fid is not clunked before
> > fid 568 is reused with Topen later. And before that Twalk on fid 568 there is
> > a Tclunk on fid 568, but apparently that fid was not used before.
>
> This one though is just weird, I don't see where Linux would make up a fid
> to clunk like this... Could messages be ordered a bit weirdly through
> multithreading?
> e.g. thread 1 opens, thread 2 clunks almost immediately afterwards, and they
> would be printed the other way around?
Yeah, something like that was also my guess.
> Should still be serialized through the virtio ring buffer so I don't
> believe what I'm saying myself... It might be worth digging further as
> well.
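I can double-check that by filtering the trace for that one fid, e.g.
(assuming the v9fs_* trace events were captured to a file, say
qemu-9p-trace.log):

  grep -E 'fid 568( |$)' qemu-9p-trace.log

If the Tclunk on fid 568 really hits the server before any Twalk that produced
that fid, then it can't just be an artifact of how the messages were printed.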
>
> > > Perhaps backing filesystem dependent? QEMU version? virtfs access
> > > options?
> >
> > I tried with different hardware and different file systems (ext4, btrfs),
> > same misbehaviours.
> >
> > QEMU is latest git version. I also tried several different QEMU versions,
> > same thing.
> >
> > QEMU command line used:
> >
> > ~/git/qemu/build/qemu-system-x86_64 \
> > -machine pc,accel=kvm,usb=off,dump-guest-core=off -m 16384 \
> > -smp 8,sockets=8,cores=1,threads=1 -rtc base=utc -boot strict=on \
> > -kernel ~/vm/bullseye/boot/vmlinuz \
> > -initrd ~/vm/bullseye/boot/initrd.img \
> > -append 'root=fsRoot rw rootfstype=9p rootflags=trans=virtio,version=9p2000.L,msize=4186112,cache=loose console=ttyS0' \
> > -fsdev local,security_model=mapped,multidevs=remap,id=fsdev-fs0,path=$HOME/vm/bullseye/ \
> > -device virtio-9p-pci,id=fs0,fsdev=fsdev-fs0,mount_tag=fsRoot \
> > -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> > -nographic
> >
> > Important for reproducing this issue:
> > * cache=loose
> > * -smp N (with N>1)
> > * Guest booted with Linux kernel containing commit eb497943fa21 (uname >= 5.16)
> >
> > I'm pretty sure that you can reproduce this issue with the QEMU 9p rootfs
> > setup HOWTO linked before.
>
> Yes, I'm not sure why I can't reproduce... All my computers are pretty
> slow but the conditions should be met.
> I'll try again with a command line closer to what you just gave here.
I'm not surprised that you could not reproduce the EBADF errors yet. To make
this a bit more clear regarding the git client errors: I have 200+ git
repositories checked out on that test VM, and only about 5 of them trigger
EBADF errors on 'git pull'. Those few repositories do reproduce the EBADF
errors reliably here, though.
In other words: these EBADF errors only seem to trigger under certain
circumstances, so it requires quite a bunch of test material to get a
reproducer.
Like I said though, with the Bullseye installation I immediately get EBADF
errors already when booting, whereas with a Buster VM it boots without errors.
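BTW, when trying to reproduce, it might also be worth double-checking on the
guest side that the intended options actually took effect, e.g. something
like:

  cat /proc/mounts | grep 9p   # verify the 9p root mount and its options
  nproc                        # verify the guest really sees more than 1 CPU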
> > > It's all extremely slow though... like the final checkout counting files
> > > at less than 10/s
> >
> > It is VERY slow. And the weird thing is that cache=loose got much slower
> > than cache=mmap. My worst-case expectation would be cache=loose at least
> > not performing worse than cache=mmap.
>
> Yes, some profiling is also in order; it didn't use to be that slow, so it
> must not be reusing previously open fids as it should have, or something...
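For a rough cache=loose vs. cache=mmap comparison I could re-run something
crude like this on the guest (paths are just examples):

  time git -C ~/git/qemu status
  time sh -c 'find /usr/share -type f -print0 | xargs -0 cat > /dev/null'

and then compare the numbers between the two cache modes.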
If somebody has any more ideas on what I can try/test, let me know. However,
ATM I won't be able to review the netfs and vfs code to actually find the
cause of these issues.
Best regards,
Christian Schoenebeck