Message-ID: <YlNgN5f1KnT1walD@codewreck.org>
Date:   Mon, 11 Apr 2022 07:54:47 +0900
From:   asmadeus@...ewreck.org
To:     Christian Schoenebeck <linux_oss@...debyte.com>
Cc:     David Kahurani <k.kahurani@...il.com>, davem@...emloft.net,
        ericvh@...il.com, kuba@...nel.org, linux-kernel@...r.kernel.org,
        lucho@...kov.net, netdev@...r.kernel.org,
        v9fs-developer@...ts.sourceforge.net,
        David Howells <dhowells@...hat.com>, Greg Kurz <groug@...d.org>
Subject: Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie
 detected)

Thanks for keeping it up!

Christian Schoenebeck wrote on Sun, Apr 10, 2022 at 06:18:38PM +0200:
> > I used git-bisect to identify the commit that broke 9p behaviour, and it is
> > indeed this one:
> > 
> > commit eb497943fa215897f2f60fd28aa6fe52da27ca6c (HEAD, refs/bisect/bad)
> > Author: David Howells <dhowells@...hat.com>
> > Date:   Tue Nov 2 08:29:55 2021 +0000
> > 
> >     9p: Convert to using the netfs helper lib to do reads and caching

Yes, quite a few things changed with that.

> I looked into the errors I get, and as far as I can see, all the
> misbehaviours boil down to "Bad file descriptor" (EBADF) errors as the
> originating cause.
> 
> The easiest misbehaviours to look into on the guest system are errors
> from the git client. For instance, 'git fetch origin' fails this way:

FWIW I didn't report it, but I did try to reproduce: on my machines (tried
a couple) booting a small alpine rootfs over 9p works, and I tried git
clone/git fetch on local repos of varying sizes (tmpfs in VM -> 9p mount
of tmpfs on host) to no avail.
Perhaps it's backing-filesystem dependent? qemu version? virtfs access options?
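
For reference, my setup is roughly the following (a sketch only; the qemu
flags, paths and mount tags below are illustrative, not the exact ones I
used):

```shell
# Host: export a tmpfs directory over virtfs
# (path and mount_tag are made-up names for illustration)
qemu-system-x86_64 -m 2G -enable-kvm \
    -virtfs local,path=/srv/9p-export,mount_tag=host0,security_model=mapped-xattr \
    ... # rest of the usual VM options

# Guest: mount it with the cache=loose mode under discussion
mount -t 9p -o trans=virtio,version=9p2000.L,cache=loose host0 /mnt

# Then exercise it the same way
git clone /path/to/local/repo /mnt/repo-test
```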

It's all extremely slow though... like the final checkout counting files
at less than 10/s

> ...
> write(3, "d16782889ee07005d1f57eb884f4a06b"..., 40) = 40
> write(3, "\n", 1)                       = 1
> close(3)                                = 0
> access(".git/hooks/reference-transaction", X_OK) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, ".git/logs/refs/remotes/origin/master", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
> openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 7
> fstat(7, {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
> fstat(7, {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
> read(7, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 8192) = 2326
> lseek(7, -1467, SEEK_CUR)               = 859
> read(7, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 8192) = 1467
> close(7)                                = 0
> write(3, "d8a68c5027ef629d93b9d9519ff4da95"..., 168) = -1 EBADF (Bad file descriptor)
> ...
> error: cannot update the ref 'refs/remotes/origin/master': unable to append to '.git/logs/refs/remotes/origin/master': Bad file descriptor
> 
> I tried to manually replicate those file access operations on that
> .git/logs/refs/remotes/origin/master file in question, and it worked. But when
> I look at the strace output above, I see there is a close(3) call just before
> the subsequent openat(".git/logs/refs/remotes/origin/master") call returning 3,
> which makes me wonder: is this maybe a concurrency issue in file
> descriptor management?

hmm, in the cache=loose case a write should just update the page cache,
with the buffers flushed later, so this is definitely weird.

If you can reproduce this reliably enough, could you first confirm that
the EBADF comes from the client and not from qemu? Either mounting with
debug or getting protocol-level traces out of qemu would show that.
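
Something along these lines should do (a sketch; the 9p debug mask value
here is a guess at a useful set of bits, see the 9p mount-option docs for
the exact meaning, and qemu's 9p trace event names can be listed with
-trace help):

```shell
# Client side: mount with 9p debug output enabled (it goes to the kernel log)
mount -t 9p -o trans=virtio,version=9p2000.L,cache=loose,debug=0x25 host0 /mnt
dmesg -w

# qemu side: enable the 9p trace events at a protocol level
# (append to the qemu command line; -D sends trace output to a file)
qemu-system-x86_64 ... -trace 'v9fs_*' -D /tmp/qemu-9p.log
```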

If it's local there are only so many places EBADF can come from and it
should be possible to trace it back with e.g. perf probe or bpftrace,
but even if we confirm that e.g. the process' fd table is messed up it
won't tell us why it was, so it's going to be annoying... I'd really
like to be able to reproduce this somehow :/
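
As a concrete starting point, a one-liner along these lines would at
least show which process is getting the EBADF and on which syscall (a
sketch; tracepoint names may need adjusting for your kernel/bpftrace
version):

```shell
# Log every write() that fails with EBADF (-9), with process context
bpftrace -e 'tracepoint:syscalls:sys_exit_write /args->ret == -9/ {
    printf("%s (pid %d) write -> EBADF\n", comm, pid);
}'
```

From there one could move the probe to openat/close exits, or to kernel
functions with perf probe, to narrow down where the fd goes bad.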

-- 
Dominique
