lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 10 Apr 2022 18:18:38 +0200
From:   Christian Schoenebeck <linux_oss@...debyte.com>
To:     asmadeus@...ewreck.org
Cc:     David Kahurani <k.kahurani@...il.com>, davem@...emloft.net,
        ericvh@...il.com, kuba@...nel.org, linux-kernel@...r.kernel.org,
        lucho@...kov.net, netdev@...r.kernel.org,
        v9fs-developer@...ts.sourceforge.net,
        David Howells <dhowells@...hat.com>, Greg Kurz <groug@...d.org>
Subject: Re: 9p fs-cache tests/benchmark (was: 9p fscache Duplicate cookie detected)

On Samstag, 9. April 2022 13:16:11 CEST Christian Schoenebeck wrote:
> On Mittwoch, 30. März 2022 14:21:16 CEST Christian Schoenebeck wrote:
> > I made some tests & benchmarks regarding the fs-cache issue of 9p, running
> > different kernel versions and kernel configurations in comparison.
> [...]
> > Case  Linux kernel version           .config  msize    cache  duration  host cpu  errors/warnings
> >
> > A)    5.17.0+[2] + msize patches[1]  debug    4186112  mmap   20m 40s   ~80%      none
> > B)    5.17.0+[2] + msize patches[1]  debug    4186112  loose  31m 28s   ~35%      several errors (compilation completed)
> > C)    5.17.0+[2] + msize patches[1]  debug    507904   mmap   20m 25s   ~84%      none
> > D)    5.17.0+[2] + msize patches[1]  debug    507904   loose  31m 2s    ~33%      several errors (compilation completed)
> > E)    5.17.0+[2]                     debug    512000   mmap   23m 45s   ~75%      none
> > F)    5.17.0+[2]                     debug    512000   loose  32m 6s    ~31%      several errors (compilation completed)
> > G)    5.17.0+[2]                     release  512000   mmap   23m 18s   ~76%      none
> > H)    5.17.0+[2]                     release  512000   loose  32m 33s   ~31%      several errors (compilation completed)
> > I)    5.17.0+[2] + msize patches[1]  release  4186112  mmap   20m 30s   ~83%      none
> > J)    5.17.0+[2] + msize patches[1]  release  4186112  loose  31m 21s   ~31%      several errors (compilation completed)
> > K)    5.10.84                        release  512000   mmap   39m 20s   ~80%      none
> > L)    5.10.84                        release  512000   loose  13m 40s   ~55%      none
> [...]
> > About the errors: I actually already see errors with cache=loose and recent
> > kernel version just when booting the guest OS. For these tests I chose some
> > sources which allowed me to complete the build to capture some benchmark as
> > well, I got some "soft" errors with those, but the build completed at least.
> > I had other sources OTOH which did not complete though and aborted with
> > certain invalid file descriptor errors, which I obviously could not use for
> > those benchmarks here.
> 
> I used git-bisect to identify the commit that broke 9p behaviour, and it is
> indeed this one:
> 
> commit eb497943fa215897f2f60fd28aa6fe52da27ca6c (HEAD, refs/bisect/bad)
> Author: David Howells <dhowells@...hat.com>
> Date:   Tue Nov 2 08:29:55 2021 +0000
> 
>     9p: Convert to using the netfs helper lib to do reads and caching
>     
>     Convert the 9p filesystem to use the netfs helper lib to handle readpage,
>     readahead and write_begin, converting those into a common issue_op for the
>     filesystem itself to handle.  The netfs helper lib also handles reading
>     from fscache if a cache is available, and interleaving reads from both
>     sources.

I looked into the errors I get, and as far as I can see it, all misbehaviours
that I see, boil down to "Bad file descriptor" (EBADF) errors being the
originating cause.

The easiest misbehaviours on the guest system I can look into, are errors
with the git client. For instance 'git fetch origin' fails this way:

...
write(3, "d16782889ee07005d1f57eb884f4a06b"..., 40) = 40
write(3, "\n", 1)                       = 1
close(3)                                = 0
access(".git/hooks/reference-transaction", X_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, ".git/logs/refs/remotes/origin/master", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=2326, ...}) = 0
read(7, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 8192) = 2326
lseek(7, -1467, SEEK_CUR)               = 859
read(7, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\t\0\0\0\t\0\0\0\0"..., 8192) = 1467
close(7)                                = 0
write(3, "d8a68c5027ef629d93b9d9519ff4da95"..., 168) = -1 EBADF (Bad file descriptor)
...
error: cannot update the ref 'refs/remotes/origin/master': unable to append to '.git/logs/refs/remotes/origin/master': Bad file descriptor

I tried to manually replicate those file access operations on that
.git/logs/refs/remotes/origin/master file in question, and it worked. But when
I look at the strace output above, I see there is a close(3) call just before
the subsequent openat(".git/logs/refs/remotes/origin/master") call returning 3,
which makes me wonder, is this maybe a concurrency issue on file descriptor
management?

Ideas anyone?

Best regards,
Christian Schoenebeck




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ