Message-ID: <1675519496.NcNzUn7KHO@silver>
Date:   Sat, 04 Feb 2023 14:40:42 +0100
From:   Christian Schoenebeck <linux_oss@...debyte.com>
To:     Eric Van Hensbergen <ericvh@...il.com>
Cc:     v9fs-developer@...ts.sourceforge.net, asmadeus@...ewreck.org,
        rminnich@...il.com, lucho@...kov.net,
        Eric Van Hensbergen <ericvh@...nel.org>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v3 00/11] Performance fixes for 9p filesystem

On Friday, February 3, 2023 8:12:14 PM CET Eric Van Hensbergen wrote:
> Hi Christian, thanks for the feedback -- will dig in and see if I can
> find what's gone south here.  Clearly my approach to writeback without
> writeback_fid didn't cover all the corner cases and that's the cause of
> the fault.  Can I get a better idea of how to reproduce - you booted
> with a root 9p file system, and then tried to build...what?

KDE, which builds numerous packages, multi-threaded by default. In the past we
had 9p issues that triggered only after hours of compiling. In this case,
however, I don't think you need to build anything fancy, because it already
fails at the very beginning of any build process, right when detecting a
compiler.

May I ask what kind of scenario you have tested so far? It was not a
multi-threaded context, right? Large chunk or small chunk I/O?

> Performance degradation is interesting, runs counter to the
> unit-testing and benchmarking I did, but I didn't do something as
> logical as a build to check -- need to tease apart whether this is a
> read problem, a write problem...or both.  My intuition is that it's on
> the write side, but as part of going through the code I made the cache
> code a lot more pessimistic so it's possible I inadvertently killed an
> optimistic optimization.

I have not yet investigated the individual I/O errors or their cause, but my
feeling is that it could also be related to fid vs. writeback_fid. I saw you
dropped a fix we made there last year, but I haven't checked yet whether your
changes handle it correctly in another way.

> Finally, just to clarify, the panic you had at the end happened with
> readahead?  Seems interesting because clearly it thought it was
> writing back something that it shouldn't have been writing back (since
> writeback caches weren't enabled).   I'm thinking something was marked
> as dirty even though the underlying system just wrote-through the
> change and so the writeback isn't actually required.  This may also be
> an indicator of the performance issue if we are actually writing
> through the data in addition to an unnecessary write-back (which I
> also worry is writing back bad data in the second case).

It was not a kernel panic. It's a warning that appears right after boot, but
the system continues to run. So that warning is printed before starting the
actual build process. And yes, the warning is printed with "readahead".

> Can you give me an idea of what the other misbehaviors were?

There were really all sorts of misbehaviour at application level, e.g. no
command history being available from the shell (arrow up/down), things hanging
on the shell for a long time, error messages. And after the writeahead test the
build directory was screwed, i.e. even after rebooting with a regular kernel
things no longer built correctly, so I had to restore a snapshot.

Best regards,
Christian Schoenebeck

