lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 20 Mar 2023 16:16:38 -0700
From:   Eric Biggers <ebiggers@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Tejun Heo <tj@...nel.org>, fsverity@...ts.linux.dev,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        Theodore Ts'o <tytso@....edu>,
        Nathan Huckleberry <nhuck@...gle.com>,
        Victor Hsieh <victorhsieh@...gle.com>
Subject: Re: [GIT PULL] fsverity fixes for v6.3-rc4

On Mon, Mar 20, 2023 at 03:31:13PM -0700, Linus Torvalds wrote:
> On Mon, Mar 20, 2023 at 2:07 PM Eric Biggers <ebiggers@...nel.org> wrote:
> >
> > Nathan Huckleberry (1):
> >       fsverity: Remove WQ_UNBOUND from fsverity read workqueue
> 
> There's a *lot* of other WQ_UNBOUND users. If it performs that badly,
> maybe there is something wrong with the workqueue code.
> 
> Should people be warned to not use WQ_UNBOUND - or is there something
> very special about fsverity?
> 
> Added Tejun to the cc. With one of the main documented reasons for
> WQ_UNBOUND being performance (both implicit "try to start execution of
> work items as soon as possible") and explicit ("CPU intensive
> workloads which can be better managed by the system scheduler"), maybe
> it's time to reconsider?
> 
> WQ_UNBOUND adds a fair amount of complexity and special cases to the
> workqueues, and this is now the second "let's remove it because it's
> hurting things in a big way".
> 
>               Linus

So, Nathan has been doing most of the investigation and testing on this, and
he's out of office at the moment.

But, my understanding is that since modern CPUs have acceleration for all the
common crypto algorithms (including fsverity's SHA-256), the work items just
don't take long enough for the overhead of a context switch to be worth it.
WQ_UNBOUND seems to be optimized for much longer running work items.

Additionally, the WQ_UNBOUND overhead is particularly bad on arm64.  We aren't
sure of the reason for this.  Nathan thinks this is probably related to overhead
of saving/restoring the FPU+SIMD state.  My theory is that it's mainly caused by
heterogeneous processors, where work that would ordinarily run on the fastest
CPU core gets scheduled on a slow CPU core.  Maybe it's a combination of both.

WQ_UNBOUND has been shown to be detrimental to EROFS decompression and to
dm-verity too, so this isn't specific to fsverity.  (fscrypt is still under
investigation.  I'd guess the same applies, but it's been less of a priority
since fscrypt doesn't use a workqueue when inline encryption is being used.)

These are all "I/O post-processing cases", though, so all sort of similar.

- Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ