Message-ID: <CAHk-=wh0wxPx1zP1onSs88KB6zOQ0oHyOg_vGr5aK8QJ8fuxnw@mail.gmail.com>
Date:   Tue, 21 Mar 2023 11:31:35 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Eric Biggers <ebiggers@...nel.org>, fsverity@...ts.linux.dev,
        linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
        "Theodore Ts'o" <tytso@....edu>,
        Nathan Huckleberry <nhuck@...gle.com>,
        Victor Hsieh <victorhsieh@...gle.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: Re: [GIT PULL] fsverity fixes for v6.3-rc4

On Mon, Mar 20, 2023 at 11:05 PM Tejun Heo <tj@...nel.org> wrote:
>
> Do you remember what the other case was? Was it also on a heterogeneous
> arm setup?

Yup. See commit c25da5b7baf1 ("dm verity: stop using WQ_UNBOUND for verify_wq").

But see also 3fffb589b9a6 ("erofs: add per-cpu threads for
decompression as an option").

And you can see the confusion all of this causes in commit
43fa47cb116d ("dm verity: remove WQ_CPU_INTENSIVE flag since using
WQ_UNBOUND"), which perhaps should be undone now.
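
For context, the knob those commits flip is basically just WQ_UNBOUND
at allocation time; a rough sketch of the two variants (illustrative,
not the actual dm-verity or erofs code):

	/* per-CPU, concurrency-managed: work runs on the CPU that queued it */
	wq = alloc_workqueue("verify_wq", WQ_HIGHPRI, 0);

	/* unbound: work is handed off to the shared unbound worker pools */
	wq = alloc_workqueue("verify_wq", WQ_UNBOUND | WQ_HIGHPRI, num_online_cpus());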

> There aren't many differences between unbound workqueues and percpu ones
> that aren't concurrency managed. If there are significant performance
> differences, it's unlikely to be directly from whatever workqueue is doing.

There are a *lot* of special cases for WQ_UNBOUND in the workqueue
code, and they are a lot less targeted than the other WQ_xyz flags, I
feel. They have their own cpumask logic, special freeing rules, etc.

So I would say that "there aren't many differences" is not exactly
true. There are subtle and random differences, including in the very
basic "queue_work()" workflow.

Now, I assume that the arm cases don't actually use
wq_unbound_cpumask, so it's mostly the "instead of the local cpu
queue, use the local node queue" behavior, and the work ends up on
effectively random CPUs since those setups don't have multiple NUMA
nodes.

And no, if it's caching effects, doing it on LLC boundaries isn't
right *either*. By default it should probably be on L2 boundaries or
something, with most non-NUMA setups likely having one single LLC but
multiple L2 nodes.
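
If one did want to group things by cache rather than by node, the
topology code already exposes both granularities; a hand-wavy fragment
(cpu, cpu_a and cpu_b are placeholders, and the "cluster" mask only
corresponds to the shared L2 on many arm64 parts, not universally):

	/* from <linux/sched/topology.h> and <linux/topology.h> */

	/* true if the two CPUs share a last-level cache */
	bool same_llc = cpus_share_cache(cpu_a, cpu_b);

	/* CPUs in @cpu's cluster - on most arm64 SoCs, its L2 group */
	const struct cpumask *l2_peers = topology_cluster_cpumask(cpu);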

              Linus
