Message-ID: <ZaHWfWYvAiolChWG@slm.duckdns.org>
Date: Fri, 12 Jan 2024 14:17:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Naohiro Aota <Naohiro.Aota@....com>
Cc: "jiangshanlai@...il.com" <jiangshanlai@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"kernel-team@...a.com" <kernel-team@...a.com>
Subject: Re: [PATCHSET wq/for-6.8] workqueue: Implement system-wide
 max_active for unbound workqueues

Hello,

On Thu, Jan 11, 2024 at 02:49:21PM -1000, Tejun Heo wrote:
> On Fri, Jan 05, 2024 at 02:44:08AM +0000, Naohiro Aota wrote:
> > Thank you for the series. I applied the patches on top of the btrfs
> > development tree below and ran the benchmark.
> > 
> > https://gitlab.com/kdave/btrfs-devel.git misc-next
> > 
> > - misc-next, numa=off (baseline)
> >   WRITE: bw=1117MiB/s (1171MB/s), 1117MiB/s-1117MiB/s (1171MB/s-1171MB/s), io=332GiB (356GB), run=304322-304322msec
> > - misc-next + wq patches, numa=off
> >   WRITE: bw=1866MiB/s (1957MB/s), 1866MiB/s-1866MiB/s (1957MB/s-1957MB/s), io=684GiB (735GB), run=375472-375472msec
> > 
> > So, the patches surely improved the performance. However, as shown below,
> > it is still lower than with the previous workqueue patches reverted. The
> > revert was done by reverse-applying the output of "git diff 4cbfd3de737b
> > kernel/workqueue.c kernel/workqueue_internal.h include/linux/workqueue*
> > init/main.c"
> > 
> > - misc-next + wq reverted, numa=off
> >   WRITE: bw=2472MiB/s (2592MB/s), 2472MiB/s-2472MiB/s (2592MB/s-2592MB/s), io=732GiB (786GB), run=303257-303257msec
> 
> Can you describe the test setup in detail? What kind of machine is it? What
> do you mean by `numa=off`? Can you report tools/workqueue/wq_dump.py output?

So, I fixed the possible ordering bug that Lai noticed, dropped the last
patch (more on this in the reply to that patch), and did some benchmarking
with fio on dm-crypt. At least in that testing, the new code seems to
perform just as well as before. The only variable seems to be what
max_active is used for the workqueue in question.

For dm-crypt, the kcryptd workqueue uses num_online_cpus() as its
max_active. Depending on how that value is interpreted, it may not provide
enough concurrency when some workers are waiting on IOs, which shows up as
slightly slower performance. That's easily fixed by bumping the max_active
value so that there's some buffer, which is the right way to configure it
anyway.
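
For context, the allocation looks roughly like this (a sketch based on
drivers/md/dm-crypt.c; the "* 2" bump is only my illustration of adding
some buffer, not what dm-crypt does today, and cc/devname come from the
surrounding dm-crypt code):

	/*
	 * Unbound kcryptd workqueue with max_active sized from the
	 * online CPU count. Doubling it leaves headroom for workers
	 * that block on IO instead of staying CPU-bound.
	 */
	cc->crypt_queue = alloc_workqueue("kcryptd/%s",
					  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM |
					  WQ_UNBOUND,
					  num_online_cpus() * 2, devname);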

It'd be great if you could share more details on the benchmarks you're
running so that we can rule out similar issues.

Thanks.

-- 
tejun
