Message-ID: <ZaHWfWYvAiolChWG@slm.duckdns.org>
Date: Fri, 12 Jan 2024 14:17:01 -1000
From: Tejun Heo <tj@...nel.org>
To: Naohiro Aota <Naohiro.Aota@....com>
Cc: "jiangshanlai@...il.com" <jiangshanlai@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"kernel-team@...a.com" <kernel-team@...a.com>
Subject: Re: [PATCHSET wq/for-6.8] workqueue: Implement system-wide
 max_active for unbound workqueues

Hello,

On Thu, Jan 11, 2024 at 02:49:21PM -1000, Tejun Heo wrote:
> On Fri, Jan 05, 2024 at 02:44:08AM +0000, Naohiro Aota wrote:
> > Thank you for the series. I applied the patches on top of the btrfs
> > development tree below and ran the benchmark.
> > 
> > https://gitlab.com/kdave/btrfs-devel.git misc-next
> > 
> > - misc-next, numa=off (baseline)
> >   WRITE: bw=1117MiB/s (1171MB/s), 1117MiB/s-1117MiB/s (1171MB/s-1171MB/s), io=332GiB (356GB), run=304322-304322msec
> > - misc-next + wq patches, numa=off
> >   WRITE: bw=1866MiB/s (1957MB/s), 1866MiB/s-1866MiB/s (1957MB/s-1957MB/s), io=684GiB (735GB), run=375472-375472msec
> > 
> > So, the patches surely improved the performance. However, as shown below,
> > it is still lower than with the previous workqueue patches reverted. The
> > revert was done by reverse-applying the output of "git diff 4cbfd3de737b
> > kernel/workqueue.c kernel/workqueue_internal.h include/linux/workqueue*
> > init/main.c"
> > 
> > - misc-next + wq reverted, numa=off
> >   WRITE: bw=2472MiB/s (2592MB/s), 2472MiB/s-2472MiB/s (2592MB/s-2592MB/s), io=732GiB (786GB), run=303257-303257msec
> 
> Can you describe the test setup in detail? What kind of machine is it? What
> do you mean by `numa=off`? Can you report tools/workqueue/wq_dump.py output?

So, I fixed the possible ordering bug that Lai noticed, dropped the last
patch (more on this in the reply to that patch), and did some benchmarking
with fio on dm-crypt. At least in that testing, the new code seems to
perform just as well as before. The only variable seems to be what
max_active is used for the workqueue in question.

For dm-crypt, the kcryptd workqueue uses num_online_cpus() as its
max_active. Depending on how that value is interpreted, it may not provide
enough concurrency when some workers are waiting on IOs, which shows up as
slightly slower performance. That's easily fixed by bumping the max_active
value so that there's some buffer, which is the right way to configure it
anyway.
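
For context, the allocation looks roughly like this (a sketch based on
drivers/md/dm-crypt.c; the "* 2" bump is only my illustration of adding
some buffer, not what dm-crypt does today, and cc/devname come from the
surrounding dm-crypt code):

	/*
	 * Unbound kcryptd workqueue with max_active sized from the
	 * online CPU count. Doubling it leaves headroom for workers
	 * that block on IO instead of staying CPU-bound.
	 */
	cc->crypt_queue = alloc_workqueue("kcryptd/%s",
					  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM |
					  WQ_UNBOUND,
					  num_online_cpus() * 2, devname);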

It'd be great if you could share more details on the benchmarks you're
running so that we can rule out similar issues.

Thanks.

-- 
tejun
