Message-ID: <Z79CRnnNHOkxMNXD@slm.duckdns.org>
Date: Wed, 26 Feb 2025 06:33:10 -1000
From: Tejun Heo <tj@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
open list <linux-kernel@...r.kernel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] workqueue: Always use wq_select_unbound_cpu() for
WORK_CPU_UNBOUND.
Hello, Frederic.
On Wed, Feb 26, 2025 at 04:02:19PM +0100, Frederic Weisbecker wrote:
...
> > That's API guarantee and there are plenty of users who depend on
> > queue_work() and schedule_work() on per-cpu workqueues to be actually
> > per-cpu. I don't think we can pull the rug from under them. If we want to do
> > this, which I think is a good idea, we should:
> >
> > 1. Convert per-cpu workqueue users to unbound workqueues. Most users don't
> > care whether a work item is executed locally or not. However, historically,
> > we've been preferring per-cpu workqueues because unbound workqueues had a
> > lot worse locality properties. Unbound workqueue's topology awareness is
> > a lot better now, so this should be less of a problem and we should be
> > able to move a lot of users over to unbound workqueues.
>
> But we must check those ~1951 schedule_work() users one by one to make sure they
> don't rely on locality for correctness, right? :-)
Yes, no matter what we do, there is no way around that.
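
For illustration, the mechanical part of such a conversion can be as small as
the sketch below. The foo driver, foo_work_fn(), foo_init() and foo_queue()
are made up; only queue_work(), system_unbound_wq and INIT_WORK() are
existing API, and it assumes the work function doesn't depend on which CPU it
runs on.

#include <linux/workqueue.h>
#include <linux/printk.h>
#include <linux/smp.h>

struct foo {
	struct work_struct work;
	/* ... */
};

static void foo_work_fn(struct work_struct *work)
{
	/* hypothetical processing; nothing here assumes the queueing CPU */
	pr_info("foo work ran on CPU %d\n", raw_smp_processor_id());
}

static void foo_init(struct foo *foo)
{
	INIT_WORK(&foo->work, foo_work_fn);
}

static void foo_queue(struct foo *foo)
{
	/* was: schedule_work(&foo->work), which targets the local CPU */
	queue_work(system_unbound_wq, &foo->work);
}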
> > 2. There still are cases where local execution isn't required for
> > correctness but local & concurrency controlled executions yield
> > performance gains. Workqueue API currently doesn't distinguish these two
> > cases. We should add a new API which prefers local execution but doesn't
> > require it, which can then do what's suggested in this patch.
>
> That is much trickier to find out and requires to know about the subsystem
> details and history.
One good thing is that for workqueues that actually should be per-CPU for
performance, there is usually a group of people, often including the
maintainers, who are familiar with the performance situation and will pipe
up, so it's not *that* hopeless.
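
Just to make that distinction concrete, a "prefer local but don't require it"
entry point could take roughly the following shape. queue_work_prefer_local()
is hypothetical and only sketches the idea; queue_work_on(), queue_work(),
tick_nohz_full_cpu() and system_unbound_wq are existing interfaces.

#include <linux/workqueue.h>
#include <linux/tick.h>
#include <linux/smp.h>

/*
 * Hypothetical helper: keep locality when the local CPU is a
 * housekeeping CPU, otherwise let an unbound pool pick the CPU.
 */
static bool queue_work_prefer_local(struct workqueue_struct *percpu_wq,
				    struct work_struct *work)
{
	int cpu = raw_smp_processor_id();

	if (!tick_nohz_full_cpu(cpu))
		return queue_work_on(cpu, percpu_wq, work);

	return queue_work(system_unbound_wq, work);
}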
> For those that don't rely on locality for correctness, we would really like
> to be able to offload them to an unbound pool, at least when nohz_full= is set.
> Because in that case we don't care much about workqueue performance.
Yeah, that makes sense to me.
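
For that case the choice could also be made once, at workqueue creation time,
rather than on every queueing. Again only a sketch: foo_wq and foo_wq_init()
are made up, while alloc_workqueue(), WQ_UNBOUND and tick_nohz_full_enabled()
are existing API.

#include <linux/workqueue.h>
#include <linux/tick.h>
#include <linux/init.h>
#include <linux/errno.h>

static struct workqueue_struct *foo_wq;

static int __init foo_wq_init(void)
{
	/* stay per-CPU normally, go unbound when nohz_full= is in use */
	unsigned int flags = tick_nohz_full_enabled() ? WQ_UNBOUND : 0;

	foo_wq = alloc_workqueue("foo_wq", flags, 0);
	return foo_wq ? 0 : -ENOMEM;
}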
Thanks.
--
tejun