Message-ID: <Z78s-1jHXehA33px@localhost.localdomain>
Date: Wed, 26 Feb 2025 16:02:19 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Tejun Heo <tj@...nel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
open list <linux-kernel@...r.kernel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] workqueue: Always use wq_select_unbound_cpu() for
WORK_CPU_UNBOUND.
On Fri, Feb 21, 2025 at 06:48:16AM -1000, Tejun Heo wrote:
> Hello,
>
> On Fri, Feb 21, 2025 at 03:49:18PM +0100, Frederic Weisbecker wrote:
> > On Fri, Feb 21, 2025 at 12:20:03PM +0100, Sebastian Andrzej Siewior wrote:
> > > If the user did not specify a CPU while enqueuing a work item then
> > > WORK_CPU_UNBOUND is passed. In this case, for WQ_UNBOUND a CPU is
> > > selected based on wq_unbound_cpumask while the local CPU is preferred.
> > > For !WQ_UNBOUND the local CPU is selected.
> > > On a NOHZ_FULL system with isolated CPUs, wq_unbound_cpumask is set to
> > > the non-isolated (housekeeping) CPUs. This leads to different behaviour
> > > if a work item is scheduled on an isolated CPU, where
> > > schedule_delayed_work(, 1);
> > >
> > > will move the timer to the housekeeping CPU and then schedule the work
> > > there (on the housekeeping CPU) while
> > > schedule_delayed_work(, 0);
> > >
> > > will schedule the work item on the isolated CPU.
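[ For concreteness, a minimal standalone sketch of the divergence above;
  my_fn/my_dwork are placeholder names made up for illustration, and the
  two calls are shown as independent scenarios: ]

	static void my_fn(struct work_struct *work) { /* ... */ }
	static DECLARE_DELAYED_WORK(my_dwork, my_fn);

	/* Called on an isolated CPU, without this patch: */
	schedule_delayed_work(&my_dwork, 1);	/* timer migrates to a
						 * housekeeping CPU and the
						 * work runs there */
	schedule_delayed_work(&my_dwork, 0);	/* no timer involved, the
						 * work runs on the local,
						 * isolated CPU */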
> > >
> > > The documentation says WQ_UNBOUND prefers the local CPU. It can
> > > prefer the local CPU only if it is part of wq_unbound_cpumask.
> > >
> > > Restrict WORK_CPU_UNBOUND to wq_unbound_cpumask via
> > > wq_select_unbound_cpu().
> > >
> > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
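[ For reference, the gist of the change in __queue_work() is roughly the
  following -- a sketch of the idea, not the literal diff: ]

	/*
	 * Before: only WQ_UNBOUND workqueues routed WORK_CPU_UNBOUND
	 * through wq_select_unbound_cpu(); per-cpu workqueues silently
	 * picked the local CPU.  After: both cases honor
	 * wq_unbound_cpumask, still preferring the local CPU when it
	 * is part of that mask.
	 */
	if (req_cpu == WORK_CPU_UNBOUND)
		cpu = wq_select_unbound_cpu(raw_smp_processor_id());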
> >
> > I really would like to have this patch in. I considered doing that a
> > few months ago but got sort of discouraged by the lack of properly
> > defined semantics for schedule_work(). And that function has too many
> > users to check all their locality assumptions.
> >
> > Its header comment advertises queueing on the global workqueue, but the
> > target is system_wq and not system_unbound_wq. And then it uses
> > WORK_CPU_UNBOUND through queue_work().
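[ For reference, the definitions in question, as found in
  include/linux/workqueue.h: ]

	static inline bool schedule_work(struct work_struct *work)
	{
		return queue_work(system_wq, work);
	}

	static inline bool queue_work(struct workqueue_struct *wq,
				      struct work_struct *work)
	{
		return queue_work_on(WORK_CPU_UNBOUND, wq, work);
	}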
> >
> > I'm tempted to just assume that none of its users depends on work
> > locality?
>
> That's an API guarantee, and there are plenty of users who depend on
> queue_work() and schedule_work() on per-cpu workqueues actually being
> per-cpu. I don't think we can pull the rug out from under them. If we
> want to do this, which I think is a good idea, we should:
>
> 1. Convert per-cpu workqueue users to unbound workqueues. Most users don't
> care whether a work item is executed locally or not. However, historically,
> we've preferred per-cpu workqueues because unbound workqueues had much
> worse locality properties. Unbound workqueues' topology awareness is a lot
> better now, so this should be less of a problem and we should be able to
> move a lot of users over to unbound workqueues.
But we must check those ~1951 schedule_work() users one by one to make sure they
don't rely on locality for correctness, right? :-)
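[ For context, the conversion in 1. is usually a one-liner per call site;
  my_work below is a placeholder name: ]

	/* before: per-cpu semantics via the system workqueue */
	schedule_work(&my_work);

	/* after: explicitly unbound, may run on any allowed CPU */
	queue_work(system_unbound_wq, &my_work);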
>
> 2. There are still cases where local execution isn't required for
> correctness but local & concurrency-controlled execution yields
> performance gains. The workqueue API currently doesn't distinguish these
> two cases. We should add a new API which prefers local execution but
> doesn't require it, and which can then do what's suggested in this patch.
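[ Purely illustrative -- no such API exists today, and the name below is
  made up: ]

	/*
	 * Hypothetical helper for case 2.: run on the local CPU when it
	 * is part of wq_unbound_cpumask, otherwise fall back to a
	 * housekeeping CPU, i.e. the behaviour this patch gives to
	 * WORK_CPU_UNBOUND, but opt-in per call site.
	 */
	bool queue_work_prefer_local(struct workqueue_struct *wq,
				     struct work_struct *work);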
That is much trickier to find out, and requires knowing the subsystem's
details and history.
For those that don't rely on locality for correctness, we would really like
to be able to offload them to the unbound pool, at least when nohz_full= is
set, because in that case we don't care much about workqueue performance.
>
> Unfortunately, I don't see a way forward without auditing and converting the
> users.
I see...
Thanks.
>
> Thanks.
>
> --
> tejun