Message-ID: <Z79CRnnNHOkxMNXD@slm.duckdns.org>
Date: Wed, 26 Feb 2025 06:33:10 -1000
From: Tejun Heo <tj@...nel.org>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
open list <linux-kernel@...r.kernel.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] workqueue: Always use wq_select_unbound_cpu() for
WORK_CPU_UNBOUND.
Hello, Frederic.
On Wed, Feb 26, 2025 at 04:02:19PM +0100, Frederic Weisbecker wrote:
...
> > That's API guarantee and there are plenty of users who depend on
> > queue_work() and schedule_work() on per-cpu workqueues to be actually
> > per-cpu. I don't think we can pull the rug from under them. If we want to do
> > this, which I think is a good idea, we should:
> >
> > 1. Convert per-cpu workqueue users to unbound workqueues. Most users don't
> > care whether a work item is executed locally or not. However, historically,
> > we've been preferring per-cpu workqueues because unbound workqueues had a
> > lot worse locality properties. Unbound workqueue's topology awareness is
> > a lot better now, so this should be less of a problem and we should be
> > able to move a lot of users over to unbound workqueues.
>
> But we must check those ~1951 schedule_work() users one by one to make sure they
> don't rely on locality for correctness, right? :-)
Yes, no matter what we do, there is no way around that.
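
For illustration, the mechanical part of such a conversion can be as small as
the sketch below. The foo driver, foo_work_fn(), foo_init() and foo_queue()
are made up; only queue_work(), system_unbound_wq and INIT_WORK() are
existing API, and it assumes the work function doesn't depend on which CPU it
runs on.

#include <linux/workqueue.h>
#include <linux/printk.h>
#include <linux/smp.h>

struct foo {
	struct work_struct work;
	/* ... */
};

static void foo_work_fn(struct work_struct *work)
{
	/* hypothetical processing; nothing here assumes the queueing CPU */
	pr_info("foo work ran on CPU %d\n", raw_smp_processor_id());
}

static void foo_init(struct foo *foo)
{
	INIT_WORK(&foo->work, foo_work_fn);
}

static void foo_queue(struct foo *foo)
{
	/* was: schedule_work(&foo->work), which targets the local CPU */
	queue_work(system_unbound_wq, &foo->work);
}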
> > 2. There still are cases where local execution isn't required for
> > correctness but local & concurrency controlled executions yield
> > performance gains. Workqueue API currently doesn't distinguish these two
> > cases. We should add a new API which prefers local execution but doesn't
> > require it, which can then do what's suggested in this patch.
>
> That is much trickier to find out and requires to know about the subsystem
> details and history.
One good thing is that for workqueues that actually should be per-CPU for
performance, there is usually a group of people, often including the
maintainers, who are familiar with the performance situation and will pipe
up, so it's not *that* hopeless.
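
Just to make that distinction concrete, a "prefer local but don't require it"
entry point could take roughly the following shape. queue_work_prefer_local()
is hypothetical and only sketches the idea; queue_work_on(), queue_work(),
tick_nohz_full_cpu() and system_unbound_wq are existing interfaces.

#include <linux/workqueue.h>
#include <linux/tick.h>
#include <linux/smp.h>

/*
 * Hypothetical helper: keep locality when the local CPU is a
 * housekeeping CPU, otherwise let an unbound pool pick the CPU.
 */
static bool queue_work_prefer_local(struct workqueue_struct *percpu_wq,
				    struct work_struct *work)
{
	int cpu = raw_smp_processor_id();

	if (!tick_nohz_full_cpu(cpu))
		return queue_work_on(cpu, percpu_wq, work);

	return queue_work(system_unbound_wq, work);
}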
> For those that don't rely on locality for correctness, we would really like
> to be able to offload them to an unbound pool, at least when nohz_full= is set.
> Because in that case we don't care much about workqueue performance.
Yeah, that makes sense to me.
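
For that case the choice could also be made once, at workqueue creation time,
rather than on every queueing. Again only a sketch: foo_wq and foo_wq_init()
are made up, while alloc_workqueue(), WQ_UNBOUND and tick_nohz_full_enabled()
are existing API.

#include <linux/workqueue.h>
#include <linux/tick.h>
#include <linux/init.h>
#include <linux/errno.h>

static struct workqueue_struct *foo_wq;

static int __init foo_wq_init(void)
{
	/* stay per-CPU normally, go unbound when nohz_full= is in use */
	unsigned int flags = tick_nohz_full_enabled() ? WQ_UNBOUND : 0;

	foo_wq = alloc_workqueue("foo_wq", flags, 0);
	return foo_wq ? 0 : -ENOMEM;
}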
Thanks.
--
tejun