Message-ID: <aL8H2VtN2dw1a8B+@devvm11784.nha0.facebook.com>
Date: Mon, 8 Sep 2025 09:44:09 -0700
From: Bobby Eshleman <bobbyeshleman@...il.com>
To: Marco Crivellari <marco.crivellari@...e.com>
Cc: linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
Frederic Weisbecker <frederic@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Michal Hocko <mhocko@...e.com>,
"David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next 3/3] net: WQ_PERCPU added to alloc_workqueue users

On Fri, Sep 05, 2025 at 11:05:05AM +0200, Marco Crivellari wrote:
> Currently if a user enqueues a work item using schedule_delayed_work() the
> used wq is "system_wq" (per-cpu wq) while queue_delayed_work() uses
> WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
> schedule_work(), which uses system_wq, and queue_work(), which again
> uses WORK_CPU_UNBOUND.
> This lack of consistency cannot be addressed without refactoring the API.
>
> alloc_workqueue() treats all queues as per-CPU by default, while unbound
> workqueues must opt-in via WQ_UNBOUND.
>
> This default is suboptimal: most workloads benefit from unbound queues,
> allowing the scheduler to place worker threads where they’re needed and
> reducing noise when CPUs are isolated.
>
> This patch adds the new WQ_PERCPU flag to alloc_workqueue() callers in the
> network subsystem, to explicitly request per-CPU behavior. Both flags
> coexist for one release cycle to allow callers to transition their calls.
>
> Once migration is complete, WQ_UNBOUND can be removed and unbound will
> become the implicit default.
>
> With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
> any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
> must now use WQ_PERCPU.
>
> All existing users have been updated accordingly.
>
> Suggested-by: Tejun Heo <tj@...nel.org>
> Signed-off-by: Marco Crivellari <marco.crivellari@...e.com>
[...]
> diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> index f0e48e6911fc..b3e960108e6b 100644
> --- a/net/vmw_vsock/virtio_transport.c
> +++ b/net/vmw_vsock/virtio_transport.c
> @@ -916,7 +916,7 @@ static int __init virtio_vsock_init(void)
>  {
>  	int ret;
>  
> -	virtio_vsock_workqueue = alloc_workqueue("virtio_vsock", 0, 0);
> +	virtio_vsock_workqueue = alloc_workqueue("virtio_vsock", WQ_PERCPU, 0);
>  	if (!virtio_vsock_workqueue)
>  		return -ENOMEM;
>
> diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c
> index 6e78927a598e..bc2ff918b315 100644
> --- a/net/vmw_vsock/vsock_loopback.c
> +++ b/net/vmw_vsock/vsock_loopback.c
> @@ -139,7 +139,7 @@ static int __init vsock_loopback_init(void)
>  	struct vsock_loopback *vsock = &the_vsock_loopback;
>  	int ret;
>  
> -	vsock->workqueue = alloc_workqueue("vsock-loopback", 0, 0);
> +	vsock->workqueue = alloc_workqueue("vsock-loopback", WQ_PERCPU, 0);
>  	if (!vsock->workqueue)
>  		return -ENOMEM;
>
LGTM for the vmw_vsock bits. Regarding step 2, "Check who really needs to
be per-cpu": IIRC I experimented with per-cpu wq for vsock a few years ago
and don't recall seeing a big difference in performance, so I'd expect
vsock to land in the "doesn't really need per-cpu" camp. I might be able
to help re-evaluate that when the time comes.
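
For reference, the transition rule from the commit message (a caller that
does not pass WQ_UNBOUND must now pass WQ_PERCPU) can be modelled in plain
userspace C. This is only a rough sketch, not kernel code: the flag values
below are illustrative stand-ins, and wq_affinity_flags_valid() is a
hypothetical helper, not something the workqueue API provides.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative stand-ins only. These are NOT taken from the kernel headers;
 * the real definitions live in include/linux/workqueue.h.
 */
#define WQ_UNBOUND (1u << 1)
#define WQ_PERCPU  (1u << 8)

/* The two flags must occupy distinct bits for the check below to work. */
static_assert((WQ_UNBOUND & WQ_PERCPU) == 0, "flags must be distinct bits");

/*
 * Hypothetical helper: during the transition described in the commit
 * message, a caller is expected to state its affinity explicitly, i.e.
 * pass exactly one of WQ_UNBOUND or WQ_PERCPU.
 */
static bool wq_affinity_flags_valid(unsigned int flags)
{
	bool unbound = (flags & WQ_UNBOUND) != 0;
	bool percpu = (flags & WQ_PERCPU) != 0;

	/* True when exactly one of the two is set. */
	return unbound != percpu;
}
```

Under this model the old vsock calls (flags == 0) would be flagged as
ambiguous, the converted calls (WQ_PERCPU) pass, and a later switch to
WQ_UNBOUND would pass as well.
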
Reviewed-by: Bobby Eshleman <bobbyeshleman@...a.com>