[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20221004150521.822266-1-vschneid@redhat.com>
Date: Tue, 4 Oct 2022 16:05:17 +0100
From: Valentin Schneider <vschneid@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Frederic Weisbecker <frederic@...nel.org>,
Juri Lelli <juri.lelli@...hat.com>,
Phil Auld <pauld@...hat.com>,
Marcelo Tosatti <mtosatti@...hat.com>
Subject: [PATCH v4 0/4] workqueue: destroy_worker() vs isolated CPUs
Hi folks,
I haven't sent an update for this in a while, but the issue has risen again in
some other environment so I'm getting more reasons to push this out.
Revisions
=========
RFCv3 -> v4
+++++++++++
o Rebase onto v6.0
o Split into more patches for reviewability
o Take dying workers out of the pool->workers as suggested by Lai
RFCv2 -> RFCv3
++++++++++++++
o Rebase onto v5.19
o Add new patch (1/3) around accessing wq_unbound_cpumask
o Prevent WORKER_DIE workers for kfree()'ing themselves before the idle reaper
gets to handle them (Tejun)
Bit of an aside on that: I've been struggling to convince myself this can
happen due to spurious wakeups and would like some help here.
Idle workers are TASK_UNINTERRUPTIBLE, so they can't be woken up by
signals. That state is set *under* pool->lock, and all wakeups (before this
patch) are also done while holding pool->lock.
wake_up_worker() is done under pool->lock AND only wakes a worker on the
pool->idle_list. Thus the to-be-woken worker *cannot* have WORKER_DIE, though
it could gain it *after* being woken but *before* it runs, e.g.:
LOCK pool->lock
wake_up_worker(pool)
wake_up_process(p)
UNLOCK pool->lock
idle_reaper_fn()
LOCK pool->lock
destroy_worker(worker, list);
UNLOCK pool->lock
worker_thread()
goto woke_up;
LOCK pool->lock
READ worker->flags & WORKER_DIE
UNLOCK pool->lock
...
kfree(worker);
reap_worker(worker);
// Uh-oh
... But IMO that's not a spurious wakeup, that's a concurrency issue. I don't
see any spurious/unexpected worker wakeup happening once a worker is off the
pool->idle_list.
RFCv1 -> RFCv2
++++++++++++++
o Change the pool->timer into a delayed_work to have a sleepable context for
unbinding kworkers
Cheers,
Valentin
Lai Jiangshan (1):
workqueue: Protects wq_unbound_cpumask with wq_pool_attach_mutex
Valentin Schneider (3):
workqueue: Factorize unbind/rebind_workers() logic
workqueue: Convert the idle_timer to a delayed_work
workqueue: Unbind workers before sending them to exit()
kernel/workqueue.c | 195 +++++++++++++++++++++++++++++++--------------
1 file changed, 136 insertions(+), 59 deletions(-)
--
2.31.1
Powered by blists - more mailing lists