Message-Id: <20220802084146.3922640-1-vschneid@redhat.com>
Date:   Tue,  2 Aug 2022 09:41:43 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Phil Auld <pauld@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>
Subject: [RFC PATCH v3 0/3] workqueue: destroy_worker() vs isolated CPUs
Hi folks,
Using a work struct from within the workqueue code itself is a bit scary, but
it seems to be holding up (at the very least on the locking side of things).
Note that this affects all kworkers (not just percpu ones) for the sake of
consistency and to prevent adding extra corner cases. kthread_set_per_cpu(p, -1)
is a no-op for unbound kworkers, and IIUC the affinity change is not required
since unbound workers have to be affined to a subset of wq_unbound_cpumask, but
it shouldn't be harmful either.
3/3 (not for merging!) is a simple and stupid stresser that forces extra pcpu
kworkers to be spawned on a specific CPU - I can then quickly test this on QEMU
by making sure said CPU is isolated on the cmdline.
Thanks to Tejun & Lai for the discussion thus far.
Revisions
=========
RFCv2 -> RFCv3
++++++++++++++
o Rebase onto v5.19
o Add new patch (1/3) around accessing wq_unbound_cpumask
o Prevent WORKER_DIE workers from kfree()'ing themselves before the idle reaper
  gets to handle them (Tejun)
  Bit of an aside on that: I've been struggling to convince myself this can
  happen due to spurious wakeups and would like some help here.
  Idle workers are TASK_UNINTERRUPTIBLE, so they can't be woken up by
  signals. That state is set *under* pool->lock, and all wakeups (before this
  patch) are also done while holding pool->lock.
  
  wake_up_worker() is done under pool->lock AND only wakes a worker on the
  pool->idle_list. Thus the to-be-woken worker *cannot* have WORKER_DIE, though
  it could gain it *after* being woken but *before* it runs, e.g.:
  LOCK pool->lock
  wake_up_worker(pool)
      wake_up_process(p)
  UNLOCK pool->lock
                          idle_reaper_fn()
                            LOCK pool->lock
                            destroy_worker(worker, list);
                            UNLOCK pool->lock
                                                  worker_thread()
                                                    goto woke_up;
                                                    LOCK pool->lock
                                                    READ worker->flags & WORKER_DIE
                                                    UNLOCK pool->lock
                                                    ...
                                                    kfree(worker);
                            reap_worker(worker);
                              // Uh-oh
  ... But IMO that's not a spurious wakeup, that's a concurrency issue. I don't
  see any spurious/unexpected worker wakeup happening once a worker is off the
  pool->idle_list.
  
RFCv1 -> RFCv2
++++++++++++++
o Change the pool->timer into a delayed_work to have a sleepable context for
  unbinding kworkers
Cheers,
Valentin
Valentin Schneider (3):
  workqueue: Hold wq_pool_mutex while affining tasks to
    wq_unbound_cpumask
  workqueue: Unbind workers before sending them to exit()
  DEBUG-DO-NOT-MERGE: workqueue: kworker spawner
 kernel/Makefile             |   2 +-
 kernel/workqueue.c          | 169 +++++++++++++++++++++++++++++-------
 kernel/workqueue_internal.h |   1 +
 kernel/wqstress.c           |  69 +++++++++++++++
 4 files changed, 207 insertions(+), 34 deletions(-)
 create mode 100644 kernel/wqstress.c
--
2.31.1