Date:   Mon, 11 Jan 2021 23:26:37 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     linux-kernel@...r.kernel.org
Cc:     Valentin Schneider <valentin.schneider@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Qian Cai <cai@...hat.com>,
        Vincent Donnefort <vincent.donnefort@....com>,
        Tejun Heo <tj@...nel.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Hillf Danton <hdanton@...a.com>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: [PATCH -tip V4 7/8] workqueue: Manually break affinity on hotplug for unbound pool

From: Lai Jiangshan <laijs@...ux.alibaba.com>

It is possible that a per-node pool's/worker's affinity is a single
CPU.  This can happen when the workqueue user changes the cpumask of
the workqueue or when wq_unbound_cpumask is changed by the system
admin via /sys/devices/virtual/workqueue/cpumask.  pool->attrs->cpumask
is workqueue's cpumask & wq_unbound_cpumask & possible_cpumask_of_the_node,
which can be a single CPU and makes the pool's workers effectively
"per cpu kthreads".

The scheduler won't break affinity on such "per cpu kthread" workers
when the CPU goes down, so we have to do it ourselves.

We do it by introducing a new break_unbound_workers_cpumask(), which is
the symmetric counterpart of restore_unbound_workers_cpumask().  When
the last online CPU of the pool goes down, it is time to break the
affinity.

The way to break affinity is to set the workers' affinity to
cpu_possible_mask, so that we preserve the same behavior as when the
scheduler breaks affinity for us.

Fixes: 06249738a41a ("workqueue: Manually break affinity on hotplug")
Acked-by: Tejun Heo <tj@...nel.org>
Tested-by: Paul E. McKenney <paulmck@...nel.org>
Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
---
 kernel/workqueue.c | 65 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 3 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f2793749bd97..b012adbeff9f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -5035,8 +5035,9 @@ static void rebind_workers(struct worker_pool *pool)
  *
  * An unbound pool may end up with a cpumask which doesn't have any online
  * CPUs.  When a worker of such pool get scheduled, the scheduler resets
- * its cpus_allowed.  If @cpu is in @pool's cpumask which didn't have any
- * online CPU before, cpus_allowed of all its workers should be restored.
+ * its cpus_allowed, or we reset it earlier in break_unbound_workers_cpumask().
+ * If @cpu is in @pool's cpumask which didn't have any online CPU before,
+ * cpus_allowed of all its workers should be restored.
  */
 static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 {
@@ -5061,6 +5062,50 @@ static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
 						  pool->attrs->cpumask) < 0);
 }
 
+/**
+ * break_unbound_workers_cpumask - break cpumask of unbound workers
+ * @pool: unbound pool of interest
+ * @cpu: the CPU which is going down
+ *
+ * An unbound pool may end up with a cpumask which doesn't have any online
+ * CPUs.  When a worker of such a pool gets scheduled, the scheduler resets
+ * its cpus_allowed, unless there is only one CPU in cpus_allowed, which is
+ * the special case we need to handle on our own to avoid blocking the
+ * hotplug process or causing further harm.
+ */
+static void break_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
+{
+	struct worker *worker;
+
+	lockdep_assert_held(&wq_pool_mutex);
+	lockdep_assert_held(&wq_pool_attach_mutex);
+
+	/* is @cpu allowed for @pool? */
+	if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
+		return;
+
+	/*
+	 * is @cpu the last online CPU for @pool?  If so, either the
+	 * scheduler or we need to break affinity for the workers.
+	 */
+	if (cpumask_intersects(pool->attrs->cpumask, wq_unbound_online_cpumask))
+		return;
+
+	/*
+	 * is @cpu the only possible CPU for @pool?  If not, the scheduler
+	 * will take care of breaking affinity for the workers since the
+	 * workers are all non-per-cpu kthreads.  This is the usual case
+	 * for unbound pools/workers and we don't need to bother doing it.
+	 */
+	if (cpumask_weight(pool->attrs->cpumask) > 1)
+		return;
+
+	/* as we're setting it to cpu_possible_mask, the following shouldn't fail */
+	for_each_pool_worker(worker, pool)
+		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
+						  cpu_possible_mask) < 0);
+}
+
 int workqueue_prepare_cpu(unsigned int cpu)
 {
 	struct worker_pool *pool;
@@ -5126,13 +5171,27 @@ int workqueue_unbound_online_cpu(unsigned int cpu)
 
 int workqueue_unbound_offline_cpu(unsigned int cpu)
 {
+	struct worker_pool *pool;
 	struct workqueue_struct *wq;
+	int pi;
 
-	/* update NUMA affinity of unbound workqueues */
 	mutex_lock(&wq_pool_mutex);
 	cpumask_clear_cpu(cpu, wq_unbound_online_cpumask);
+
+	/* update CPU affinity of workers of unbound pools */
+	for_each_pool(pool, pi) {
+		mutex_lock(&wq_pool_attach_mutex);
+
+		if (pool->cpu < 0)
+			break_unbound_workers_cpumask(pool, cpu);
+
+		mutex_unlock(&wq_pool_attach_mutex);
+	}
+
+	/* update NUMA affinity of unbound workqueues */
 	list_for_each_entry(wq, &workqueues, list)
 		wq_update_unbound_numa(wq, cpu);
+
 	mutex_unlock(&wq_pool_mutex);
 
 	return 0;
-- 
2.19.1.6.gb485710b
