linux-kernel - Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively

Message-ID: <X/7VQ8pF5h/K+Cj1@hirez.programming.kicks-ass.net>
Date:   Wed, 13 Jan 2021 12:10:59 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     Valentin Schneider <valentin.schneider@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>, Qian Cai <cai@...hat.com>,
        Vincent Donnefort <vincent.donnefort@....com>,
        Dexuan Cui <decui@...rosoft.com>,
        Lai Jiangshan <laijs@...ux.alibaba.com>,
        Paul McKenney <paulmck@...nel.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH -tip V3 0/8] workqueue: break affinity initiatively

On Tue, Jan 12, 2021 at 11:38:12PM +0800, Lai Jiangshan wrote:

> But the hard problem is "how to suppress the warning of
> online&!active in __set_cpus_allowed_ptr()" for late spawned
> unbound workers during hotplug.

I cannot see create_worker() go bad like that.

The thing is, it uses:

  kthread_bind_mask(worker->task, pool->attrs->cpumask)
  worker_attach_to_pool()
    set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask)

which means set_cpus_allowed_ptr() must be a NOP, because the affinity
is already set by kthread_bind_mask(). Further, the first wakeup of that
worker will then hit:

  select_task_rq()
    is_cpu_allowed()
      is_per_cpu_kthread() -- false
    select_fallback_rq()


So normally that really isn't a problem. I can only see a tiny hole
there, where someone changes the cpumask between kthread_bind_mask() and
set_cpus_allowed_ptr(). AFAICT that can be fixed in two ways:

 - add wq_pool_mutex around things in create_worker(), or
 - move the set_cpus_allowed_ptr() out of worker_attach_to_pool() and
   into rescuer_thread().
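
The tiny hole can be illustrated with a userspace sketch. Again a toy model under stated assumptions: the toy_* names are hypothetical, masks are plain 64-bit words, and the "same mask means nothing to do" shortcut is my simplification of why set_cpus_allowed_ptr() ends up a NOP here. The racing_update flag stands in for someone changing the pool cpumask inside the window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only. None of this is kernel code. */
struct toy_pool {
	uint64_t cpumask;	/* stands in for pool->attrs->cpumask */
};

struct toy_worker {
	uint64_t cpus_mask;	/* stands in for the task's affinity */
};

/* Stand-in for kthread_bind_mask(): set the affinity unconditionally. */
static void toy_kthread_bind_mask(struct toy_worker *w, uint64_t mask)
{
	w->cpus_mask = mask;
}

/*
 * Stand-in for set_cpus_allowed_ptr(). Returns true only when the
 * affinity really had to change, i.e. when the call was not a NOP.
 */
static bool toy_set_cpus_allowed(struct toy_worker *w, uint64_t mask)
{
	if (w->cpus_mask == mask)
		return false;
	w->cpus_mask = mask;
	return true;
}

/*
 * Models the create_worker() sequence from the mail. When
 * racing_update is set, the pool mask changes between the two calls,
 * so the second call is no longer a NOP.
 */
static bool toy_create_worker(struct toy_pool *pool, struct toy_worker *w,
			      bool racing_update, uint64_t new_mask)
{
	toy_kthread_bind_mask(w, pool->cpumask);
	if (racing_update)
		pool->cpumask = new_mask;	/* the window */
	return toy_set_cpus_allowed(w, pool->cpumask);
}
```

Both proposed fixes close that window in the model: holding a lock across the two calls makes racing_update impossible, and dropping the second call from the create_worker() path means there is nothing left to race with.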

Which then brings us to rescuer_thread().  If we manage to trigger the
rescuer during hotplug, then yes, I think that can go wobbly.

Let me consider that a bit more while I try and make sense of that splat
Paul reported.

---

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ec0771e4a3fb..fe05308dc472 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1844,15 +1844,19 @@ static struct worker *alloc_worker(int node)
  * cpu-[un]hotplugs.
  */
 static void worker_attach_to_pool(struct worker *worker,
-				   struct worker_pool *pool)
+				  struct worker_pool *pool,
+				  bool set_affinity)
 {
 	mutex_lock(&wq_pool_attach_mutex);
 
-	/*
-	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
-	 * online CPUs.  It'll be re-applied when any of the CPUs come up.
-	 */
-	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	if (set_affinity) {
+		/*
+		 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have
+		 * any online CPUs.  It'll be re-applied when any of the CPUs
+		 * come up.
+		 */
+		set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
+	}
 
 	/*
 	 * The wq_pool_attach_mutex ensures %POOL_DISASSOCIATED remains
@@ -1944,7 +1948,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	kthread_bind_mask(worker->task, pool->attrs->cpumask);
 
 	/* successful, attach the worker to the pool */
-	worker_attach_to_pool(worker, pool);
+	worker_attach_to_pool(worker, pool, false);
 
 	/* start the newly created worker */
 	raw_spin_lock_irq(&pool->lock);
@@ -2509,7 +2513,11 @@ static int rescuer_thread(void *__rescuer)
 
 		raw_spin_unlock_irq(&wq_mayday_lock);
 
-		worker_attach_to_pool(rescuer, pool);
+		/*
+		 * XXX can go splat when running during hot-un-plug and
+		 * the pool affinity is wobbly.
+		 */
+		worker_attach_to_pool(rescuer, pool, true);
 
 		raw_spin_lock_irq(&pool->lock);