Message-Id: <20211207073543.61092-8-jiangshanlai@gmail.com>
Date:   Tue,  7 Dec 2021 15:35:43 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>
Cc:     Lai Jiangshan <laijs@...ux.alibaba.com>,
        Lai Jiangshan <jiangshanlai@...il.com>
Subject: [RFC PATCH 7/7] workqueue: Replace pool lock with preemption disabling in wq_worker_sleeping()

From: Lai Jiangshan <laijs@...ux.alibaba.com>

Once upon a time, wq_worker_sleeping() was called with the rq lock held,
so it could not take the pool lock.  Instead it relied on "X:"
protection: preemption disabled on the local CPU; wq_worker_sleeping()
did not depend on the rq lock for correctness even though it was held.
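
(Illustrative only, not part of the patch: a minimal sketch of the "X:"
reader pattern described above.  first_idle_worker(), wake_up_process()
and the worker/worker_pool fields are real; the wrapper function itself
is hypothetical.)

	static void wake_first_idle_sketch(struct worker_pool *pool)
	{
		struct worker *first;

		preempt_disable();			/* "X:" protection */
		first = first_idle_worker(pool);	/* reads idle_list.next only */
		if (first)
			wake_up_process(first->task);
		preempt_enable();
	}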

Nowadays, wq_worker_sleeping() is no longer called with the rq lock held
and takes the pool lock instead.  But the "X:" protection itself has not
been removed, and wq_worker_running() still relies on it.

So wq_worker_sleeping() can use "X:" protection as well and avoid
taking the pool lock.

This patch also documents that only idle_list.next is under "X:"
protection, not the whole idle_list, because destroy_worker() in the
idle timer can remove non-first idle workers.  The idle timer may fire
on a different CPU, and even on the same CPU it can interrupt the
preemption-disabled context.
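
(Illustrative only: a simplified sketch of the reclaim loop in the idle
timer, omitting the expiry-time checks of the real idle_worker_timeout(),
showing why idle_list.next is never removed there.)

	raw_spin_lock_irq(&pool->lock);
	while (too_many_workers(pool)) {
		/*
		 * too_many_workers() only returns true while nr_idle > 2, and
		 * the victim is taken from the tail of idle_list, so the first
		 * idle worker (idle_list.next) is never destroyed here and
		 * stays valid for "X:" readers.
		 */
		struct worker *worker = list_last_entry(&pool->idle_list,
							struct worker, entry);
		destroy_worker(worker);
	}
	raw_spin_unlock_irq(&pool->lock);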

Signed-off-by: Lai Jiangshan <laijs@...ux.alibaba.com>
---
 kernel/workqueue.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..6c30edbe2fc2 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -162,7 +162,8 @@ struct worker_pool {
 	int			nr_workers;	/* L: total number of workers */
 	int			nr_idle;	/* L: currently idle workers */
 
-	struct list_head	idle_list;	/* X: list of idle workers */
+	/* idle_list.next and first_idle_worker(): X: first idle worker */
+	struct list_head	idle_list;	/* L: list of idle workers */
 	struct timer_list	idle_timer;	/* L: worker idle timeout */
 	struct timer_list	mayday_timer;	/* L: SOS timer for workers */
 
@@ -819,6 +820,11 @@ static bool too_many_workers(struct worker_pool *pool)
 	int nr_idle = pool->nr_idle + managing; /* manager is considered idle */
 	int nr_busy = pool->nr_workers - nr_idle;
 
+	/*
+	 * nr_idle must be at least 2 to allow a manager and at least an idle
+	 * worker in idle_list so that idle_worker_timeout() doesn't touch
+	 * idle_list.next.
+	 */
 	return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
 }
 
@@ -826,7 +832,7 @@ static bool too_many_workers(struct worker_pool *pool)
  * Wake up functions.
  */
 
-/* Return the first idle worker.  Safe with preemption disabled */
+/* Return the first idle worker.  Safe with "X:" protection */
 static struct worker *first_idle_worker(struct worker_pool *pool)
 {
 	if (unlikely(list_empty(&pool->idle_list)))
@@ -905,7 +911,7 @@ void wq_worker_sleeping(struct task_struct *task)
 		return;
 
 	worker->sleeping = 1;
-	raw_spin_lock_irq(&pool->lock);
+	preempt_disable();
 
 	/*
 	 * Recheck in case unbind_workers() preempted us. We don't
@@ -913,7 +919,7 @@ void wq_worker_sleeping(struct task_struct *task)
 	 * and nr_running has been reset.
 	 */
 	if (worker->flags & WORKER_NOT_RUNNING) {
-		raw_spin_unlock_irq(&pool->lock);
+		preempt_enable();
 		return;
 	}
 
@@ -923,10 +929,9 @@ void wq_worker_sleeping(struct task_struct *task)
 	 * Please read comment there.
 	 *
 	 * NOT_RUNNING is clear.  This means that we're bound to and
-	 * running on the local cpu w/ rq lock held and preemption
-	 * disabled, which in turn means that none else could be
-	 * manipulating idle_list, so dereferencing idle_list without pool
-	 * lock is safe.
+	 * running on the local cpu with preemption disabled, which in turn
+	 * means that no one else could be manipulating idle_list.next
+	 * so dereferencing idle_list.next without pool lock is safe.
 	 */
 	if (atomic_dec_and_test(&pool->nr_running) &&
 	    !list_empty(&pool->worklist)) {
@@ -934,7 +939,7 @@ void wq_worker_sleeping(struct task_struct *task)
 		if (next)
 			wake_up_process(next->task);
 	}
-	raw_spin_unlock_irq(&pool->lock);
+	preempt_enable();
 }
 
 /**
-- 
2.19.1.6.gb485710b
