linux-kernel - [RFC 2/2] workqueue: Fix work re-entrance when requeue to a different workqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20211008100454.2802393-3-boqun.feng@gmail.com>
Date:   Fri,  8 Oct 2021 18:04:54 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     linux-kernel@...r.kernel.org
Cc:     Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>
Subject: [RFC 2/2] workqueue: Fix work re-entrance when requeue to a different workqueue

When requeuing a work to a different workqueue while it's still getting
processed, re-entrace as the follow can happen:

	{ both WQ1 and WQ2 are bounded workqueue, and a work W has been
	  queued on CPU0 for WQ1}

	CPU 0			CPU 1
	=====			====
	<In worker on CPU 0>
	process_one_work():
	  ...
	  // pick up W
	  worker->current_work = W;
	  worker->current_func = W->func;
	  ...
	  set_work_pool_and_clear_pending(...);
	  // W can be requeued afterwards
				queue_work_on(1, WQ2, W):
				  if (!test_and_set_bit(...)) {
				    // this branch is taken, as CPU 0
				    // just clears pending bit.
				    __queue_work(...):
				      pwq = <pool for CPU1 of WQ2>;
				      last_pool = <pool for CPU 0 of WQ1>;
				      if (last_pool != pwq->pool) { // true
				        if (.. && worker->current_pwq->wq == wq) {
					  // false, since @worker is a
					  // a worker of @last_pool (for
					  // WQ1), and @wq is WQ2.
					}
					...
					insert_work(pwq, W, ...);
				      }
				// W queued.
				<schedule to worker on CPU 1>
				process_one_work():
				  collision = find_worker_executing_work(..);
				  // NULL, because we're searching the
				  // worker pool of CPU 1, while W is
				  // the current work on worker pool of
				  // CPU 0.
				  worker->current_work = W;
				  worker->current_func = W->func;
	  worker->current_func(...);
	  			  ...
	  			  worker->current_func(...); // Re-entrance

This issue is already partially fixed because in queue_work_on(),
last_pool can be used to queue the work, as a result the requeued work
processing will find the collision and wait for the existing one to
finish. However, currently the last_pool is only used when two
workqueues are the same one, which causes the issue. Therefore extend
the behavior to allow last_pool to requeue the work W even if the
workqueues are different. It's safe to do this since the work W has been
proved safe to queue and run on the last_pool.

Signed-off-by: Boqun Feng <boqun.feng@...il.com>
---
 kernel/workqueue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1418710bffcd..410141cc5f88 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1465,7 +1465,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
 
 		worker = find_worker_executing_work(last_pool, work);
 
-		if (worker && worker->current_pwq->wq == wq) {
+		if (worker) {
 			pwq = worker->current_pwq;
 		} else {
 			/* meh... not running there, queue here */
-- 
2.32.0