Message-Id: <1516005492-4994-1-git-send-email-neeraju@codeaurora.org>
Date:   Mon, 15 Jan 2018 14:08:12 +0530
From:   Neeraj Upadhyay <neeraju@...eaurora.org>
To:     tj@...nel.org, jiangshanlai@...il.com
Cc:     linux-kernel@...r.kernel.org, linux-arm-msm@...r.kernel.org,
        prsood@...eaurora.org, sramana@...eaurora.org,
        Neeraj Upadhyay <neeraju@...eaurora.org>
Subject: [PATCH] workqueue: Handle race between wake up and rebind

There is a potential race between rebind_workers() and the
wakeup of a worker thread, which can result in a
workqueue lockup for a bound worker pool.

Below is the potential race:

- The cpu0 pool is a bound worker pool that has been unbound
  from its cpu. A new work item is queued on this pool,
  which causes its worker (kworker/0:0) to be woken
  up on a cpu other than cpu0, let's say cpu1.

  workqueue_queue_work
    workqueue_activate_work
      <worker 0 is woken up on cpu1>

- cpu0 rebind happens
  rebind_workers()
    Clears POOL_DISASSOCIATED and binds cpumask of all
    workers.

- kworker/0:0 gets a chance to run on cpu1; while processing
  a work item, it goes to sleep. However, it does not decrement
  pool->nr_running. The WORKER_REBOUND flag (one of the
  WORKER_NOT_RUNNING flags) was cleared when the worker entered
  worker_thread(), so the worker counts as running, and
  wq_worker_sleeping() then bails out on the pool->cpu check.

  Worker 0 runs on cpu1
    worker_thread()
      process_one_work()
        wq_worker_sleeping()
          if (worker->flags & WORKER_NOT_RUNNING)
            return NULL;
          if (WARN_ON_ONCE(pool->cpu != raw_smp_processor_id()))
            <Does not decrement nr_running>

- After this, when kworker/0:0 wakes up, this time on its
  bound cpu, cpu0, it increments pool->nr_running again.
  So, pool->nr_running becomes 2.

- When kworker/0:0 enters idle, it decrements pool->nr_running
  by 1. This leaves pool->nr_running = 1, with no workers in
  runnable state.

- Now, no new workers will be woken up, as pool->nr_running is
  non-zero. This results in an indefinite lockup for this pool.
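
The accounting drift in the steps above can be replayed with a small
userspace model. This is only a sketch: struct toy_pool,
toy_worker_sleeping() and toy_replay() are hypothetical names that do
not exist in the kernel, and the model tracks a single worker and
nothing but the counter.

```c
/* Toy stand-in for the kernel's worker pool; all names here are hypothetical. */
struct toy_pool {
	int cpu;        /* CPU the pool is bound to */
	int nr_running; /* models pool->nr_running */
};

/* Models the sleep hook: decrement only if the worker sleeps on the pool's CPU. */
static void toy_worker_sleeping(struct toy_pool *pool, int cur_cpu)
{
	if (cur_cpu != pool->cpu)
		return; /* mismatch path: nr_running is left untouched */
	pool->nr_running--;
}

/*
 * Replays the sequence for one worker that goes to sleep on sleep_cpu
 * and returns the final nr_running once the worker is idle again.
 */
static int toy_replay(int pool_cpu, int sleep_cpu)
{
	struct toy_pool pool = { .cpu = pool_cpu, .nr_running = 0 };

	pool.nr_running++;                     /* worker leaves idle, counted as running */
	toy_worker_sleeping(&pool, sleep_cpu); /* worker blocks while processing a work item */
	pool.nr_running++;                     /* worker wakes up on the pool's CPU */
	pool.nr_running--;                     /* worker enters idle */

	return pool.nr_running;
}
```

In this model, toy_replay(0, 0) ends with the counter back at 0, while
toy_replay(0, 1), where the worker sleeps on cpu1, ends with a stale
count of 1 and no runnable worker.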

Fix this by deferring the work to some other idle worker
if the pool is associated with its CPU but the current worker
is not running on that CPU.
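
The condition under which the worker defers can be written as a
standalone predicate. This is a sketch for illustration only:
toy_should_defer() and TOY_POOL_DISASSOCIATED are hypothetical names,
and the flag value is an assumption of this example, not taken from
the kernel headers.

```c
#include <stdbool.h>

/* Hypothetical flag value for this sketch; the kernel defines its own. */
#define TOY_POOL_DISASSOCIATED (1u << 2)

/*
 * Returns true when the pool is associated with its CPU but the
 * worker is currently running on a different CPU.
 */
static bool toy_should_defer(unsigned int pool_flags, int pool_cpu, int cur_cpu)
{
	bool associated = !(pool_flags & TOY_POOL_DISASSOCIATED);

	return associated && cur_cpu != pool_cpu;
}
```

A worker on the wrong CPU defers only once the pool has been rebound;
while the pool is still disassociated, running elsewhere is expected
and the predicate is false.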

Signed-off-by: Neeraj Upadhyay <neeraju@...eaurora.org>
---
 kernel/workqueue.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 43d18cb..71c0023 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2218,6 +2218,17 @@ static int worker_thread(void *__worker)
 	if (unlikely(!may_start_working(pool)) && manage_workers(worker))
 		goto recheck;
 
+	/* handle the case where, while a bounded pool is unbound,
+	 * its worker is woken up on a target CPU, which is different
+	 * from pool->cpu, but pool is rebound before this worker gets
+	 * chance to run on the target CPU.
+	 */
+	if (WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
+		raw_smp_processor_id() != pool->cpu)) {
+		wake_up_worker(pool);
+		goto sleep;
+	}
+
 	/*
 	 * ->scheduled list can only be filled while a worker is
 	 * preparing to process a work or actually processing it.
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
