linux-kernel - Re: [RFC PATCH] workqueue: handle CPU hotplug without updating worker pool's attrs

Message-ID: <CAJhGHyAbjbpzp_+eLEEVVgUUCh6aJjUYbEigVwTrM1_Gx1Jf1w@mail.gmail.com>
Date:   Fri, 11 Dec 2020 10:29:55 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, NeilBrown <neilb@...e.de>,
        Daniel Jordan <daniel.m.jordan@...cle.com>,
        Tejun Heo <tj@...nel.org>
Subject: Re: [RFC PATCH] workqueue: handle CPU hotplug without updating worker
 pool's attrs

On Wed, Dec 9, 2020 at 4:34 PM Hillf Danton <hdanton@...a.com> wrote:
>
> This keeps the cpumask intact for worker pools of unbound workqueues
> when CPUs go offline, because we need to rebind workers to the original
> cpumask (of the original pool) when they come back, as represented by
> the cpu check in restore_unbound_workers_cpumask().
>
> Note that, for now, this is only for comments.
>
> Signed-off-by: Hillf Danton <hdanton@...a.com>
> ---
>
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -4117,6 +4117,32 @@ static void wq_update_unbound_numa(struc
>         copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
>         pwq = unbound_pwq_by_node(wq, node);
>
> +       cpumask = pwq->pool->attrs->cpumask;
> +       if (!online || !cpumask_test_cpu(cpu, cpumask))
> +               return;
> +       do {
> +               struct worker *worker;
> +               int i;
> +
> +               for_each_cpu_and(i, cpu_online_mask, cpumask) {
> +                       if (i != cpu)
> +                               return;
> +               }
> +
> +               /*
> +                * rebind workers only when the first CPU in
> +                * pool's attrs cpumask comes back because scheduler
> +                * may have unbound them when the last CPU went offline
> +                */
> +               mutex_lock(&wq_pool_attach_mutex);
> +
> +               for_each_pool_worker(worker, pwq->pool)
> +                       set_cpus_allowed_ptr(worker->task, cpumask);
> +

Multiple pwqs might share the same pool, so this code might update the
same pool multiple times.

Please keep the logic in restore_unbound_workers_cpumask().
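A minimal user-space sketch of the redundancy follows. It is not kernel code. It assumes hypothetical, simplified stand-ins `struct pool` and `struct pwq` for the kernel's `struct worker_pool` and `struct pool_workqueue`, and the helpers `online_per_pwq()` and `online_per_pool()` are illustrative names only. Unbound workqueues with identical attrs can be backed by one shared pool. Acting once per pwq therefore re-affines that pool's workers repeatedly, while acting once per pool, as the existing loop in workqueue_online_cpu() does, touches each pool exactly once.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space model: "struct pool" and "struct pwq" are
 * simplified stand-ins for the kernel's worker_pool and pool_workqueue.
 */
struct pool {
	int rebind_count;	/* times this pool's workers were re-affined */
};

struct pwq {
	struct pool *pool;	/* several pwqs may point at the same pool */
};

static void rebind_pool(struct pool *pool)
{
	assert(pool != NULL);
	pool->rebind_count++;
}

/*
 * Models the RFC: the update runs once per pwq (per workqueue), so a
 * pool shared by several pwqs is re-affined once per sharer.
 */
static void online_per_pwq(struct pwq *pwqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		rebind_pool(pwqs[i].pool);
}

/*
 * Models the existing code: the update runs once per pool, from the
 * pool loop in workqueue_online_cpu().
 */
static void online_per_pool(struct pool **pools, size_t n)
{
	for (size_t i = 0; i < n; i++)
		rebind_pool(pools[i]);
}
```

With two pwqs pointing at one pool, online_per_pwq() leaves rebind_count at 2, while online_per_pool() over the single pool leaves it at 1.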

> +               mutex_unlock(&wq_pool_attach_mutex);
> +               return;
> +       } while (0);
> +
>         /*
>          * Let's determine what needs to be done.  If the target cpumask is
>          * different from the default pwq's, we need to compare it to @pwq's
> @@ -5004,34 +5030,6 @@ static void rebind_workers(struct worker
>         raw_spin_unlock_irq(&pool->lock);
>  }
>
> -/**
> - * restore_unbound_workers_cpumask - restore cpumask of unbound workers
> - * @pool: unbound pool of interest
> - * @cpu: the CPU which is coming up
> - *
> - * An unbound pool may end up with a cpumask which doesn't have any online
> - * CPUs.  When a worker of such pool get scheduled, the scheduler resets
> - * its cpus_allowed.  If @cpu is in @pool's cpumask which didn't have any
> - * online CPU before, cpus_allowed of all its workers should be restored.
> - */
> -static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
> -{
> -       static cpumask_t cpumask;
> -       struct worker *worker;
> -
> -       lockdep_assert_held(&wq_pool_attach_mutex);
> -
> -       /* is @cpu allowed for @pool? */
> -       if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
> -               return;
> -
> -       cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);

Good catch of the problem, but please fix it where the problem is found
(here!), like this:

+        /*
+         * Only restore the cpumask of the pool's workers when @cpu is
+         * the first online CPU in @pool's cpumask.
+         */
+        if (cpumask_weight(&cpumask) > 1)
+                return;

        /* as we're called from CPU_ONLINE, the following shouldn't fail */
        for_each_pool_worker(worker, pool)
-                WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
+                WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
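The suggested `cpumask_weight()` check can be modelled as a small predicate. This is a minimal user-space sketch, not kernel code. It assumes a cpumask fits in a 64-bit word (bit N is CPU N) and that the online mask already includes @cpu, as it would in the CPU_ONLINE callback. `mask_weight()` and `should_restore()` are hypothetical helpers. @cpu is in the pool's cpumask and online, so the intersection has weight at least 1. Returning early on "weight > 1" is therefore the same as restoring only when the weight is exactly 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a cpumask is a 64-bit word, bit N = CPU N. */
static int mask_weight(uint64_t mask)
{
	int w = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		w++;
	}
	return w;
}

/*
 * Restore the workers' affinity only if @cpu is allowed for the pool and
 * is now the pool's sole online CPU, i.e. the first one to come back
 * after all of the pool's CPUs went offline. @online_mask is assumed to
 * already contain @cpu.
 */
static bool should_restore(uint64_t pool_mask, uint64_t online_mask, int cpu)
{
	uint64_t cpu_bit;

	assert(cpu >= 0 && cpu < 64);
	cpu_bit = UINT64_C(1) << cpu;
	if (!(pool_mask & cpu_bit))
		return false;	/* @cpu is not in the pool's cpumask */
	return mask_weight(pool_mask & online_mask) == 1;
}
```

Take a pool allowed on CPUs 2-3 (mask 0xC). should_restore(0xC, 0x5, 2) is true: CPU 2 returns while CPU 3 is still offline. should_restore(0xC, 0xD, 3) is false, because the workers were already re-affined when CPU 2 came back. The suggestion passes the full pool->attrs->cpumask to set_cpus_allowed_ptr() instead of the online intersection. As a result, CPUs of the pool that come back later are presumably already in the workers' allowed set and need no further update.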

> -
> -       /* as we're called from CPU_ONLINE, the following shouldn't fail */
> -       for_each_pool_worker(worker, pool)
> -               WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
> -}
> -
>  int workqueue_prepare_cpu(unsigned int cpu)
>  {
>         struct worker_pool *pool;
> @@ -5058,8 +5056,6 @@ int workqueue_online_cpu(unsigned int cp
>
>                 if (pool->cpu == cpu)
>                         rebind_workers(pool);
> -               else if (pool->cpu < 0)
> -                       restore_unbound_workers_cpumask(pool, cpu);
>
>                 mutex_unlock(&wq_pool_attach_mutex);
>         }