linux-kernel - Re: [RFC PATCH v2 1/2] workqueue: Unbind workers before sending them to exit()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJhGHyCeraX1jcea9kt_FBC561zBgECuw5qx8TAdCG0EHnT6kA@mail.gmail.com>
Date:   Thu, 28 Jul 2022 01:13:37 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Valentin Schneider <vschneid@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Phil Auld <pauld@...hat.com>,
        Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [RFC PATCH v2 1/2] workqueue: Unbind workers before sending them
 to exit()

Quick review before going to sleep.

On Wed, Jul 27, 2022 at 7:54 PM Valentin Schneider <vschneid@...hat.com> wrote:
>
> It has been reported that isolated CPUs can suffer from interference due to
> per-CPU kworkers waking up just to die.
>
> A surge of workqueue activity during initial setup of a latency-sensitive
> application (refresh_vm_stats() being one of the culprits) can cause extra
> per-CPU kworkers to be spawned. Then, said latency-sensitive task can be
> running merrily on an isolated CPU only to be interrupted sometime later by
> a kworker marked for death (cf. IDLE_WORKER_TIMEOUT, 5 minutes after last
> kworker activity).
>
> Prevent this by affining kworkers to the wq_unbound_cpumask (which doesn't
> contain isolated CPUs, cf. HK_TYPE_WQ) before waking them up after marking
> them with WORKER_DIE.
>
> Changing the affinity does require a sleepable context, so get rid of the
> pool->idle_timer and use a delayed_work instead.
>
> Signed-off-by: Valentin Schneider <vschneid@...hat.com>
> ---
>  kernel/workqueue.c | 109 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 81 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1ea50f6be843..27642166dcc5 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -167,9 +167,9 @@ struct worker_pool {
>         int                     nr_workers;     /* L: total number of workers */
>         int                     nr_idle;        /* L: currently idle workers */
>
> -       struct list_head        idle_list;      /* L: list of idle workers */
> -       struct timer_list       idle_timer;     /* L: worker idle timeout */
> -       struct timer_list       mayday_timer;   /* L: SOS timer for workers */
> +       struct list_head        idle_list;        /* L: list of idle workers */
> +       struct delayed_work     idle_reaper_work; /* L: worker idle timeout */
> +       struct timer_list       mayday_timer;     /* L: SOS timer for workers */
>
>         /* a workers is either on busy_hash or idle_list, or the manager */
>         DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
> @@ -1806,8 +1806,10 @@ static void worker_enter_idle(struct worker *worker)
>         /* idle_list is LIFO */
>         list_add(&worker->entry, &pool->idle_list);
>
> -       if (too_many_workers(pool) && !timer_pending(&pool->idle_timer))
> -               mod_timer(&pool->idle_timer, jiffies + IDLE_WORKER_TIMEOUT);
> +       if (too_many_workers(pool) && !delayed_work_pending(&pool->idle_reaper_work))
> +               mod_delayed_work(system_unbound_wq,
> +                                &pool->idle_reaper_work,
> +                                IDLE_WORKER_TIMEOUT);

system_unbound_wq doesn't have a rescuer.

A new workqueue with a rescuer needs to be created and used for
this purpose.

>
>         /* Sanity check nr_running. */
>         WARN_ON_ONCE(pool->nr_workers == pool->nr_idle && pool->nr_running);
> @@ -1972,9 +1974,29 @@ static struct worker *create_worker(struct worker_pool *pool)
>         return NULL;
>  }
>
> +static void unbind_worker(struct worker *worker)
> +{
> +       kthread_set_per_cpu(worker->task, -1);
> +       WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, wq_unbound_cpumask) < 0);
> +}
> +
> +static void rebind_worker(struct worker *worker, struct worker_pool *pool)
> +{
> +       kthread_set_per_cpu(worker->task, pool->cpu);
> +       WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
> +}
> +
> +static void reap_worker(struct worker *worker)
> +{
> +       list_del_init(&worker->entry);
> +       unbind_worker(worker);
> +       wake_up_process(worker->task);


Since WORKER_DIE is set, the worker can be possible freed now
if there is another source to wake it up.

I think reverting a part of the commit 60f5a4bcf852("workqueue:
async worker destruction") to make use of kthread_stop()
in destroy_worker() should be a good idea.