linux-kernel - Re: [RFC 2/2] workqueue: Fix work re-entrance when requeue to a different workqueue

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJhGHyDudet_xyNk=8xnuO2==o-u06s0E0GZVP4Q67nmQ84Ceg@mail.gmail.com>
Date:   Sat, 9 Oct 2021 10:06:23 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Boqun Feng <boqun.feng@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [RFC 2/2] workqueue: Fix work re-entrance when requeue to a
 different workqueue

On Fri, Oct 8, 2021 at 6:06 PM Boqun Feng <boqun.feng@...il.com> wrote:
>
> When requeuing a work to a different workqueue while it's still getting
> processed, re-entrace as the follow can happen:
>
>         { both WQ1 and WQ2 are bounded workqueue, and a work W has been
>           queued on CPU0 for WQ1}
>
>         CPU 0                   CPU 1
>         =====                   ====
>         <In worker on CPU 0>
>         process_one_work():
>           ...
>           // pick up W
>           worker->current_work = W;
>           worker->current_func = W->func;
>           ...
>           set_work_pool_and_clear_pending(...);
>           // W can be requeued afterwards
>                                 queue_work_on(1, WQ2, W):
>                                   if (!test_and_set_bit(...)) {
>                                     // this branch is taken, as CPU 0
>                                     // just clears pending bit.
>                                     __queue_work(...):
>                                       pwq = <pool for CPU1 of WQ2>;
>                                       last_pool = <pool for CPU 0 of WQ1>;
>                                       if (last_pool != pwq->pool) { // true
>                                         if (.. && worker->current_pwq->wq == wq) {
>                                           // false, since @worker is a
>                                           // a worker of @last_pool (for
>                                           // WQ1), and @wq is WQ2.
>                                         }
>                                         ...
>                                         insert_work(pwq, W, ...);
>                                       }
>                                 // W queued.
>                                 <schedule to worker on CPU 1>
>                                 process_one_work():
>                                   collision = find_worker_executing_work(..);
>                                   // NULL, because we're searching the
>                                   // worker pool of CPU 1, while W is
>                                   // the current work on worker pool of
>                                   // CPU 0.
>                                   worker->current_work = W;
>                                   worker->current_func = W->func;
>           worker->current_func(...);
>                                   ...
>                                   worker->current_func(...); // Re-entrance

Concurrent or parallel executions on the same work item aren't
considered as "Re-entrance" if the workqueue is changed.

It allows the work function to free itself(the item) and another
subsystem allocates the same item and reuses it.

"Re-entrance" is defined as:
  work function has not been changed
  wq has not been changed
  the item has not been reinitiated.
  (To reduce the check complication, the workqueue subsystem often
considers it "Re-entrance" if the condition is changed and has changed
back. But the wq users should not depend on this behavior and should avoid
it)


>
> This issue is already partially fixed because in queue_work_on(),
> last_pool can be used to queue the work, as a result the requeued work
> processing will find the collision and wait for the existing one to
> finish. However, currently the last_pool is only used when two
> workqueues are the same one, which causes the issue. Therefore extend
> the behavior to allow last_pool to requeue the work W even if the
> workqueues are different. It's safe to do this since the work W has been
> proved safe to queue and run on the last_pool.
>
> Signed-off-by: Boqun Feng <boqun.feng@...il.com>
> ---
>  kernel/workqueue.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 1418710bffcd..410141cc5f88 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1465,7 +1465,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
>
>                 worker = find_worker_executing_work(last_pool, work);
>
> -               if (worker && worker->current_pwq->wq == wq) {
> +               if (worker) {
>                         pwq = worker->current_pwq;
>                 } else {
>                         /* meh... not running there, queue here */
> --
> 2.32.0
>