Date:   Sat, 9 Oct 2021 11:21:58 +0800
From:   Boqun Feng <boqun.feng@...il.com>
To:     Lai Jiangshan <jiangshanlai@...il.com>
Cc:     LKML <linux-kernel@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [RFC 2/2] workqueue: Fix work re-entrance when requeue to a
 different workqueue

On Sat, Oct 09, 2021 at 10:06:23AM +0800, Lai Jiangshan wrote:
> On Fri, Oct 8, 2021 at 6:06 PM Boqun Feng <boqun.feng@...il.com> wrote:
> >
> > When requeuing a work to a different workqueue while it's still getting
> > processed, re-entrance as follows can happen:
> >
> >         { both WQ1 and WQ2 are bound workqueues, and a work W has been
> >           queued on CPU0 for WQ1}
> >
> >         CPU 0                   CPU 1
> >         =====                   ====
> >         <In worker on CPU 0>
> >         process_one_work():
> >           ...
> >           // pick up W
> >           worker->current_work = W;
> >           worker->current_func = W->func;
> >           ...
> >           set_work_pool_and_clear_pending(...);
> >           // W can be requeued afterwards
> >                                 queue_work_on(1, WQ2, W):
> >                                   if (!test_and_set_bit(...)) {
> >                                     // this branch is taken, as CPU 0
> >                                     // just cleared the pending bit.
> >                                     __queue_work(...):
> >                                       pwq = <pool for CPU1 of WQ2>;
> >                                       last_pool = <pool for CPU 0 of WQ1>;
> >                                       if (last_pool != pwq->pool) { // true
> >                                         if (.. && worker->current_pwq->wq == wq) {
> >                                           // false, since @worker is a
> >                                           // worker of @last_pool (for
> >                                           // WQ1), and @wq is WQ2.
> >                                         }
> >                                         ...
> >                                         insert_work(pwq, W, ...);
> >                                       }
> >                                 // W queued.
> >                                 <schedule to worker on CPU 1>
> >                                 process_one_work():
> >                                   collision = find_worker_executing_work(..);
> >                                   // NULL, because we're searching the
> >                                   // worker pool of CPU 1, while W is
> >                                   // the current work on worker pool of
> >                                   // CPU 0.
> >                                   worker->current_work = W;
> >                                   worker->current_func = W->func;
> >           worker->current_func(...);
> >                                   ...
> >                                   worker->current_func(...); // Re-entrance
> 
> Concurrent or parallel executions on the same work item aren't
> considered as "Re-entrance" if the workqueue is changed.
> 

Well, then Documentation/core-api/workqueue.rst can use some help:

"Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
workqueues are now non-reentrant - any work item is guaranteed to be
executed by at most one worker system-wide at any given time."

Clearly, in the above case a work item is executed by two workers at
the same time.

> It allows the work function to free itself(the item) and another
> subsystem allocates the same item and reuses it.
> 

So you're saying that in process_one_work(), ->current_work can point to
a work which gets freed and reallocated before the worker actually
executes it? And users should guarantee it's safe to do so? I mean, is
this something that the workqueue subsystem allows/expects users to do?

> "Re-entrance" is defined as:
>   work function has not been changed
>   wq has not been changed
>   the item has not been reinitialized.
>   (To reduce the check complexity, the workqueue subsystem often
> considers it "Re-entrance" if the condition is changed and has changed
> back. But the wq users should not depend on this behavior and should avoid
> it)
> 

Thanks for the clarification. Could you also update the documentation to
avoid future confusion? Thanks!

Regards,
Boqun

> 
> >
> > This issue is already partially fixed because in queue_work_on(),
> > last_pool can be used to queue the work, as a result the requeued work
> > processing will find the collision and wait for the existing one to
> > finish. However, currently the last_pool is only used when two
> > workqueues are the same one, which causes the issue. Therefore extend
> > the behavior to allow last_pool to requeue the work W even if the
> > workqueues are different. It's safe to do this since the work W has been
> > proved safe to queue and run on the last_pool.
> >
> > Signed-off-by: Boqun Feng <boqun.feng@...il.com>
> > ---
> >  kernel/workqueue.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 1418710bffcd..410141cc5f88 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -1465,7 +1465,7 @@ static void __queue_work(int cpu, struct workqueue_struct *wq,
> >
> >                 worker = find_worker_executing_work(last_pool, work);
> >
> > -               if (worker && worker->current_pwq->wq == wq) {
> > +               if (worker) {
> >                         pwq = worker->current_pwq;
> >                 } else {
> >                         /* meh... not running there, queue here */
> > --
> > 2.32.0
> >
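
The effect of the one-line change in the diff above can be modeled in
user space. This is a minimal sketch under heavy simplification, not the
kernel implementation: the model_* structs and functions are
hypothetical stand-ins, there is no locking, and the "patched" flag only
selects between the old and the new condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct model_wq { const char *name; };
struct model_pool;
struct model_pwq { struct model_wq *wq; struct model_pool *pool; };
struct model_work { int id; };
struct model_worker {
	struct model_work *current_work;
	struct model_pwq *current_pwq;
};
struct model_pool { struct model_worker *workers; size_t nr_workers; };

/* Like find_worker_executing_work(): searches a single pool only. */
static struct model_worker *
model_find_worker(struct model_pool *pool, struct model_work *work)
{
	assert(pool != NULL);
	for (size_t i = 0; i < pool->nr_workers; i++)
		if (pool->workers[i].current_work == work)
			return &pool->workers[i];
	return NULL;
}

/*
 * Models the pwq selection in __queue_work().
 * patched == 0: old condition (worker && worker->current_pwq->wq == wq)
 * patched != 0: new condition (worker)
 */
static struct model_pwq *
model_choose_pwq(struct model_pwq *target, struct model_pool *last_pool,
		 struct model_work *work, struct model_wq *wq, int patched)
{
	if (last_pool && last_pool != target->pool) {
		struct model_worker *worker =
			model_find_worker(last_pool, work);

		if (worker && (patched || worker->current_pwq->wq == wq))
			return worker->current_pwq;
	}
	/* Not running on last_pool (or check failed): queue on the target. */
	return target;
}
```

With W running on the CPU 0 pool for WQ1 and a requeue to WQ2 on CPU 1,
the old condition returns the CPU 1 pwq of WQ2, where the per-pool
collision search cannot see the running W. The new condition returns the
worker's current pwq on CPU 0, so the collision is found there.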
