linux-kernel - Re: [PATCH] workqueue: Don't double assign worker->sleeping

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAJhGHyBS9Z=x-X2Bxzbic2sfqj=STqr+K8Tgu1UfYMQDm6MtBg@mail.gmail.com>
Date:   Wed, 1 Apr 2020 11:44:06 +0800
From:   Lai Jiangshan <jiangshanlai@...il.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     kernel test robot <lkp@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        LKML <linux-kernel@...r.kernel.org>, LKP <lkp@...ts.01.org>,
        Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] workqueue: Don't double assign worker->sleeping

On Wed, Apr 1, 2020 at 11:22 AM Lai Jiangshan <jiangshanlai@...il.com> wrote:
>
> Hello
>
> If I don't miss all the issues you have listed, it is a good and straightforward
> fix, but I have concern that cmpxchg_local() might have performance impact
> on some non-x86 arch.
>
> The two issues as you have listed:
> 1) WARN_ON_ONCE() on valid condition when interrupted(async-page-faulted)
> 2) wq_worker_running() can be interrupted(async-page-faulted in virtual machine)
> and nr_running would be decreased twice.

would be *increased* twice

I just saw the V2 patch, this issue is not listed, but need to be fixed too.

>
> For fixing issue one, we can just remove WARN_ON_ONCE() as this patch.
> For fixing issue two, ->sleeping in wq_worker_running() can be checked&modified
> under irq-disabled.  (we can't use preempt-disabled context here)
>
> thanks,
> Lai
>
> On Sat, Mar 28, 2020 at 1:53 AM Sebastian Andrzej Siewior
> <bigeasy@...utronix.de> wrote:
> >
> > The kernel test robot triggered a warning with the following race:
> >    task-ctx                              interrupt-ctx
> >  worker
> >   -> process_one_work()
> >     -> work_item()
> >       -> schedule();
> >          -> sched_submit_work()
> >            -> wq_worker_sleeping()
> >              -> ->sleeping = 1
> >                atomic_dec_and_test(nr_running)
> >          __schedule();                *interrupt*
> >                                        async_page_fault()
> >                                        -> local_irq_enable();
> >                                        -> schedule();
> >                                           -> sched_submit_work()
> >                                             -> wq_worker_sleeping()
> >                                                -> if (WARN_ON(->sleeping)) return
> >                                           -> __schedule()
> >                                             ->  sched_update_worker()
> >                                               -> wq_worker_running()
> >                                                  -> atomic_inc(nr_running);
> >                                                  -> ->sleeping = 0;
> >
> >       ->  sched_update_worker()
> >         -> wq_worker_running()
> >           if (!->sleeping) return
> >
> > In this context the warning is pointless everything is fine.
> >
> > However, if the interrupt occurs in wq_worker_sleeping() between reading and
> > setting `sleeping' i.e.
> >
> > |        if (WARN_ON_ONCE(worker->sleeping))
> > |                return;
> >  *interrupt*
> > |        worker->sleeping = 1;
> >
> > then pool->nr_running will be decremented twice in wq_worker_sleeping()
> > but it will be incremented only once in wq_worker_running().
> >
> > Replace the assignment of `sleeping' with a cmpxchg_local() to ensure
> > that there is no double assignment of the variable. The variable is only
> > accessed from the local CPU. Remove the WARN statement because this
> > condition can be valid.
> >
> > An alternative would be to move `->sleeping' to `->flags' as a new bit
> > but this would require to acquire the pool->lock in wq_worker_running().
> >
> > Fixes: 6d25be5782e48 ("sched/core, workqueues: Distangle worker accounting from rq lock")
> > Link: https://lkml.kernel.org/r/20200327074308.GY11705@shao2-debian
> > Reported-by: kernel test robot <lkp@...el.com>
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> > ---
> >  kernel/workqueue.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> > index 4e01c448b4b48..dc477a2a3ce30 100644
> > --- a/kernel/workqueue.c
> > +++ b/kernel/workqueue.c
> > @@ -846,11 +846,10 @@ void wq_worker_running(struct task_struct *task)
> >  {
> >         struct worker *worker = kthread_data(task);
> >
> > -       if (!worker->sleeping)
> > +       if (cmpxchg_local(&worker->sleeping, 1, 0) == 0)
> >                 return;
> >         if (!(worker->flags & WORKER_NOT_RUNNING))
> >                 atomic_inc(&worker->pool->nr_running);
> > -       worker->sleeping = 0;
> >  }
> >
> >  /**
> > @@ -875,10 +874,9 @@ void wq_worker_sleeping(struct task_struct *task)
> >
> >         pool = worker->pool;
> >
> > -       if (WARN_ON_ONCE(worker->sleeping))
> > +       if (cmpxchg_local(&worker->sleeping, 0, 1) == 1)
> >                 return;
> >
> > -       worker->sleeping = 1;
> >         spin_lock_irq(&pool->lock);
> >
> >         /*
> > --
> > 2.26.0
> >