Date:	Tue, 3 Mar 2015 14:57:18 +0100
From:	Tomeu Vizoso <tomeu.vizoso@...il.com>
To:	Tejun Heo <tj@...nel.org>
Cc:	Jesper Nilsson <jesper.nilsson@...s.com>,
	Rabin Vincent <rabinv@...s.com>,
	Jesper Nilsson <jespern@...s.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH wq/for-4.0-fixes v2] workqueue: fix hang involving racing
 cancel[_delayed]_work_sync()'s for PREEMPT_NONE

On 3 March 2015 at 14:45, Tejun Heo <tj@...nel.org> wrote:
> Hello,
>
> Found the culprit.  Plain wake_up() shouldn't be used on
> bit_waitqueues.  The following patch should fix the issue.  I replaced
> the patch in the wq branches.

Yup, this looks good from here.

Thanks,

Tomeu

> Thanks a lot.
>
> ----- 8< -----
> From bba5a377507efef69214108c0d3790cadfab21d4 Mon Sep 17 00:00:00 2001
> From: Tejun Heo <tj@...nel.org>
> Date: Tue, 3 Mar 2015 08:43:09 -0500
>
> cancel[_delayed]_work_sync() are implemented using
> __cancel_work_timer() which grabs the PENDING bit using
> try_to_grab_pending() and then flushes the work item with PENDING set
> to prevent the on-going execution of the work item from requeueing
> itself.
>
> try_to_grab_pending() can always grab the PENDING bit without blocking
> except when someone else is doing the above flushing during
> cancellation, in which case it returns -ENOENT.  __cancel_work_timer()
> currently responds to -ENOENT by invoking flush_work().  The
> assumption is that the completion of the work item is what the other
> canceling task would be waiting for too, so waiting for the same
> condition and retrying should allow forward progress without
> excessive busy looping.
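[Editor's note: a minimal userspace sketch of the grab-PENDING semantics described above, using C11 atomics.  The names, bit layout, and return values are illustrative analogues, not the kernel's actual `work_struct` encoding.]

```c
#include <stdatomic.h>
#include <errno.h>

/* Hypothetical flag bits standing in for the kernel's off-queue
 * work->data bits; layout is illustrative only. */
#define F_PENDING   (1u << 0)
#define F_CANCELING (1u << 1)

/* Analogue of try_to_grab_pending(): claim PENDING if possible.
 * Returns 0 on success, -EAGAIN if already pending or on a lost race
 * (caller retries), and -ENOENT if another task is mid-cancellation. */
static int try_grab_pending(atomic_uint *flags)
{
    unsigned int old = atomic_load(flags);

    if (old & F_CANCELING)
        return -ENOENT;   /* someone else is canceling; back off */
    if (old & F_PENDING)
        return -EAGAIN;   /* already claimed */
    if (atomic_compare_exchange_strong(flags, &old, old | F_PENDING))
        return 0;         /* we own PENDING now */
    return -EAGAIN;       /* raced with a concurrent update */
}
```

The -ENOENT path is the one at issue in this patch: the caller cannot make progress until the other canceler clears CANCELING.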
>
> Unfortunately, this doesn't work if preemption is disabled or the
> second canceling task has real-time priority.  Let's say task A just got woken
> up from flush_work() by the completion of the target work item.  If,
> before task A starts executing, task B gets scheduled and invokes
> __cancel_work_timer() on the same work item, its try_to_grab_pending()
> will return -ENOENT as the work item is still being canceled by task A
> and flush_work() will also immediately return false as the work item
> is no longer executing.  This puts task B in a busy loop possibly
> preventing task A from executing and clearing the canceling state on
> the work item leading to a hang.
>
> task A                  task B                  worker
>
>                                                 executing work
> __cancel_work_timer()
>   try_to_grab_pending()
>   set work CANCELING
>   flush_work()
>     block for work completion
>                                                 completion, wakes up A
>                         __cancel_work_timer()
>                         while (forever) {
>                           try_to_grab_pending()
>                             -ENOENT as work is being canceled
>                           flush_work()
>                             false as work is no longer executing
>                         }
>
> This patch removes the possible hang by updating __cancel_work_timer()
> to explicitly wait for clearing of CANCELING rather than invoking
> flush_work() after try_to_grab_pending() fails with -ENOENT.  The
> explicit wait uses the matching bit waitqueue for the CANCELING bit.
>
> Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
>
> v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
>     the target bit waitqueue has wait_bit_queue's on it.  Use
>     DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
>     Vizoso.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Reported-by: Rabin Vincent <rabin.vincent@...s.com>
> Cc: Tomeu Vizoso <tomeu.vizoso@...il.com>
> Cc: stable@...r.kernel.org
> Tested-by: Jesper Nilsson <jesper.nilsson@...s.com>
> Tested-by: Rabin Vincent <rabin.vincent@...s.com>
> ---
>  include/linux/workqueue.h |  3 ++-
>  kernel/workqueue.c        | 37 +++++++++++++++++++++++++++++++++----
>  2 files changed, 35 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 74db135..f597846 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -70,7 +70,8 @@ enum {
>         /* data contains off-queue information when !WORK_STRUCT_PWQ */
>         WORK_OFFQ_FLAG_BASE     = WORK_STRUCT_COLOR_SHIFT,
>
> -       WORK_OFFQ_CANCELING     = (1 << WORK_OFFQ_FLAG_BASE),
> +       __WORK_OFFQ_CANCELING   = WORK_OFFQ_FLAG_BASE,
> +       WORK_OFFQ_CANCELING     = (1 << __WORK_OFFQ_CANCELING),
>
>         /*
>          * When a work item is off queue, its high bits point to the last
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f288493..cfbae1d 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2730,17 +2730,37 @@ EXPORT_SYMBOL_GPL(flush_work);
>
>  static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
>  {
> +       wait_queue_head_t *waitq = bit_waitqueue(&work->data,
> +                                                __WORK_OFFQ_CANCELING);
>         unsigned long flags;
>         int ret;
>
>         do {
>                 ret = try_to_grab_pending(work, is_dwork, &flags);
>                 /*
> -                * If someone else is canceling, wait for the same event it
> -                * would be waiting for before retrying.
> +                * If someone else is already canceling, wait for it to
> +                * finish.  flush_work() doesn't work for PREEMPT_NONE
> +                * because we may get scheduled between @work's completion
> +                * and the other canceling task resuming and clearing
> +                * CANCELING - flush_work() will return false immediately
> +                * as @work is no longer busy, try_to_grab_pending() will
> +                * return -ENOENT as @work is still being canceled and the
> +                * other canceling task won't be able to clear CANCELING as
> +                * we're hogging the CPU.
> +                *
> +                * Explicitly wait for completion using a bit waitqueue.
> +                * We can't use wait_on_bit() as the CANCELING bit may get
> +                * recycled to point to pwq if @work gets re-queued.
>                  */
> -               if (unlikely(ret == -ENOENT))
> -                       flush_work(work);
> +               if (unlikely(ret == -ENOENT)) {
> +                       DEFINE_WAIT_BIT(wait, &work->data,
> +                                       __WORK_OFFQ_CANCELING);
> +                       prepare_to_wait(waitq, &wait.wait,
> +                                       TASK_UNINTERRUPTIBLE);
> +                       if (work_is_canceling(work))
> +                               schedule();
> +                       finish_wait(waitq, &wait.wait);
> +               }
>         } while (unlikely(ret < 0));
>
>         /* tell other tasks trying to grab @work to back off */
> @@ -2749,6 +2769,15 @@ static bool __cancel_work_timer(struct work_struct *work, bool is_dwork)
>
>         flush_work(work);
>         clear_work_data(work);
> +
> +       /*
> +        * Paired with prepare_to_wait() above so that either
> +        * __wake_up_bit() sees busy waitq here or !work_is_canceling() is
> +        * visible there.
> +        */
> +       smp_mb();
> +       __wake_up_bit(waitq, &work->data, __WORK_OFFQ_CANCELING);
> +
>         return ret;
>  }
>
> --
> 2.1.0
>
