Message-ID: <CAKU+Ga_awbok9wTjbnrNbwqGcF0sagHpf729ZDxYdNVPMhc81A@mail.gmail.com>
Date: Wed, 28 Aug 2013 16:11:56 -0700
From: Jamie Liu <jamieliu@...gle.com>
To: Tejun Heo <tj@...nel.org>
Cc: Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] workqueue: cond_resched() after processing each work item
Hi Tejun,
On Wed, Aug 28, 2013 at 2:33 PM, Tejun Heo <tj@...nel.org> wrote:
> Would something like the following work? Can you please verify it?
I confirm that this works.
> Thanks.
> ---- 8< ----
> If !PREEMPT, a kworker running work items back to back can hog the
> CPU. This becomes dangerous when a self-requeueing work item which is
> waiting for something to happen races against stop_machine. Such a
> work item would requeue itself indefinitely, hogging the kworker and
> the CPU it's running on, while stop_machine waits for that CPU to
> enter the stopper and prevents anything else from happening on all
> other CPUs. The two deadlock.
>
> Jmamie Liu reports that this deadlock scenario exists around
s/Jmamie/Jamie/
> scsi_requeue_run_queue() and libata port multiplier support, where one
> port may exclude command processing from other ports. With the right
> timing, scsi_requeue_run_queue() can end up requeueing itself trying
> to execute an IO which is asked to be retried while another device has
> exclusive access, which in turn can't make forward progress due to
> stop_machine.
>
> Fix it by invoking cond_resched() after executing each work item.
>
> Signed-off-by: Tejun Heo <tj@...nel.org>
> Reported-by: Jamie Liu <jamieliu@...gle.com>
> References: http://thread.gmane.org/gmane.linux.kernel/1552567
> Cc: stable@...r.kernel.org
> --
> kernel/workqueue.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f02c4a4..73b662b 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2201,6 +2201,15 @@ __acquires(&pool->lock)
> dump_stack();
> }
>
> + /*
> + * The following prevents a kworker from hogging CPU on !PREEMPT
> + * kernels, where a requeueing work item waiting for something to
> + * happen could deadlock with stop_machine as such work item could
> + * indefinitely requeue itself while all other CPUs are trapped in
> + * stop_machine.
> + */
> + cond_resched();
> +
> spin_lock_irq(&pool->lock);
>
> /* clear cpu intensive status */
Thanks,
Jamie